* [PATCH v3 0/4] virtio net: spurious interrupt related fixes
From: Michael S. Tsirkin @ 2021-05-26  8:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization


With the implementation of napi-tx in the virtio driver, we clean tx
descriptors from the rx napi handler, for the purpose of reducing tx
complete interrupts. But this introduces a race where the tx complete
interrupt has been raised, but the handler finds there is no work to do
because we already did the work in the previous rx interrupt handler.
A similar issue exists with polling from start_xmit; it is however
less common because of the delayed cb optimization of the split ring,
but it will likely affect the packed ring once that is more common.
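
To make the failure mode concrete, here is a rough userspace model of
the race (a sketch with made-up names, not the driver code): an "rx
napi" thread steals the tx cleanup work, so the "tx interrupt handler"
finds no work and would return IRQ_NONE; once enough interrupts go
unhandled the kernel gives up on the line, as in the log below.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_int pending;   /* tx completions not yet cleaned */
static atomic_int spurious;  /* irq invocations that found no work */

/* rx napi side: opportunistically cleans tx completions */
static void *rx_napi(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++)
		atomic_exchange(&pending, 0);	/* "free_old_xmit_skbs" */
	return NULL;
}

/* tx interrupt handler: spurious whenever rx napi got there first */
static void tx_irq(void)
{
	if (atomic_exchange(&pending, 0) == 0)
		atomic_fetch_add(&spurious, 1);	/* would be IRQ_NONE */
}

int main(void)
{
	pthread_t rx;

	pthread_create(&rx, NULL, rx_napi, NULL);
	for (int i = 0; i < 100000; i++) {
		atomic_fetch_add(&pending, 1);	/* device uses a buffer... */
		tx_irq();			/* ...and its interrupt fires */
	}
	pthread_join(rx, NULL);
	/* the kernel disables an irq line once ~99.9% of 100k irqs go unhandled */
	printf("%d of 100000 interrupts were spurious\n",
	       atomic_load(&spurious));
	return 0;
}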

In particular, this was reported to lead to the following warning msg:
[ 3588.010778] irq 38: nobody cared (try booting with the
"irqpoll" option)
[ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
5.3.0-19-generic #20~18.04.2-Ubuntu
[ 3588.017940] Call Trace:
[ 3588.017942]  <IRQ>
[ 3588.017951]  dump_stack+0x63/0x85
[ 3588.017953]  __report_bad_irq+0x35/0xc0
[ 3588.017955]  note_interrupt+0x24b/0x2a0
[ 3588.017956]  handle_irq_event_percpu+0x54/0x80
[ 3588.017957]  handle_irq_event+0x3b/0x60
[ 3588.017958]  handle_edge_irq+0x83/0x1a0
[ 3588.017961]  handle_irq+0x20/0x30
[ 3588.017964]  do_IRQ+0x50/0xe0
[ 3588.017966]  common_interrupt+0xf/0xf
[ 3588.017966]  </IRQ>
[ 3588.017989] handlers:
[ 3588.020374] [<000000001b9f1da8>] vring_interrupt
[ 3588.025099] Disabling IRQ #38

This patchset attempts to fix this by cleaning up a number of races
related to the handling of sq callbacks (aka tx interrupts).
It is somewhat tested, but I couldn't reproduce the original issues
reported, so I'm sending it out to ask for help with testing.

Wei, does this address the spurious interrupt issue you are
observing? Could you confirm please?

Thanks!

changes from v2:
	Fixed a race condition in start_xmit: enable_cb_delayed was
	done as an optimization (to push out the event index for
	the split ring) so we did not have to care about it
	returning false (recheck). Now that we actually disable the cb
	we have to test the return value and do the actual recheck.
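
As a toy illustration of that recheck (a self-contained sketch;
enable_cb_delayed below is a stand-in for virtqueue_enable_cb_delayed,
not the real API): a false return value means completions arrived while
callbacks were off, so we must clean again instead of going to sleep.

#include <stdbool.h>
#include <stdio.h>

static int used;	/* completions posted by the device */
static int cleaned;	/* completions the driver has freed */

/* returns false if work appeared since we last cleaned,
 * i.e. the caller must poll once more */
static bool enable_cb_delayed(void)
{
	return used == cleaned;
}

int main(void)
{
	used = 3;
	do {
		int before = cleaned;

		cleaned = used;		/* "free_old_xmit_skbs" */
		if (before == 0)
			used = 5;	/* device completes more mid-clean */
	} while (!enable_cb_delayed());

	printf("cleaned %d of %d buffers\n", cleaned, used);	/* 5 of 5 */
	return 0;
}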


Michael S. Tsirkin (4):
  virtio_net: move tx vq operation under tx queue lock
  virtio_net: move txq wakeups under tx q lock
  virtio: fix up virtio_disable_cb
  virtio_net: disable cb aggressively

 drivers/net/virtio_net.c     | 49 ++++++++++++++++++++++++++++--------
 drivers/virtio/virtio_ring.c | 26 ++++++++++++++++++-
 2 files changed, 64 insertions(+), 11 deletions(-)

-- 
MST



* [PATCH v3 1/4] virtio_net: move tx vq operation under tx queue lock
From: Michael S. Tsirkin @ 2021-05-26  8:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization, Jason Wang

It's unsafe to operate a vq from multiple threads.
Unfortunately this is exactly what we do when invoking
clean tx poll from rx napi.
The same happens with napi-tx even without the
opportunistic cleaning from the receive interrupt: that races
with processing the vq in start_xmit.

As a fix, move everything that deals with the vq under the tx lock.

Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ac0c143f97b4..12512d1002ec 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1508,6 +1508,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	unsigned int index = vq2txq(sq->vq);
 	struct netdev_queue *txq;
+	int opaque;
+	bool done;
 
 	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
 		/* We don't need to enable cb for XDP */
@@ -1517,10 +1519,28 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 
 	txq = netdev_get_tx_queue(vi->dev, index);
 	__netif_tx_lock(txq, raw_smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
 	free_old_xmit_skbs(sq, true);
+
+	opaque = virtqueue_enable_cb_prepare(sq->vq);
+
+	done = napi_complete_done(napi, 0);
+
+	if (!done)
+		virtqueue_disable_cb(sq->vq);
+
 	__netif_tx_unlock(txq);
 
-	virtqueue_napi_complete(napi, sq->vq, 0);
+	if (done) {
+		if (unlikely(virtqueue_poll(sq->vq, opaque))) {
+			if (napi_schedule_prep(napi)) {
+				__netif_tx_lock(txq, raw_smp_processor_id());
+				virtqueue_disable_cb(sq->vq);
+				__netif_tx_unlock(txq);
+				__napi_schedule(napi);
+			}
+		}
+	}
 
 	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 		netif_tx_wake_queue(txq);
-- 
MST



* [PATCH v3 2/4] virtio_net: move txq wakeups under tx q lock
From: Michael S. Tsirkin @ 2021-05-26  8:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization, Jason Wang

We currently check num_free outside the tx queue lock,
which is unsafe: new packets can arrive meanwhile
and there won't be space in the queue.
The result is a spurious queue wakeup, causing overhead
and even packet drops.

Move the check under the lock to fix that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 12512d1002ec..c29f42d1e04f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1434,11 +1434,12 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 
 	if (__netif_tx_trylock(txq)) {
 		free_old_xmit_skbs(sq, true);
+
+		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+			netif_tx_wake_queue(txq);
+
 		__netif_tx_unlock(txq);
 	}
-
-	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
-		netif_tx_wake_queue(txq);
 }
 
 static int virtnet_poll(struct napi_struct *napi, int budget)
@@ -1522,6 +1523,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 	virtqueue_disable_cb(sq->vq);
 	free_old_xmit_skbs(sq, true);
 
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_tx_wake_queue(txq);
+
 	opaque = virtqueue_enable_cb_prepare(sq->vq);
 
 	done = napi_complete_done(napi, 0);
@@ -1542,9 +1546,6 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
 		}
 	}
 
-	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
-		netif_tx_wake_queue(txq);
-
 	return 0;
 }
 
-- 
MST



* [PATCH v3 3/4] virtio: fix up virtio_disable_cb
From: Michael S. Tsirkin @ 2021-05-26  8:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization, Jason Wang

virtio_disable_cb is currently a nop for a split ring with event index.
This is because it used to always be called from a callback when we know
the device won't trigger more events until we update the index.  However,
now that we run with interrupts enabled a lot we also poll without a
callback, so that is different: disabling callbacks will help reduce the
number of spurious interrupts.
Further, if using event index with a packed ring, and if being called
from a callback, we actually do disable interrupts, which is unnecessary.

Fix both issues by tracking whether we got a callback. If that is
the case, disabling interrupts with event index can be a nop.
If not, disable interrupts. Note: with a split ring
there's no explicit "no interrupts" value. For now we write
a fixed value, so our chance of triggering an interrupt
is 1/ring size. It's probably better to write something
related to the last used index there to reduce the chance
even further. For now I'm keeping it simple.
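
For reference, the event index rule relied on here is vring_need_event()
from include/uapi/linux/virtio_ring.h. The standalone sketch below (not
driver code) replays that rule with used_event pinned at 0x0 and one
buffer used per check; in this model the device ends up signalling once
every 2^16 completions.

#include <stdint.h>
#include <stdio.h>

/* the split ring signalling rule: interrupt only when the new used
 * index steps across event_idx */
static int vring_need_event(uint16_t event_idx, uint16_t new_idx,
			    uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) <
	       (uint16_t)(new_idx - old);
}

int main(void)
{
	unsigned int signals = 0;
	uint16_t used_idx = 0;

	for (unsigned int i = 0; i < 1u << 20; i++) {
		uint16_t old = used_idx++;	/* one buffer at a time */

		if (vring_need_event(0, used_idx, old))	/* used_event = 0x0 */
			signals++;
	}
	/* prints "16 of 1048576 completions signalled" */
	printf("%u of %u completions signalled\n", signals, 1u << 20);
	return 0;
}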

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 71e16b53e9c1..88f0b16b11b8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -113,6 +113,9 @@ struct vring_virtqueue {
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
+	/* Hint for event idx: already triggered no need to disable. */
+	bool event_triggered;
+
 	union {
 		/* Available for split ring */
 		struct {
@@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
 
 	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
 		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
+		if (vq->event)
+			/* TODO: this is a hack. Figure out a cleaner value to write. */
+			vring_used_event(&vq->split.vring) = 0x0;
+		else
 			vq->split.vring.avail->flags =
 				cpu_to_virtio16(_vq->vdev,
 						vq->split.avail_flags_shadow);
@@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
+	vq->event_triggered = false;
 	vq->num_added = 0;
 	vq->packed_ring = true;
 	vq->use_dma_api = vring_use_dma_api(vdev);
@@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	/* If device triggered an event already it won't trigger one again:
+	 * no need to disable.
+	 */
+	if (vq->event_triggered)
+		return;
+
 	if (vq->packed_ring)
 		virtqueue_disable_cb_packed(_vq);
 	else
@@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	if (vq->event_triggered)
+		vq->event_triggered = false;
+
 	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
 				 virtqueue_enable_cb_prepare_split(_vq);
 }
@@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
+	if (vq->event_triggered)
+		vq->event_triggered = false;
+
 	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
 				 virtqueue_enable_cb_delayed_split(_vq);
 }
@@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 	if (unlikely(vq->broken))
 		return IRQ_HANDLED;
 
+	/* Just a hint for performance: so it's ok that this can be racy! */
+	if (vq->event)
+		vq->event_triggered = true;
+
 	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
 	if (vq->vq.callback)
 		vq->vq.callback(&vq->vq);
@@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
+	vq->event_triggered = false;
 	vq->num_added = 0;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 #ifdef DEBUG
-- 
MST



* [PATCH v3 4/4] virtio_net: disable cb aggressively
From: Michael S. Tsirkin @ 2021-05-26  8:24 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization, Jason Wang

There are currently two cases where we poll the TX vq not in response to a
callback: start_xmit and rx napi.  We currently do this with callbacks
enabled, which can cause extra interrupts from the card.  This used not to
be a big issue as we ran with interrupts disabled, but that is no longer
the case, and in some cases the rate of spurious interrupts is so high
that Linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c29f42d1e04f..a83dc038d8af 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
 		return;
 
 	if (__netif_tx_trylock(txq)) {
-		free_old_xmit_skbs(sq, true);
+		do {
+			virtqueue_disable_cb(sq->vq);
+			free_old_xmit_skbs(sq, true);
+		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
 		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
 			netif_tx_wake_queue(txq);
@@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !netdev_xmit_more();
 	bool use_napi = sq->napi.weight;
+	unsigned int bytes = skb->len;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq, false);
+	do {
+		if (use_napi)
+			virtqueue_disable_cb(sq->vq);
 
-	if (use_napi && kick)
-		virtqueue_enable_cb_delayed(sq->vq);
+		free_old_xmit_skbs(sq, false);
+
+	} while (use_napi && kick &&
+	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
-- 
MST



* Re: [PATCH v3 4/4] virtio_net: disable cb aggressively
From: Eric Dumazet @ 2021-05-26 15:15 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization, Jason Wang



On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:
> There are currently two cases where we poll the TX vq not in response to a
> callback: start_xmit and rx napi.  We currently do this with callbacks
> enabled, which can cause extra interrupts from the card.  This used not to
> be a big issue as we ran with interrupts disabled, but that is no longer
> the case, and in some cases the rate of spurious interrupts is so high
> that Linux detects this and actually kills the interrupt.
> 
> Fix up by disabling the callbacks before polling the tx vq.


It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode?

This seems not needed; it adds costs to the sender thread and might
reduce the ability to use a different cpu for tx completions.

Also, this will likely conflict with the BQL model if we want to use BQL at some point.

> 

This probably needs a Fixes: tag 

> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/net/virtio_net.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>  		return;
>  
>  	if (__netif_tx_trylock(txq)) {
> -		free_old_xmit_skbs(sq, true);
> +		do {
> +			virtqueue_disable_cb(sq->vq);
> +			free_old_xmit_skbs(sq, true);
> +		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>  
>  		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>  			netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>  	bool kick = !netdev_xmit_more();
>  	bool use_napi = sq->napi.weight;
> +	unsigned int bytes = skb->len;
>  
>  	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq, false);
> +	do {
> +		if (use_napi)
> +			virtqueue_disable_cb(sq->vq);
>  
> -	if (use_napi && kick)
> -		virtqueue_enable_cb_delayed(sq->vq);
> +		free_old_xmit_skbs(sq, false);
> +
> +	} while (use_napi && kick &&
> +	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>  
>  	/* timestamp packet in software */
>  	skb_tx_timestamp(skb);
> 


* Re: [PATCH v3 0/4] virtio net: spurious interrupt related fixes
From: Willem de Bruijn @ 2021-05-26 15:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Jakub Kicinski, Wei Wang, David Miller,
	Network Development, virtualization

On Wed, May 26, 2021 at 4:24 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
>
> With the implementation of napi-tx in the virtio driver, we clean tx
> descriptors from the rx napi handler, for the purpose of reducing tx
> complete interrupts. But this introduces a race where the tx complete
> interrupt has been raised, but the handler finds there is no work to do
> because we already did the work in the previous rx interrupt handler.
> A similar issue exists with polling from start_xmit; it is however
> less common because of the delayed cb optimization of the split ring,
> but it will likely affect the packed ring once that is more common.
>
> In particular, this was reported to lead to the following warning msg:
> [ 3588.010778] irq 38: nobody cared (try booting with the
> "irqpoll" option)
> [ 3588.017938] CPU: 4 PID: 0 Comm: swapper/4 Not tainted
> 5.3.0-19-generic #20~18.04.2-Ubuntu
> [ 3588.017940] Call Trace:
> [ 3588.017942]  <IRQ>
> [ 3588.017951]  dump_stack+0x63/0x85
> [ 3588.017953]  __report_bad_irq+0x35/0xc0
> [ 3588.017955]  note_interrupt+0x24b/0x2a0
> [ 3588.017956]  handle_irq_event_percpu+0x54/0x80
> [ 3588.017957]  handle_irq_event+0x3b/0x60
> [ 3588.017958]  handle_edge_irq+0x83/0x1a0
> [ 3588.017961]  handle_irq+0x20/0x30
> [ 3588.017964]  do_IRQ+0x50/0xe0
> [ 3588.017966]  common_interrupt+0xf/0xf
> [ 3588.017966]  </IRQ>
> [ 3588.017989] handlers:
> [ 3588.020374] [<000000001b9f1da8>] vring_interrupt
> [ 3588.025099] Disabling IRQ #38
>
> This patchset attempts to fix this by cleaning up a number of races
> related to the handling of sq callbacks (aka tx interrupts).
> It is somewhat tested, but I couldn't reproduce the original issues
> reported, so I'm sending it out to ask for help with testing.
>
> Wei, does this address the spurious interrupt issue you are
> observing? Could you confirm please?

Thanks for working on this, Michael. Wei is on leave. I'll try to reproduce.

My main concern is whether the cost of the fix may be greater than the
race, i.e. whether the additional locking significantly impacts
efficiency/throughput/latency. We lack that performance data right
now. The race had not been reported for years, and caused no real
concerns in the initial report we did get, either. That said, it may
be more problematic in specific scenarios, such as the packed rings
you pointed out.

One (additional) short term mitigation could be to further restrict
tx_napi default-on to exclude such scenarios.

Let me take a closer look at the individual patches.


>
> Thanks!
>
> changes from v2:
>         Fixed a race condition in start_xmit: enable_cb_delayed was
>         done as an optimization (to push out the event index for
>         the split ring) so we did not have to care about it
>         returning false (recheck). Now that we actually disable the cb
>         we have to test the return value and do the actual recheck.
>
>
> Michael S. Tsirkin (4):
>   virtio_net: move tx vq operation under tx queue lock
>   virtio_net: move txq wakeups under tx q lock
>   virtio: fix up virtio_disable_cb
>   virtio_net: disable cb aggressively
>
>  drivers/net/virtio_net.c     | 49 ++++++++++++++++++++++++++++--------
>  drivers/virtio/virtio_ring.c | 26 ++++++++++++++++++-
>  2 files changed, 64 insertions(+), 11 deletions(-)
>
> --
> MST
>


* Re: [PATCH v3 4/4] virtio_net: disable cb aggressively
From: Jakub Kicinski @ 2021-05-26 19:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization, Jason Wang

On Wed, 26 May 2021 04:24:43 -0400 Michael S. Tsirkin wrote:
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>  	bool kick = !netdev_xmit_more();
>  	bool use_napi = sq->napi.weight;
> +	unsigned int bytes = skb->len;

FWIW GCC says bytes is unused.


* Re: [PATCH v3 4/4] virtio_net: disable cb aggressively
From: Willem de Bruijn @ 2021-05-26 21:22 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Michael S. Tsirkin, linux-kernel, Jakub Kicinski, Wei Wang,
	David Miller, Network Development, virtualization, Jason Wang

On Wed, May 26, 2021 at 11:15 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 5/26/21 10:24 AM, Michael S. Tsirkin wrote:
> > There are currently two cases where we poll the TX vq not in response to a
> > callback: start_xmit and rx napi.  We currently do this with callbacks
> > enabled, which can cause extra interrupts from the card.  This used not to
> > be a big issue as we ran with interrupts disabled, but that is no longer
> > the case, and in some cases the rate of spurious interrupts is so high
> > that Linux detects this and actually kills the interrupt.

Temporarily disabling interrupts during free_old_xmit_skbs in
virtnet_poll_cleantx might reduce the spurious interrupt rate by
avoiding an additional Tx interrupt from being scheduled during
virtnet_poll_cleantx.

It probably does not address all spurious interrupts, as
virtnet_poll_cleantx might also run in between the scheduling of the
Tx interrupt and the call to virtnet_poll_tx, right? The Tx and Rx
interrupts racing.

If I can reproduce the report, I can also test how much this helps in practice.

> > Fix up by disabling the callbacks before polling the tx vq.
>
>
> It is not clear why we want to poll TX completions from ndo_start_xmit() in napi mode?

Yes, we can simply exclude that. The original napi-tx patch did not
make that change, but not for any strong reason.


* Re: [PATCH v3 1/4] virtio_net: move tx vq operation under tx queue lock
From: Jason Wang @ 2021-05-27  3:41 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization


On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> It's unsafe to operate a vq from multiple threads.
> Unfortunately this is exactly what we do when invoking
> clean tx poll from rx napi.
> The same happens with napi-tx even without the
> opportunistic cleaning from the receive interrupt: that races
> with processing the vq in start_xmit.
>
> As a fix, move everything that deals with the vq under the tx lock.
>
> Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>   drivers/net/virtio_net.c | 22 +++++++++++++++++++++-
>   1 file changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ac0c143f97b4..12512d1002ec 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1508,6 +1508,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	struct virtnet_info *vi = sq->vq->vdev->priv;
>   	unsigned int index = vq2txq(sq->vq);
>   	struct netdev_queue *txq;
> +	int opaque;
> +	bool done;
>   
>   	if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
>   		/* We don't need to enable cb for XDP */
> @@ -1517,10 +1519,28 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   
>   	txq = netdev_get_tx_queue(vi->dev, index);
>   	__netif_tx_lock(txq, raw_smp_processor_id());
> +	virtqueue_disable_cb(sq->vq);
>   	free_old_xmit_skbs(sq, true);
> +
> +	opaque = virtqueue_enable_cb_prepare(sq->vq);
> +
> +	done = napi_complete_done(napi, 0);
> +
> +	if (!done)
> +		virtqueue_disable_cb(sq->vq);
> +
>   	__netif_tx_unlock(txq);
>   
> -	virtqueue_napi_complete(napi, sq->vq, 0);
> +	if (done) {
> +		if (unlikely(virtqueue_poll(sq->vq, opaque))) {
> +			if (napi_schedule_prep(napi)) {
> +				__netif_tx_lock(txq, raw_smp_processor_id());
> +				virtqueue_disable_cb(sq->vq);
> +				__netif_tx_unlock(txq);
> +				__napi_schedule(napi);
> +			}
> +		}
> +	}


Interesting, this looks somehow like an open-coded version of
virtqueue_napi_complete(). I wonder if we can simply keep using
virtqueue_napi_complete() by moving the __netif_tx_unlock() after
it:

__netif_tx_lock(txq, raw_smp_processor_id());
free_old_xmit_skbs(sq, true);
virtqueue_napi_complete(napi, sq->vq, 0);
__netif_tx_unlock(txq);

Thanks


>   
>   	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   		netif_tx_wake_queue(txq);



* Re: [PATCH v3 2/4] virtio_net: move txq wakeups under tx q lock
From: Jason Wang @ 2021-05-27  3:48 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization


On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> We currently check num_free outside the tx queue lock,
> which is unsafe: new packets can arrive meanwhile
> and there won't be space in the queue.
> The result is a spurious queue wakeup, causing overhead
> and even packet drops.
>
> Move the check under the lock to fix that.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/net/virtio_net.c | 13 +++++++------
>   1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 12512d1002ec..c29f42d1e04f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1434,11 +1434,12 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   
>   	if (__netif_tx_trylock(txq)) {
>   		free_old_xmit_skbs(sq, true);
> +
> +		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> +			netif_tx_wake_queue(txq);
> +
>   		__netif_tx_unlock(txq);
>   	}
> -
> -	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> -		netif_tx_wake_queue(txq);
>   }
>   
>   static int virtnet_poll(struct napi_struct *napi, int budget)
> @@ -1522,6 +1523,9 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   	virtqueue_disable_cb(sq->vq);
>   	free_old_xmit_skbs(sq, true);
>   
> +	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> +		netif_tx_wake_queue(txq);
> +
>   	opaque = virtqueue_enable_cb_prepare(sq->vq);
>   
>   	done = napi_complete_done(napi, 0);
> @@ -1542,9 +1546,6 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>   		}
>   	}
>   
> -	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> -		netif_tx_wake_queue(txq);
> -
>   	return 0;
>   }
>   



* Re: [PATCH v3 3/4] virtio: fix up virtio_disable_cb
From: Jason Wang @ 2021-05-27  4:01 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization


在 2021/5/26 下午4:24, Michael S. Tsirkin 写道:
> virtio_disable_cb is currently a nop for split ring with event index.
> This is because it used to always be called from a callback, when we
> know the device won't trigger more events until we update the index.
> However, now that we often run with interrupts enabled, we also poll
> without a callback, so that is different: disabling callbacks will
> help reduce the number of spurious interrupts.
> Further, if using event index with a packed ring, and if being called
> from a callback, we actually do disable interrupts which is unnecessary.
>
> Fix both issues by tracking whether we have received a callback. If
> that is the case, disabling interrupts with event index can be a nop.


This seems unnecessary:

1) we check avail_flags_shadow before touching the index
2) The nop is not good, at least for split: if we choose a suitable event
index, it can help to reduce the 1/N chance of an interrupt (see below).


> If not the case disable interrupts. Note: with a split ring
> there's no explicit "no interrupts" value. For now we write
> a fixed value so our chance of triggering an interrupt
> is 1/ring size.


1/65535 actually? If yes, do we still need this trick?
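
(Back-of-the-envelope for that figure: the split-ring used index is a
free-running 16-bit counter, so a fixed used_event of 0x0 matches once
per 2^16 = 65536 used entries regardless of ring size -- assuming the
standard vring_need_event() comparison -- hence 1/65535 rather than
1/ring-size.)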


>   It's probably better to write something
> related to the last used index there to reduce the chance
> even further. For now I'm keeping it simple.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>   drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
>   1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..88f0b16b11b8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -113,6 +113,9 @@ struct vring_virtqueue {
>   	/* Last used index we've seen. */
>   	u16 last_used_idx;
>   
> +	/* Hint for event idx: already triggered no need to disable. */
> +	bool event_triggered;
> +
>   	union {
>   		/* Available for split ring */
>   		struct {
> @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
>   
>   	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
>   		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> +		if (vq->event)
> +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> +			vring_used_event(&vq->split.vring) = 0x0;


used_idx or last_used_idx seems better here.
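
One way to read that suggestion, as a purely illustrative sketch (not
part of the posted patch): derive the event index from last_used_idx so
it sits just behind the device's progress, e.g.

	if (vq->event)
		/* Illustrative: an event index just behind what we have
		 * already seen is maximally far from the device's used
		 * index, so it would take a full 16-bit wrap to match.
		 */
		vring_used_event(&vq->split.vring) =
			cpu_to_virtio16(_vq->vdev, vq->last_used_idx - 1);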


> +		else
>   			vq->split.vring.avail->flags =
>   				cpu_to_virtio16(_vq->vdev,
>   						vq->split.avail_flags_shadow);
> @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
>   	vq->last_used_idx = 0;
> +	vq->event_triggered = false;
>   	vq->num_added = 0;
>   	vq->packed_ring = true;
>   	vq->use_dma_api = vring_use_dma_api(vdev);
> @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> +	/* If device triggered an event already it won't trigger one again:
> +	 * no need to disable.
> +	 */
> +	if (vq->event_triggered)
> +		return;
> +
>   	if (vq->packed_ring)
>   		virtqueue_disable_cb_packed(_vq);
>   	else
> @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> +	if (vq->event_triggered)
> +		vq->event_triggered = false;
> +
>   	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
>   				 virtqueue_enable_cb_prepare_split(_vq);
>   }
> @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> +	if (vq->event_triggered)
> +		vq->event_triggered = false;
> +
>   	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
>   				 virtqueue_enable_cb_delayed_split(_vq);
>   }


Miss the case of virtqueue_enable_cb()?

Thanks
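
(For context, the wrapper in question funnels through the same helper
patched above -- roughly, in the upstream file of that era:

bool virtqueue_enable_cb(struct virtqueue *_vq)
{
	unsigned last_used_idx = virtqueue_enable_cb_prepare(_vq);

	return !virtqueue_poll(_vq, last_used_idx);
}

so the reset added to virtqueue_enable_cb_prepare() would cover this
path as well.)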


> @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   	if (unlikely(vq->broken))
>   		return IRQ_HANDLED;
>   
> +	/* Just a hint for performance: so it's ok that this can be racy! */
> +	if (vq->event)
> +		vq->event_triggered = true;
> +
>   	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
>   	if (vq->vq.callback)
>   		vq->vq.callback(&vq->vq);
> @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
>   	vq->last_used_idx = 0;
> +	vq->event_triggered = false;
>   	vq->num_added = 0;
>   	vq->use_dma_api = vring_use_dma_api(vdev);
>   #ifdef DEBUG



* Re: [PATCH v3 4/4] virtio_net: disable cb aggressively
  2021-05-26  8:24   ` Michael S. Tsirkin
@ 2021-05-27  4:09     ` Jason Wang
  -1 siblings, 0 replies; 49+ messages in thread
From: Jason Wang @ 2021-05-27  4:09 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel
  Cc: Jakub Kicinski, Wei Wang, David Miller, netdev, Willem de Bruijn,
	virtualization


On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi.  We currently do this with callbacks
> enabled which can cause extra interrupts from the card.  This used not
> to be a big issue as we ran with interrupts disabled, but that is no
> longer the case, and in some cases the rate of spurious interrupts is
> so high that Linux detects this and actually kills the interrupt.
>
> Fix up by disabling the callbacks before polling the tx vq.
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>   drivers/net/virtio_net.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   		return;
>   
>   	if (__netif_tx_trylock(txq)) {
> -		free_old_xmit_skbs(sq, true);
> +		do {
> +			virtqueue_disable_cb(sq->vq);
> +			free_old_xmit_skbs(sq, true);
> +		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   			netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>   	bool kick = !netdev_xmit_more();
>   	bool use_napi = sq->napi.weight;
> +	unsigned int bytes = skb->len;
>   
>   	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq, false);
> +	do {
> +		if (use_napi)
> +			virtqueue_disable_cb(sq->vq);
>   
> -	if (use_napi && kick)
> -		virtqueue_enable_cb_delayed(sq->vq);
> +		free_old_xmit_skbs(sq, false);
> +
> +	} while (use_napi && kick &&
> +	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   	/* timestamp packet in software */
>   	skb_tx_timestamp(skb);


I wonder whether we can simply disable cb during ndo_start_xmit(), or is
there a way to make xmit and napi work in parallel?

Thanks
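
A minimal sketch of the first alternative raised here (hypothetical,
not posted code): with napi-tx on, keep callbacks disabled across the
whole of ndo_start_xmit() and make tx napi the only place that
re-enables them:

	/* Hypothetical: no virtqueue_enable_cb_delayed() on the xmit
	 * path at all; virtnet_poll_tx() re-arms callbacks when it
	 * completes napi.
	 */
	if (use_napi)
		virtqueue_disable_cb(sq->vq);

	free_old_xmit_skbs(sq, false);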




* Re: [PATCH v3 1/4] virtio_net: move tx vq operation under tx queue lock
  2021-05-27  3:41     ` Jason Wang
@ 2021-05-28 22:25       ` Willem de Bruijn
  -1 siblings, 0 replies; 49+ messages in thread
From: Willem de Bruijn @ 2021-05-28 22:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, linux-kernel, Jakub Kicinski, Wei Wang,
	David Miller, Network Development, virtualization

On Wed, May 26, 2021 at 11:41 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> > It's unsafe to operate a vq from multiple threads.
> > Unfortunately this is exactly what we do when invoking
> > clean tx poll from rx napi.
> > Same happens with napi-tx even without the
> > opportunistic cleaning from the receive interrupt: that races
> > with processing the vq in start_xmit.
> >
> > As a fix move everything that deals with the vq to under tx lock.

This patch also disables callbacks during free_old_xmit_skbs
processing on tx interrupt. Should that be a separate commit, with its
own explanation?
> >
> > Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >   drivers/net/virtio_net.c | 22 +++++++++++++++++++++-
> >   1 file changed, 21 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index ac0c143f97b4..12512d1002ec 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1508,6 +1508,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >       struct virtnet_info *vi = sq->vq->vdev->priv;
> >       unsigned int index = vq2txq(sq->vq);
> >       struct netdev_queue *txq;
> > +     int opaque;
> > +     bool done;
> >
> >       if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
> >               /* We don't need to enable cb for XDP */
> > @@ -1517,10 +1519,28 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >
> >       txq = netdev_get_tx_queue(vi->dev, index);
> >       __netif_tx_lock(txq, raw_smp_processor_id());
> > +     virtqueue_disable_cb(sq->vq);
> >       free_old_xmit_skbs(sq, true);
> > +
> > +     opaque = virtqueue_enable_cb_prepare(sq->vq);
> > +
> > +     done = napi_complete_done(napi, 0);
> > +
> > +     if (!done)
> > +             virtqueue_disable_cb(sq->vq);
> > +
> >       __netif_tx_unlock(txq);
> >
> > -     virtqueue_napi_complete(napi, sq->vq, 0);
> > +     if (done) {
> > +             if (unlikely(virtqueue_poll(sq->vq, opaque))) {

Should this also be inside the lock, as it operates on vq?

Is there anything that is not allowed to run with the lock held?

> > +                     if (napi_schedule_prep(napi)) {
> > +                             __netif_tx_lock(txq, raw_smp_processor_id());
> > +                             virtqueue_disable_cb(sq->vq);
> > +                             __netif_tx_unlock(txq);
> > +                             __napi_schedule(napi);
> > +                     }
> > +             }
> > +     }
>
>
> Interesting, this looks like somehow an open-coded version of
> virtqueue_napi_complete(). I wonder if we can simply keep using
> virtqueue_napi_complete() by simply moving the __netif_tx_unlock() after
> that:
>
> netif_tx_lock(txq);
> free_old_xmit_skbs(sq, true);
> virtqueue_napi_complete(napi, sq->vq, 0);
> __netif_tx_unlock(txq);

Agreed. And the subsequent block

       if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
               netif_tx_wake_queue(txq);

as well.
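
Putting those two comments together, the tail of virtnet_poll_tx()
would look roughly like this (a sketch of the suggestion, not the code
as posted):

	txq = netdev_get_tx_queue(vi->dev, index);
	__netif_tx_lock(txq, raw_smp_processor_id());
	virtqueue_disable_cb(sq->vq);
	free_old_xmit_skbs(sq, true);

	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
		netif_tx_wake_queue(txq);

	virtqueue_napi_complete(napi, sq->vq, 0);
	__netif_tx_unlock(txq);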

>
> Thanks
>
>
> >
> >       if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> >               netif_tx_wake_queue(txq);
>


* Re: [PATCH v3 0/4] virtio net: spurious interrupt related fixes
  2021-05-26 15:34   ` Willem de Bruijn
@ 2021-06-01  2:53     ` Willem de Bruijn
  -1 siblings, 0 replies; 49+ messages in thread
From: Willem de Bruijn @ 2021-06-01  2:53 UTC (permalink / raw)
  Cc: Michael S. Tsirkin, linux-kernel, Jakub Kicinski, Wei Wang,
	David Miller, Network Development, virtualization

On Wed, May 26, 2021 at 11:34 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Wed, May 26, 2021 at 4:24 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > [...]
> >
> > This patchset attempts to fix this by cleaning up a bunch of races
> > related to the handling of sq callbacks (aka tx interrupts).
> > Somewhat tested but I couldn't reproduce the original issues
> > reported, sending out for help with testing.
> >
> > Wei, does this address the spurious interrupt issue you are
> > observing? Could you confirm please?
>
> Thanks for working on this, Michael. Wei is on leave. I'll try to reproduce.

The original report was generated with five GCE virtual machines
sharing a sole-tenant node, together sending up to 160 netperf
tcp_stream connections to 16 other instances. Running Ubuntu 20.04-LTS
with kernel 5.4.0-1034-gcp.

But the issue can also be reproduced with just two n2-standard-16
instances, running neper tcp_stream with high parallelism (-T 16 -F
240).

It's a bit faster to trigger by reducing the interrupt count threshold
from 99.9K/100K to 9.9K/10K. And I added additional logging to report
the unhandled rate even if lower.

Unhandled interrupt rate scales with the number of queue pairs
(`ethtool -L $DEV combined $NUM`). It is essentially absent at 8
queues and around 90% at 14 queues. By default these GCE instances
have one rx and tx interrupt per core, so 16 each. With the rx and tx
interrupts for a given virtio-queue pinned to the same core.

Unfortunately, commit 3/4 did not have a significant impact on these
numbers. Have to think a bit more about possible mitigations. At least
I'll be able to test this more easily now.


* Re: [PATCH v3 0/4] virtio net: spurious interrupt related fixes
  2021-06-01  2:53     ` Willem de Bruijn
@ 2021-06-09 21:36       ` Willem de Bruijn
  -1 siblings, 0 replies; 49+ messages in thread
From: Willem de Bruijn @ 2021-06-09 21:36 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michael S. Tsirkin, linux-kernel, Jakub Kicinski, Wei Wang,
	David Miller, Network Development, virtualization

On Mon, May 31, 2021 at 10:53 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> [...]
>
> Unhandled interrupt rate scales with the number of queue pairs
> (`ethtool -L $DEV combined $NUM`). It is essentially absent at 8
> queues, at around 90% at 14 queues. By default these GCE instances
> have one rx and tx interrupt per core, so 16 each. With the rx and tx
> interrupts for a given virtio-queue pinned to the same core.
>
> Unfortunately, commit 3/4 did not have a significant impact on these
> numbers. Have to think a bit more about possible mitigations. At least
> I'll be able to test this more easily now.

Continuing to experiment with approaches to avoid this interrupt disable.

I think it's good to remember that the real bug is the disabling of
interrupts, which may cause stalls in the absence of receive events.

The spurious tx interrupts themselves are no worse than processing
the tx and rx interrupts strictly separately, without the optimization.
The clean-from-rx optimization just reduces latency. The spurious
interrupts indicate a cycle optimization opportunity for sure. I
support Jason's suggestion for a single combined interrupt for both tx
and rx. That is not feasible as a bugfix for stable, so we need something
to mitigate the impact in the short term.

For that, I suggest an approach that maintains most of the benefit
of the opportunistic cleaning while keeping the spurious rate below the
threshold. A few variants:

1. In virtnet_poll_cleantx, a uniformly random draw on whether or not
to attempt to clean (see the sketch after this list). It is not trivial
to get a good random source that is essentially free. One example
perhaps is sq->vq->num_free & 0x7, but it is not clear how randomized
those bits are. Pro: this can be implemented strictly in virtio_net.
Con: a probabilistic method will reduce the incidence rate, but it may
still occur at the tail.

2. If also changing virtio_ring, in vring_interrupt count spurious
interrupts and report this count through a new interface. Modify
virtio_net to query and skip the optimization if above a threshold.

2a. slight variant: in virtio_net count consecutive successful
opportunistic cleaning operations. If 100% hit rate, then probably
the tx interrupts are all spurious. Temporarily back off. (virtio_net
is not called for interrupts if there is no work on the ring, so cannot
count these events independently itself).

3. Modify virtio_ring to explicitly allow opportunistic cleaning and
spurious interrupts on a per vring basis. Add a boolean to struct
vring_virtqueue. And return IRQ_HANDLED instead of IRQ_NONE for these
(only).
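
A sketch of variant 1, purely illustrative (whether num_free's low bits
are random enough is exactly the open question above):

	/* 1-in-8 draw: skip the opportunistic clean on a pseudo-random
	 * subset of rx polls, so the tx interrupt periodically finds
	 * real work and is not counted as spurious.
	 */
	if ((sq->vq->num_free & 0x7) == 0)
		return;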

The first two patches in Michael's series, which ensure that all relevant
operations are executed with the tx lock held, perhaps shouldn't wait
on additional interrupt suppression / mitigation work.


* Re: [PATCH v3 1/4] virtio_net: move tx vq operation under tx queue lock
  2021-05-28 22:25       ` Willem de Bruijn
@ 2021-06-09 22:03         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 49+ messages in thread
From: Michael S. Tsirkin @ 2021-06-09 22:03 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Jason Wang, linux-kernel, Jakub Kicinski, Wei Wang, David Miller,
	Network Development, virtualization

On Fri, May 28, 2021 at 06:25:11PM -0400, Willem de Bruijn wrote:
> On Wed, May 26, 2021 at 11:41 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2021/5/26 4:24 PM, Michael S. Tsirkin wrote:
> > > It's unsafe to operate a vq from multiple threads.
> > > Unfortunately this is exactly what we do when invoking
> > > clean tx poll from rx napi.
> > > Same happens with napi-tx even without the
> > > opportunistic cleaning from the receive interrupt: that races
> > > with processing the vq in start_xmit.
> > >
> > > As a fix move everything that deals with the vq to under tx lock.
> 
> This patch also disables callbacks during free_old_xmit_skbs
> processing on tx interrupt. Should that be a separate commit, with its
> own explanation?
> > >
> > > Fixes: b92f1e6751a6 ("virtio-net: transmit napi")
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >   drivers/net/virtio_net.c | 22 +++++++++++++++++++++-
> > >   1 file changed, 21 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index ac0c143f97b4..12512d1002ec 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -1508,6 +1508,8 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> > >       struct virtnet_info *vi = sq->vq->vdev->priv;
> > >       unsigned int index = vq2txq(sq->vq);
> > >       struct netdev_queue *txq;
> > > +     int opaque;
> > > +     bool done;
> > >
> > >       if (unlikely(is_xdp_raw_buffer_queue(vi, index))) {
> > >               /* We don't need to enable cb for XDP */
> > > @@ -1517,10 +1519,28 @@ static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> > >
> > >       txq = netdev_get_tx_queue(vi->dev, index);
> > >       __netif_tx_lock(txq, raw_smp_processor_id());
> > > +     virtqueue_disable_cb(sq->vq);
> > >       free_old_xmit_skbs(sq, true);
> > > +
> > > +     opaque = virtqueue_enable_cb_prepare(sq->vq);
> > > +
> > > +     done = napi_complete_done(napi, 0);
> > > +
> > > +     if (!done)
> > > +             virtqueue_disable_cb(sq->vq);
> > > +
> > >       __netif_tx_unlock(txq);
> > >
> > > -     virtqueue_napi_complete(napi, sq->vq, 0);
> > > +     if (done) {
> > > +             if (unlikely(virtqueue_poll(sq->vq, opaque))) {
> 
> Should this also be inside the lock, as it operates on vq?

No, vq poll is ok outside of locks; it's atomic.
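
(For reference, virtqueue_poll() only reads ring state behind a memory
barrier -- roughly, in the upstream file of that era:

bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	virtio_mb(vq->weak_barriers);
	return vq->packed_ring ? virtqueue_poll_packed(_vq, last_used_idx) :
				 virtqueue_poll_split(_vq, last_used_idx);
}

which is why no tx lock is needed around it.)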

> Is there anything that is not allowed to run with the lock held?
> > > +                     if (napi_schedule_prep(napi)) {
> > > +                             __netif_tx_lock(txq, raw_smp_processor_id());
> > > +                             virtqueue_disable_cb(sq->vq);
> > > +                             __netif_tx_unlock(txq);
> > > +                             __napi_schedule(napi);
> > > +                     }
> > > +             }
> > > +     }
> >
> >
> > Interesting, this looks like somehow an open-coded version of
> > virtqueue_napi_complete(). I wonder if we can simply keep using
> > virtqueue_napi_complete() by simply moving the __netif_tx_unlock() after
> > that:
> >
> > netif_tx_lock(txq);
> > free_old_xmit_skbs(sq, true);
> > virtqueue_napi_complete(napi, sq->vq, 0);
> > __netif_tx_unlock(txq);
> 
> Agreed. And subsequent block
> 
>        if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>                netif_tx_wake_queue(txq);
> 
> as well

Yes, I thought I saw something here that couldn't be called with the tx
lock held, but I no longer see it. Will do.

> >
> > Thanks
> >
> >
> > >
> > >       if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> > >               netif_tx_wake_queue(txq);
> >



* Re: [PATCH v3 0/4] virtio net: spurious interrupt related fixes
  2021-06-09 21:36       ` Willem de Bruijn
@ 2021-06-09 22:59         ` Willem de Bruijn
  -1 siblings, 0 replies; 49+ messages in thread
From: Willem de Bruijn @ 2021-06-09 22:59 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Michael S. Tsirkin, linux-kernel, Jakub Kicinski, Wei Wang,
	David Miller, Network Development, virtualization

On Wed, Jun 9, 2021 at 5:36 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> [...]
>
> The first two patches in Michael's series, which ensure that all relevant
> operations are executed with the tx lock held, perhaps shouldn't wait
> on additional interrupt suppression / mitigation work.

I forgot to mention: virtio_net cannot configure interrupt moderation
through ethtool. But to reduce the interrupt rate, it may also be
interesting to try /sys/class/net/$DEV/gro_flush_timeout: mask the
device interrupts and instead wait on a kernel timer for some usec, to
increase batching.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/4] virtio_net: disable cb aggressively
  2021-05-26  8:24   ` Michael S. Tsirkin
@ 2023-01-16 13:41     ` Laurent Vivier
  -1 siblings, 0 replies; 49+ messages in thread
From: Laurent Vivier @ 2023-01-16 13:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, netdev, linux-kernel, virtualization,
	Stefano Brivio, Jakub Kicinski, Wei Wang, David Miller

Hi Michael,

On 5/26/21 10:24, Michael S. Tsirkin wrote:
> There are currently two cases where we poll TX vq not in response to a
> callback: start xmit and rx napi.  We currently do this with callbacks
> enabled which can cause extra interrupts from the card.  Used not to be
> a big issue as we run with interrupts disabled but that is no longer the
> case, and in some cases the rate of spurious interrupts is so high
> linux detects this and actually kills the interrupt.
> 
> Fix up by disabling the callbacks before polling the tx vq.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>   drivers/net/virtio_net.c | 16 ++++++++++++----
>   1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c29f42d1e04f..a83dc038d8af 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
>   		return;
>   
>   	if (__netif_tx_trylock(txq)) {
> -		free_old_xmit_skbs(sq, true);
> +		do {
> +			virtqueue_disable_cb(sq->vq);
> +			free_old_xmit_skbs(sq, true);
> +		} while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>   			netif_tx_wake_queue(txq);
> @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>   	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
>   	bool kick = !netdev_xmit_more();
>   	bool use_napi = sq->napi.weight;
> +	unsigned int bytes = skb->len;
>   
>   	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq, false);
> +	do {
> +		if (use_napi)
> +			virtqueue_disable_cb(sq->vq);
>   
> -	if (use_napi && kick)
> -		virtqueue_enable_cb_delayed(sq->vq);
> +		free_old_xmit_skbs(sq, false);
> +
> +	} while (use_napi && kick &&
> +	       unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
>   
>   	/* timestamp packet in software */
>   	skb_tx_timestamp(skb);

This patch seems to introduce a problem with QEMU connected to passt using the
netdev stream backend.

When I run an iperf3 test, I get the following after 1 or 2 seconds:

[  254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
...
[  254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, 
name: output.0, 8856000 usecs ago
[  259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1, 
name: output.0, 13951000 usecs ago

In QEMU, I can see that virtio_net_tx_bh() calls virtio_net_flush_tx(), which
flushes all the queue entries, then re-enables the queue notification with
virtio_queue_set_notification() and tries to flush the queue again. As the
queue is empty, it does nothing more and then relies on a kernel notification
to re-enable the bottom half. Since that notification never comes, the queue
is stuck: the kernel adds entries but QEMU doesn't remove them:

2812 static void virtio_net_tx_bh(void *opaque)
2813 {
...
2833     ret = virtio_net_flush_tx(q);

-> flushes the queue; ret is neither an error nor n->tx_burst (which would
re-schedule the function)

...
2850     virtio_queue_set_notification(q->tx_vq, 1);

-> re-enables the queue notification

2851     ret = virtio_net_flush_tx(q);
2852     if (ret == -EINVAL) {
2853         return;
2854     } else if (ret > 0) {
2855         virtio_queue_set_notification(q->tx_vq, 0);
2856         qemu_bh_schedule(q->tx_bh);
2857         q->tx_waiting = 1;
2858     }

-> ret is 0, so the function exits without re-scheduling itself.
...
2859 }

If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb 
aggressively")), it works fine.

How to reproduce it:

I start passt (https://passt.top/passt):

passt -f

and then QEMU

qemu-system-x86_64 ... --netdev 
stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device 
virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0

Host side:

sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
iperf3 -s

Guest side:

sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912

ip link set dev $DEV mtu 256
iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M

Any idea what the problem is?

Thanks,
Laurent



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 4/4] virtio_net: disable cb aggressively
  2023-01-16 13:41     ` Laurent Vivier
@ 2023-01-17  3:48       ` Jason Wang
  -1 siblings, 0 replies; 49+ messages in thread
From: Jason Wang @ 2023-01-17  3:48 UTC (permalink / raw)
  To: Laurent Vivier
  Cc: Willem de Bruijn, Michael S. Tsirkin, netdev, linux-kernel,
	virtualization, Stefano Brivio, Jakub Kicinski, Wei Wang,
	David Miller

On Mon, Jan 16, 2023 at 9:41 PM Laurent Vivier <lvivier@redhat.com> wrote:
>
> Hi Michael,
>
> On 5/26/21 10:24, Michael S. Tsirkin wrote:
> > There are currently two cases where we poll TX vq not in response to a
> > callback: start xmit and rx napi.  We currently do this with callbacks
> > enabled which can cause extra interrupts from the card.  Used not to be
> > a big issue as we run with interrupts disabled but that is no longer the
> > case, and in some cases the rate of spurious interrupts is so high
> > that Linux detects this and actually kills the interrupt.
> >
> > Fix up by disabling the callbacks before polling the tx vq.
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >   drivers/net/virtio_net.c | 16 ++++++++++++----
> >   1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index c29f42d1e04f..a83dc038d8af 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -1433,7 +1433,10 @@ static void virtnet_poll_cleantx(struct receive_queue *rq)
> >               return;
> >
> >       if (__netif_tx_trylock(txq)) {
> > -             free_old_xmit_skbs(sq, true);
> > +             do {
> > +                     virtqueue_disable_cb(sq->vq);
> > +                     free_old_xmit_skbs(sq, true);
> > +             } while (unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> >               if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> >                       netif_tx_wake_queue(txq);
> > @@ -1605,12 +1608,17 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> >       struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
> >       bool kick = !netdev_xmit_more();
> >       bool use_napi = sq->napi.weight;
> > +     unsigned int bytes = skb->len;
> >
> >       /* Free up any pending old buffers before queueing new ones. */
> > -     free_old_xmit_skbs(sq, false);
> > +     do {
> > +             if (use_napi)
> > +                     virtqueue_disable_cb(sq->vq);
> >
> > -     if (use_napi && kick)
> > -             virtqueue_enable_cb_delayed(sq->vq);
> > +             free_old_xmit_skbs(sq, false);
> > +
> > +     } while (use_napi && kick &&
> > +            unlikely(!virtqueue_enable_cb_delayed(sq->vq)));
> >
> >       /* timestamp packet in software */
> >       skb_tx_timestamp(skb);
>
> This patch seems to introduce a problem with QEMU connected to passt using the
> netdev stream backend.
>
> When I run an iperf3 test, I get the following after 1 or 2 seconds:
>
> [  254.035559] NETDEV WATCHDOG: ens3 (virtio_net): transmit queue 0 timed out
> ...
> [  254.060962] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1,
> name: output.0, 8856000 usecs ago
> [  259.155150] virtio_net virtio1 ens3: TX timeout on queue: 0, sq: output.0, vq: 0x1,
> name: output.0, 13951000 usecs ago
>
> In QEMU, I can see that virtio_net_tx_bh() calls virtio_net_flush_tx(), which
> flushes all the queue entries, then re-enables the queue notification with
> virtio_queue_set_notification() and tries to flush the queue again. As the
> queue is empty, it does nothing more and then relies on a kernel notification
> to re-enable the bottom half. Since that notification never comes, the queue
> is stuck: the kernel adds entries but QEMU doesn't remove them:
>
> 2812 static void virtio_net_tx_bh(void *opaque)
> 2813 {
> ...
> 2833     ret = virtio_net_flush_tx(q);
>
> -> flushes the queue; ret is neither an error nor n->tx_burst (which would
> re-schedule the function)
>
> ...
> 2850     virtio_queue_set_notification(q->tx_vq, 1);
>
> -> re-enables the queue notification
>
> 2851     ret = virtio_net_flush_tx(q);
> 2852     if (ret == -EINVAL) {
> 2853         return;
> 2854     } else if (ret > 0) {
> 2855         virtio_queue_set_notification(q->tx_vq, 0);
> 2856         qemu_bh_schedule(q->tx_bh);
> 2857         q->tx_waiting = 1;
> 2858     }
>
> -> ret is 0, so the function exits without re-scheduling itself.
> ...
> 2859 }
>
> If I revert this patch in the kernel (a7766ef18b33 ("virtio_net: disable cb
> aggressively")), it works fine.
>
> How to reproduce it:
>
> I start passt (https://passt.top/passt):
>
> passt -f
>
> and then QEMU
>
> qemu-system-x86_64 ... --netdev
> stream,id=netdev0,server=off,addr.type=unix,addr.path=/tmp/passt_1.socket -device
> virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0
>
> Host side:
>
> sysctl -w net.core.rmem_max=134217728
> sysctl -w net.core.wmem_max=134217728
> iperf3 -s
>
> Guest side:
>
> sysctl -w net.core.rmem_max=536870912
> sysctl -w net.core.wmem_max=536870912
>
> ip link set dev $DEV mtu 256
> iperf3 -c $HOST -t10 -i0 -Z -P 8 -l 1M --pacing-timer 1000000 -w 4M
>
> Any idea what the problem is?

This looks similar to what I spotted and tried to fix in:

[PATCH net V3] virtio-net: correctly enable callback during start_xmit

(I've cc'ed you on this version.)

Thanks

>
> Thanks,
> Laurent
>
>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] virtio: fix up virtio_disable_cb
  2021-05-26  8:24   ` Michael S. Tsirkin
@ 2023-03-30  6:07     ` Xuan Zhuo
  -1 siblings, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2023-03-30  6:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, netdev, virtualization, Jakub Kicinski,
	Wei Wang, David Miller, linux-kernel

On Wed, 26 May 2021 04:24:40 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> virtio_disable_cb is currently a nop for split ring with event index.
> This is because it used to be always called from a callback when we know
> device won't trigger more events until we update the index.  However,
> now that we run with interrupts enabled a lot we also poll without a
> callback so that is different: disabling callbacks will help reduce the
> number of spurious interrupts.
> Further, if using event index with a packed ring, and if being called
> from a callback, we actually do disable interrupts which is unnecessary.
>
> Fix both issues by tracking whenever we get a callback. If that is
> the case disabling interrupts with event index can be a nop.
> If not the case disable interrupts. Note: with a split ring
> there's no explicit "no interrupts" value. For now we write
> a fixed value so our chance of triggering an interupt
> is 1/ring size. It's probably better to write something
> related to the last used index there to reduce the chance
> even further. For now I'm keeping it simple.


I don't understand: is this patch necessary? For this patch set, we could do
without this patch.

So does this patch optimize virtqueue_disable_cb() by reducing writes to
vring_used_event(&vq->split.vring)?

Or am I missing something?

Thanks.

>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ---
>  drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 71e16b53e9c1..88f0b16b11b8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -113,6 +113,9 @@ struct vring_virtqueue {
>  	/* Last used index we've seen. */
>  	u16 last_used_idx;
>
> +	/* Hint for event idx: already triggered no need to disable. */
> +	bool event_triggered;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {
> @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
>
>  	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
>  		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> +		if (vq->event)
> +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> +			vring_used_event(&vq->split.vring) = 0x0;
> +		else
>  			vq->split.vring.avail->flags =
>  				cpu_to_virtio16(_vq->vdev,
>  						vq->split.avail_flags_shadow);
> @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>  	vq->weak_barriers = weak_barriers;
>  	vq->broken = false;
>  	vq->last_used_idx = 0;
> +	vq->event_triggered = false;
>  	vq->num_added = 0;
>  	vq->packed_ring = true;
>  	vq->use_dma_api = vring_use_dma_api(vdev);
> @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>
> +	/* If device triggered an event already it won't trigger one again:
> +	 * no need to disable.
> +	 */
> +	if (vq->event_triggered)
> +		return;
> +
>  	if (vq->packed_ring)
>  		virtqueue_disable_cb_packed(_vq);
>  	else
> @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>
> +	if (vq->event_triggered)
> +		vq->event_triggered = false;
> +
>  	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
>  				 virtqueue_enable_cb_prepare_split(_vq);
>  }
> @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>
> +	if (vq->event_triggered)
> +		vq->event_triggered = false;
> +
>  	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
>  				 virtqueue_enable_cb_delayed_split(_vq);
>  }
> @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>  	if (unlikely(vq->broken))
>  		return IRQ_HANDLED;
>
> +	/* Just a hint for performance: so it's ok that this can be racy! */
> +	if (vq->event)
> +		vq->event_triggered = true;
> +
>  	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
>  	if (vq->vq.callback)
>  		vq->vq.callback(&vq->vq);
> @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->weak_barriers = weak_barriers;
>  	vq->broken = false;
>  	vq->last_used_idx = 0;
> +	vq->event_triggered = false;
>  	vq->num_added = 0;
>  	vq->use_dma_api = vring_use_dma_api(vdev);
>  #ifdef DEBUG
> --
> MST
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] virtio: fix up virtio_disable_cb
  2023-03-30  6:07     ` Xuan Zhuo
@ 2023-03-30  6:44       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 49+ messages in thread
From: Michael S. Tsirkin @ 2023-03-30  6:44 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Willem de Bruijn, netdev, linux-kernel, virtualization,
	Jakub Kicinski, Wei Wang, David Miller

On Thu, Mar 30, 2023 at 02:07:37PM +0800, Xuan Zhuo wrote:
> On Wed, 26 May 2021 04:24:40 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > virtio_disable_cb is currently a nop for split ring with event index.
> > This is because it used to be always called from a callback when we know
> > device won't trigger more events until we update the index.  However,
> > now that we run with interrupts enabled a lot we also poll without a
> > callback so that is different: disabling callbacks will help reduce the
> > number of spurious interrupts.
> > Further, if using event index with a packed ring, and if being called
> > from a callback, we actually do disable interrupts which is unnecessary.
> >
> > Fix both issues by tracking whenever we get a callback. If that is
> > the case disabling interrupts with event index can be a nop.
> > If not the case disable interrupts. Note: with a split ring
> > there's no explicit "no interrupts" value. For now we write
> > a fixed value so our chance of triggering an interrupt
> > is 1/ring size. It's probably better to write something
> > related to the last used index there to reduce the chance
> > even further. For now I'm keeping it simple.
> 
> 
> I don't understand: is this patch necessary? For this patch set, we could do
> without this patch.
>
> So does this patch optimize virtqueue_disable_cb() by reducing writes to
> vring_used_event(&vq->split.vring)?
>
> Or am I missing something?
>
> Thanks.

Before this patch virtqueue_disable_cb did nothing at all
for the common case of event index enabled, so
calling it from virtio net would not help matters.

But the patch is from 2021, isn't it a bit too late to argue?
If you have a cleanup or an optimization in mind, please
post a patch.
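
For reference, with event index the signalling decision on the device side
is vring_need_event() from include/uapi/linux/virtio_ring.h:

	/* old is the used index before the device added the current batch,
	 * new_idx the index after; nonzero means the driver asked (via
	 * used_event) to be signalled somewhere in that window.
	 */
	static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
	{
		return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
	}

So writing used_event = 0x0 in virtqueue_disable_cb() makes a match unlikely
rather than impossible, hence the "1/ring size" chance in the commit message.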

> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 71e16b53e9c1..88f0b16b11b8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -113,6 +113,9 @@ struct vring_virtqueue {
> >  	/* Last used index we've seen. */
> >  	u16 last_used_idx;
> >
> > +	/* Hint for event idx: already triggered no need to disable. */
> > +	bool event_triggered;
> > +
> >  	union {
> >  		/* Available for split ring */
> >  		struct {
> > @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> >
> >  	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> >  		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > -		if (!vq->event)
> > +		if (vq->event)
> > +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> > +			vring_used_event(&vq->split.vring) = 0x0;
> > +		else
> >  			vq->split.vring.avail->flags =
> >  				cpu_to_virtio16(_vq->vdev,
> >  						vq->split.avail_flags_shadow);
> > @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >  	vq->weak_barriers = weak_barriers;
> >  	vq->broken = false;
> >  	vq->last_used_idx = 0;
> > +	vq->event_triggered = false;
> >  	vq->num_added = 0;
> >  	vq->packed_ring = true;
> >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > +	/* If device triggered an event already it won't trigger one again:
> > +	 * no need to disable.
> > +	 */
> > +	if (vq->event_triggered)
> > +		return;
> > +
> >  	if (vq->packed_ring)
> >  		virtqueue_disable_cb_packed(_vq);
> >  	else
> > @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > +	if (vq->event_triggered)
> > +		vq->event_triggered = false;
> > +
> >  	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
> >  				 virtqueue_enable_cb_prepare_split(_vq);
> >  }
> > @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >
> > +	if (vq->event_triggered)
> > +		vq->event_triggered = false;
> > +
> >  	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
> >  				 virtqueue_enable_cb_delayed_split(_vq);
> >  }
> > @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >  	if (unlikely(vq->broken))
> >  		return IRQ_HANDLED;
> >
> > +	/* Just a hint for performance: so it's ok that this can be racy! */
> > +	if (vq->event)
> > +		vq->event_triggered = true;
> > +
> >  	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
> >  	if (vq->vq.callback)
> >  		vq->vq.callback(&vq->vq);
> > @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >  	vq->weak_barriers = weak_barriers;
> >  	vq->broken = false;
> >  	vq->last_used_idx = 0;
> > +	vq->event_triggered = false;
> >  	vq->num_added = 0;
> >  	vq->use_dma_api = vring_use_dma_api(vdev);
> >  #ifdef DEBUG
> > --
> > MST
> >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] virtio: fix up virtio_disable_cb
@ 2023-03-30  6:54         ` Xuan Zhuo
  0 siblings, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2023-03-30  6:54 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, netdev, linux-kernel, virtualization,
	Jakub Kicinski, Wei Wang, David Miller

On Thu, 30 Mar 2023 02:44:44 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Mar 30, 2023 at 02:07:37PM +0800, Xuan Zhuo wrote:
> > On Wed, 26 May 2021 04:24:40 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > virtio_disable_cb is currently a nop for split ring with event index.
> > > This is because it used to be always called from a callback when we know
> > > device won't trigger more events until we update the index.  However,
> > > now that we run with interrupts enabled a lot we also poll without a
> > > callback so that is different: disabling callbacks will help reduce the
> > > number of spurious interrupts.
> > > Further, if using event index with a packed ring, and if being called
> > > from a callback, we actually do disable interrupts which is unnecessary.
> > >
> > > Fix both issues by tracking whenever we get a callback. If that is
> > > the case disabling interrupts with event index can be a nop.
> > > If not the case disable interrupts. Note: with a split ring
> > > there's no explicit "no interrupts" value. For now we write
> > > a fixed value so our chance of triggering an interrupt
> > > is 1/ring size. It's probably better to write something
> > > related to the last used index there to reduce the chance
> > > even further. For now I'm keeping it simple.
> >
> >
> > I don't understand: is this patch necessary? For this patch set, we could do
> > without this patch.
> >
> > So does this patch optimize virtqueue_disable_cb() by avoiding a write to
> > vring_used_event(&vq->split.vring)?
> >
> > Or am I missing something?
> >
> > Thanks.
>
> Before this patch virtqueue_disable_cb did nothing at all
> for the common case of event index enabled, so
> calling it from virtio net would not help matters.

I agree with this code:

-		if (!vq->event)
+		if (vq->event)
+			/* TODO: this is a hack. Figure out a cleaner value to write. */
+			vring_used_event(&vq->split.vring) = 0x0;
+		else


I just don't understand event_triggered.

>
> But the patch is from 2021, isn't it a bit too late to argue?
> If you have a cleanup or an optimization in mind, please
> post a patch.

Sorry, I just had some questions; I don't oppose it. At least it reduces
writes to vring_used_event(&vq->split.vring), which I think is also beneficial.

Thanks very much.
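
For anyone following along: whether the device signals at all is decided by
the event-index comparison, vring_need_event(), as defined in
include/uapi/linux/virtio_ring.h:

static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
{
	/* After moving the used index from old to new_idx, notify the
	 * other side only if the event_idx it asked to be woken at
	 * falls inside that window.
	 */
	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}

With used_event pinned at 0x0, the free-running 16-bit used index crosses
that point only rarely, which is why the "hack" value leaves a small chance
of an interrupt instead of disabling them outright.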


>
> > >
> > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
> > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 71e16b53e9c1..88f0b16b11b8 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -113,6 +113,9 @@ struct vring_virtqueue {
> > >  	/* Last used index we've seen. */
> > >  	u16 last_used_idx;
> > >
> > > +	/* Hint for event idx: already triggered no need to disable. */
> > > +	bool event_triggered;
> > > +
> > >  	union {
> > >  		/* Available for split ring */
> > >  		struct {
> > > @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > >
> > >  	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > >  		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > > -		if (!vq->event)
> > > +		if (vq->event)
> > > +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> > > +			vring_used_event(&vq->split.vring) = 0x0;
> > > +		else
> > >  			vq->split.vring.avail->flags =
> > >  				cpu_to_virtio16(_vq->vdev,
> > >  						vq->split.avail_flags_shadow);
> > > @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > >  	vq->weak_barriers = weak_barriers;
> > >  	vq->broken = false;
> > >  	vq->last_used_idx = 0;
> > > +	vq->event_triggered = false;
> > >  	vq->num_added = 0;
> > >  	vq->packed_ring = true;
> > >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > > @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> > >  {
> > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > +	/* If device triggered an event already it won't trigger one again:
> > > +	 * no need to disable.
> > > +	 */
> > > +	if (vq->event_triggered)
> > > +		return;
> > > +
> > >  	if (vq->packed_ring)
> > >  		virtqueue_disable_cb_packed(_vq);
> > >  	else
> > > @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> > >  {
> > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > +	if (vq->event_triggered)
> > > +		vq->event_triggered = false;
> > > +
> > >  	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
> > >  				 virtqueue_enable_cb_prepare_split(_vq);
> > >  }
> > > @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> > >  {
> > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > >
> > > +	if (vq->event_triggered)
> > > +		vq->event_triggered = false;
> > > +
> > >  	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
> > >  				 virtqueue_enable_cb_delayed_split(_vq);
> > >  }
> > > @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > >  	if (unlikely(vq->broken))
> > >  		return IRQ_HANDLED;
> > >
> > > +	/* Just a hint for performance: so it's ok that this can be racy! */
> > > +	if (vq->event)
> > > +		vq->event_triggered = true;
> > > +
> > >  	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
> > >  	if (vq->vq.callback)
> > >  		vq->vq.callback(&vq->vq);
> > > @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > >  	vq->weak_barriers = weak_barriers;
> > >  	vq->broken = false;
> > >  	vq->last_used_idx = 0;
> > > +	vq->event_triggered = false;
> > >  	vq->num_added = 0;
> > >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > >  #ifdef DEBUG
> > > --
> > > MST
> > >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] virtio: fix up virtio_disable_cb
  2023-03-30  6:54         ` Xuan Zhuo
@ 2023-03-30 14:04           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 49+ messages in thread
From: Michael S. Tsirkin @ 2023-03-30 14:04 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Willem de Bruijn, netdev, virtualization, Jakub Kicinski,
	Wei Wang, David Miller, linux-kernel

On Thu, Mar 30, 2023 at 02:54:21PM +0800, Xuan Zhuo wrote:
> On Thu, 30 Mar 2023 02:44:44 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Thu, Mar 30, 2023 at 02:07:37PM +0800, Xuan Zhuo wrote:
> > > On Wed, 26 May 2021 04:24:40 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > virtio_disable_cb is currently a nop for split ring with event index.
> > > > This is because it used to be always called from a callback when we know
> > > > device won't trigger more events until we update the index.  However,
> > > > now that we run with interrupts enabled a lot we also poll without a
> > > > callback so that is different: disabling callbacks will help reduce the
> > > > number of spurious interrupts.
> > > > Further, if using event index with a packed ring, and if being called
> > > > from a callback, we actually do disable interrupts which is unnecessary.
> > > >
> > > > Fix both issues by tracking whenever we get a callback. If that is
> > > > the case disabling interrupts with event index can be a nop.
> > > > If not the case disable interrupts. Note: with a split ring
> > > > there's no explicit "no interrupts" value. For now we write
> > > > a fixed value so our chance of triggering an interrupt
> > > > is 1/ring size. It's probably better to write something
> > > > related to the last used index there to reduce the chance
> > > > even further. For now I'm keeping it simple.
> > >
> > >
> > > I don't understand: is this patch necessary? For this patch set, we could do
> > > without this patch.
> > >
> > > So does this patch optimize virtqueue_disable_cb() by avoiding a write to
> > > vring_used_event(&vq->split.vring)?
> > >
> > > Or am I missing something?
> > >
> > > Thanks.
> >
> > Before this patch virtqueue_disable_cb did nothing at all
> > for the common case of event index enabled, so
> > calling it from virtio net would not help matters.
> 
> I agree with this code:
> 
> -		if (!vq->event)
> +		if (vq->event)
> +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> +			vring_used_event(&vq->split.vring) = 0x0;
> +		else
> 
> 
> I just don't understand event_triggered.


The comment near it says it all:
        /* Hint for event idx: already triggered no need to disable. */
The write into the event idx is potentially expensive since it can
invalidate a cache line on another processor (depending on the CPU).
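
To make the lifecycle concrete, here is a stand-alone sketch of the
mechanism (the field names follow the patch, but the struct and helpers
are simplified stand-ins, not the actual kernel code):

#include <stdbool.h>
#include <stdint.h>

struct vq_sketch {
	bool event;            /* event idx negotiated */
	bool event_triggered;  /* hint: callback already fired */
	uint16_t used_event;   /* shared with the device; writing it
	                        * may bounce a cache line */
	uint16_t last_used_idx;
};

/* Interrupt path: a plain local store, no shared-memory traffic. */
static void sketch_interrupt(struct vq_sketch *vq)
{
	if (vq->event)
		vq->event_triggered = true;
}

/* Disable path: once an event has fired, the device stays quiet until
 * we re-arm, so the expensive shared write can be skipped entirely.
 */
static void sketch_disable_cb(struct vq_sketch *vq)
{
	if (vq->event_triggered)
		return;
	vq->used_event = 0x0;  /* the fixed "hack" value from the patch */
}

/* Enable path: consume the hint, then publish a fresh event index. */
static void sketch_enable_cb(struct vq_sketch *vq)
{
	vq->event_triggered = false;
	vq->used_event = vq->last_used_idx;
}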

> >
> > But the patch is from 2021, isn't it a bit too late to argue?
> > If you have a cleanup or an optimization in mind, please
> > post a patch.
> 
> Sorry, I just had some questions; I don't oppose it. At least it reduces
> writes to vring_used_event(&vq->split.vring), which I think is also beneficial.
> 
> Thanks very much.
> 
> 
> >
> > > >
> > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
> > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 71e16b53e9c1..88f0b16b11b8 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -113,6 +113,9 @@ struct vring_virtqueue {
> > > >  	/* Last used index we've seen. */
> > > >  	u16 last_used_idx;
> > > >
> > > > +	/* Hint for event idx: already triggered no need to disable. */
> > > > +	bool event_triggered;
> > > > +
> > > >  	union {
> > > >  		/* Available for split ring */
> > > >  		struct {
> > > > @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > > >
> > > >  	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > > >  		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > > > -		if (!vq->event)
> > > > +		if (vq->event)
> > > > +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> > > > +			vring_used_event(&vq->split.vring) = 0x0;
> > > > +		else
> > > >  			vq->split.vring.avail->flags =
> > > >  				cpu_to_virtio16(_vq->vdev,
> > > >  						vq->split.avail_flags_shadow);
> > > > @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >  	vq->weak_barriers = weak_barriers;
> > > >  	vq->broken = false;
> > > >  	vq->last_used_idx = 0;
> > > > +	vq->event_triggered = false;
> > > >  	vq->num_added = 0;
> > > >  	vq->packed_ring = true;
> > > >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > > > @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> > > >  {
> > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > >
> > > > +	/* If device triggered an event already it won't trigger one again:
> > > > +	 * no need to disable.
> > > > +	 */
> > > > +	if (vq->event_triggered)
> > > > +		return;
> > > > +
> > > >  	if (vq->packed_ring)
> > > >  		virtqueue_disable_cb_packed(_vq);
> > > >  	else
> > > > @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> > > >  {
> > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > >
> > > > +	if (vq->event_triggered)
> > > > +		vq->event_triggered = false;
> > > > +
> > > >  	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
> > > >  				 virtqueue_enable_cb_prepare_split(_vq);
> > > >  }
> > > > @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> > > >  {
> > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > >
> > > > +	if (vq->event_triggered)
> > > > +		vq->event_triggered = false;
> > > > +
> > > >  	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
> > > >  				 virtqueue_enable_cb_delayed_split(_vq);
> > > >  }
> > > > @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > >  	if (unlikely(vq->broken))
> > > >  		return IRQ_HANDLED;
> > > >
> > > > +	/* Just a hint for performance: so it's ok that this can be racy! */
> > > > +	if (vq->event)
> > > > +		vq->event_triggered = true;
> > > > +
> > > >  	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
> > > >  	if (vq->vq.callback)
> > > >  		vq->vq.callback(&vq->vq);
> > > > @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > >  	vq->weak_barriers = weak_barriers;
> > > >  	vq->broken = false;
> > > >  	vq->last_used_idx = 0;
> > > > +	vq->event_triggered = false;
> > > >  	vq->num_added = 0;
> > > >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > > >  #ifdef DEBUG
> > > > --
> > > > MST
> > > >
> >


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH v3 3/4] virtio: fix up virtio_disable_cb
  2023-03-30 14:04           ` Michael S. Tsirkin
@ 2023-03-31  3:38             ` Xuan Zhuo
  -1 siblings, 0 replies; 49+ messages in thread
From: Xuan Zhuo @ 2023-03-31  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Willem de Bruijn, netdev, virtualization, Jakub Kicinski,
	Wei Wang, David Miller, linux-kernel

On Thu, 30 Mar 2023 10:04:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Mar 30, 2023 at 02:54:21PM +0800, Xuan Zhuo wrote:
> > On Thu, 30 Mar 2023 02:44:44 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Thu, Mar 30, 2023 at 02:07:37PM +0800, Xuan Zhuo wrote:
> > > > On Wed, 26 May 2021 04:24:40 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > virtio_disable_cb is currently a nop for split ring with event index.
> > > > > This is because it used to be always called from a callback when we know
> > > > > device won't trigger more events until we update the index.  However,
> > > > > now that we run with interrupts enabled a lot we also poll without a
> > > > > callback so that is different: disabling callbacks will help reduce the
> > > > > number of spurious interrupts.
> > > > > Further, if using event index with a packed ring, and if being called
> > > > > from a callback, we actually do disable interrupts which is unnecessary.
> > > > >
> > > > > Fix both issues by tracking whenever we get a callback. If that is
> > > > > the case disabling interrupts with event index can be a nop.
> > > > > If not the case disable interrupts. Note: with a split ring
> > > > > there's no explicit "no interrupts" value. For now we write
> > > > > a fixed value so our chance of triggering an interrupt
> > > > > is 1/ring size. It's probably better to write something
> > > > > related to the last used index there to reduce the chance
> > > > > even further. For now I'm keeping it simple.
> > > >
> > > >
> > > > I don't understand: is this patch necessary? For this patch set, we could do
> > > > without this patch.
> > > >
> > > > So does this patch optimize virtqueue_disable_cb() by avoiding a write to
> > > > vring_used_event(&vq->split.vring)?
> > > >
> > > > Or am I missing something?
> > > >
> > > > Thanks.
> > >
> > > Before this patch virtqueue_disable_cb did nothing at all
> > > for the common case of event index enabled, so
> > > calling it from virtio net would not help matters.
> >
> > I agree with this code:
> >
> > -		if (!vq->event)
> > +		if (vq->event)
> > +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> > +			vring_used_event(&vq->split.vring) = 0x0;
> > +		else
> >
> >
> > I just don't understand event_triggered.
>
>
> The comment near it says it all:
>         /* Hint for event idx: already triggered no need to disable. */
> The write into the event idx is potentially expensive since it can
> invalidate a cache line on another processor (depending on the CPU).

Yes, I agree.

Thanks.
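
On the consumer side, the shape virtio_net ends up with (patch 4/4 plus
the start_xmit recheck described in the v3 changelog) is roughly the
following; this is a sketch only, with free_old_xmit_skbs() standing in
for the real cleanup helper and the retry kept to one round instead of
a loop:

static void poll_tx_sketch(struct virtqueue *vq)
{
	/* Keep callbacks off while reaping; with the event_triggered
	 * hint this no longer costs a shared write on every poll.
	 */
	virtqueue_disable_cb(vq);
	free_old_xmit_skbs(vq);

	/* Re-arm. A false return means buffers were used while we were
	 * re-enabling, so poll once more to close the race.
	 */
	if (!virtqueue_enable_cb_delayed(vq)) {
		virtqueue_disable_cb(vq);
		free_old_xmit_skbs(vq);
		virtqueue_enable_cb_delayed(vq);
	}
}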


>
> > >
> > > But the patch is from 2021, isn't it a bit too late to argue?
> > > If you have a cleanup or an optimization in mind, please
> > > post a patch.
> >
> > Sorry, I just had some questions; I don't oppose it. At least it reduces
> > writes to vring_used_event(&vq->split.vring), which I think is also beneficial.
> >
> > Thanks very much.
> >
> >
> > >
> > > > >
> > > > > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > > > > ---
> > > > >  drivers/virtio/virtio_ring.c | 26 +++++++++++++++++++++++++-
> > > > >  1 file changed, 25 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 71e16b53e9c1..88f0b16b11b8 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -113,6 +113,9 @@ struct vring_virtqueue {
> > > > >  	/* Last used index we've seen. */
> > > > >  	u16 last_used_idx;
> > > > >
> > > > > +	/* Hint for event idx: already triggered no need to disable. */
> > > > > +	bool event_triggered;
> > > > > +
> > > > >  	union {
> > > > >  		/* Available for split ring */
> > > > >  		struct {
> > > > > @@ -739,7 +742,10 @@ static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > > > >
> > > > >  	if (!(vq->split.avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > > > >  		vq->split.avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > > > > -		if (!vq->event)
> > > > > +		if (vq->event)
> > > > > +			/* TODO: this is a hack. Figure out a cleaner value to write. */
> > > > > +			vring_used_event(&vq->split.vring) = 0x0;
> > > > > +		else
> > > > >  			vq->split.vring.avail->flags =
> > > > >  				cpu_to_virtio16(_vq->vdev,
> > > > >  						vq->split.avail_flags_shadow);
> > > > > @@ -1605,6 +1611,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > > >  	vq->weak_barriers = weak_barriers;
> > > > >  	vq->broken = false;
> > > > >  	vq->last_used_idx = 0;
> > > > > +	vq->event_triggered = false;
> > > > >  	vq->num_added = 0;
> > > > >  	vq->packed_ring = true;
> > > > >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > > > > @@ -1919,6 +1926,12 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> > > > >  {
> > > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > > >
> > > > > +	/* If device triggered an event already it won't trigger one again:
> > > > > +	 * no need to disable.
> > > > > +	 */
> > > > > +	if (vq->event_triggered)
> > > > > +		return;
> > > > > +
> > > > >  	if (vq->packed_ring)
> > > > >  		virtqueue_disable_cb_packed(_vq);
> > > > >  	else
> > > > > @@ -1942,6 +1955,9 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> > > > >  {
> > > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > > >
> > > > > +	if (vq->event_triggered)
> > > > > +		vq->event_triggered = false;
> > > > > +
> > > > >  	return vq->packed_ring ? virtqueue_enable_cb_prepare_packed(_vq) :
> > > > >  				 virtqueue_enable_cb_prepare_split(_vq);
> > > > >  }
> > > > > @@ -2005,6 +2021,9 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> > > > >  {
> > > > >  	struct vring_virtqueue *vq = to_vvq(_vq);
> > > > >
> > > > > +	if (vq->event_triggered)
> > > > > +		vq->event_triggered = false;
> > > > > +
> > > > >  	return vq->packed_ring ? virtqueue_enable_cb_delayed_packed(_vq) :
> > > > >  				 virtqueue_enable_cb_delayed_split(_vq);
> > > > >  }
> > > > > @@ -2044,6 +2063,10 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> > > > >  	if (unlikely(vq->broken))
> > > > >  		return IRQ_HANDLED;
> > > > >
> > > > > +	/* Just a hint for performance: so it's ok that this can be racy! */
> > > > > +	if (vq->event)
> > > > > +		vq->event_triggered = true;
> > > > > +
> > > > >  	pr_debug("virtqueue callback for %p (%p)\n", vq, vq->vq.callback);
> > > > >  	if (vq->vq.callback)
> > > > >  		vq->vq.callback(&vq->vq);
> > > > > @@ -2083,6 +2106,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > > >  	vq->weak_barriers = weak_barriers;
> > > > >  	vq->broken = false;
> > > > >  	vq->last_used_idx = 0;
> > > > > +	vq->event_triggered = false;
> > > > >  	vq->num_added = 0;
> > > > >  	vq->use_dma_api = vring_use_dma_api(vdev);
> > > > >  #ifdef DEBUG
> > > > > --
> > > > > MST
> > > > >
> > >
>

^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2023-03-31  3:39 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-26  8:24 [PATCH v3 0/4] virtio net: spurious interrupt related fixes Michael S. Tsirkin
2021-05-26  8:24 ` Michael S. Tsirkin
2021-05-26  8:24 ` [PATCH v3 1/4] virtio_net: move tx vq operation under tx queue lock Michael S. Tsirkin
2021-05-26  8:24   ` Michael S. Tsirkin
2021-05-27  3:41   ` Jason Wang
2021-05-27  3:41     ` Jason Wang
2021-05-28 22:25     ` Willem de Bruijn
2021-05-28 22:25       ` Willem de Bruijn
2021-06-09 22:03       ` Michael S. Tsirkin
2021-06-09 22:03         ` Michael S. Tsirkin
2021-05-26  8:24 ` [PATCH v3 2/4] virtio_net: move txq wakeups under tx q lock Michael S. Tsirkin
2021-05-26  8:24   ` Michael S. Tsirkin
2021-05-27  3:48   ` Jason Wang
2021-05-27  3:48     ` Jason Wang
2021-05-26  8:24 ` [PATCH v3 3/4] virtio: fix up virtio_disable_cb Michael S. Tsirkin
2021-05-26  8:24   ` Michael S. Tsirkin
2021-05-27  4:01   ` Jason Wang
2021-05-27  4:01     ` Jason Wang
2023-03-30  6:07   ` Xuan Zhuo
2023-03-30  6:07     ` Xuan Zhuo
2023-03-30  6:44     ` Michael S. Tsirkin
2023-03-30  6:44       ` Michael S. Tsirkin
2023-03-30  6:54       ` Xuan Zhuo
2023-03-30  6:54         ` Xuan Zhuo
2023-03-30 14:04         ` Michael S. Tsirkin
2023-03-31  3:38           ` Xuan Zhuo
2021-05-26  8:24 ` [PATCH v3 4/4] virtio_net: disable cb aggressively Michael S. Tsirkin
2021-05-26  8:24   ` Michael S. Tsirkin
2021-05-26 15:15   ` Eric Dumazet
2021-05-26 15:15     ` Eric Dumazet
2021-05-26 21:22     ` Willem de Bruijn
2021-05-26 21:22       ` Willem de Bruijn
2021-05-26 19:39   ` Jakub Kicinski
2021-05-27  4:09   ` Jason Wang
2021-05-27  4:09     ` Jason Wang
2023-01-16 13:41   ` Laurent Vivier
2023-01-16 13:41     ` Laurent Vivier
2023-01-17  3:48     ` Jason Wang
2023-01-17  3:48       ` Jason Wang
2021-05-26 15:34 ` [PATCH v3 0/4] virtio net: spurious interrupt related fixes Willem de Bruijn
2021-05-26 15:34   ` Willem de Bruijn
2021-06-01  2:53   ` Willem de Bruijn
2021-06-01  2:53     ` Willem de Bruijn
2021-06-09 21:36     ` Willem de Bruijn
2021-06-09 21:36       ` Willem de Bruijn
2021-06-09 22:59       ` Willem de Bruijn
2021-06-09 22:59         ` Willem de Bruijn

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.