netdev.vger.kernel.org archive mirror
* [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
@ 2014-10-15  7:25 Jason Wang
  2014-10-15  7:25 ` [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards Jason Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet

According to David, proper accounting and queueing (at all levels, not
just TCP sockets) is more important than trying to skim a bunch of
cycles by avoiding TX interrupts. Having an event to free the SKB is
absolutely essential for the stack to operate correctly.

This series enables the tx interrupt for virtio-net. The idea is
simple: enable the tx interrupt and schedule a tx napi to free old
xmit skbs.
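
The core shape of the change, simplified from patch 5 below: the tx
interrupt handler only schedules a per-queue napi, and the napi poll
routine does the actual freeing under the tx lock.

	static void skb_xmit_done(struct virtqueue *vq)
	{
		struct virtnet_info *vi = vq->vdev->priv;
		struct send_queue *sq = &vi->sq[vq2txq(vq)];

		/* defer freeing of old xmit skbs to napi context */
		if (napi_schedule_prep(&sq->napi))
			__napi_schedule(&sq->napi);
	}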

Several notes:
- Tx interrupt storm avoidance when the queue is about to be full is
  kept. Since we may enable callbacks in both ndo_start_xmit() and the
  tx napi, patch 1 adds a check to make sure the used event never goes
  backwards. This prevents the napi from wrongly re-enabling callbacks
  after delayed callbacks were used.
- For bulk dequeuing, there's no need to enable a tx interrupt for
  each packet. The last patch enables the tx interrupt only for the
  final skb in the chain, through xmit_more and a new helper that
  publishes the current avail idx as the used event (see the sketch
  after this list).
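
As background for the last note: with VIRTIO_RING_F_EVENT_IDX the host
only interrupts when the used index crosses the driver-published used
event, as tested by the helper below (from
include/uapi/linux/virtio_ring.h). Publishing the current avail idx as
the used event therefore defers the interrupt until everything
submitted so far has been consumed.

	/* The host interrupts for a buffer used at new_idx iff event_idx
	 * lies in the window (old, new_idx], computed modulo 2^16. */
	static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
	{
		return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
	}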

This series fixes several issues in the original RFC pointed out by Michael.

Smoke testing is done; a complete netperf test will follow. Sending
the series out for early review.

Thanks

Jason Wang (6):
  virtio: make sure used event never go backwards
  virtio: introduce virtio_enable_cb_avail()
  virtio-net: small optimization on free_old_xmit_skbs()
  virtio-net: return the number of packets sent in free_old_xmit_skbs()
  virtio-net: enable tx interrupt
  virtio-net: enable tx interrupt only for the final skb in the chain

 drivers/net/virtio_net.c     |  125 ++++++++++++++++++++++++++++++------------
 drivers/virtio/virtio_ring.c |   25 ++++++++-
 include/linux/virtio.h       |    2 +
 3 files changed, 115 insertions(+), 37 deletions(-)

* [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
@ 2014-10-15  7:25 ` Jason Wang
  2014-10-15  9:34   ` Michael S. Tsirkin
  2014-10-15  7:25 ` [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail() Jason Wang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet

This patch checks the new event idx to make sure used event idx never
goes back. This is used to synchronize the calls between
virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
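
For reference, the two enable paths publish different used events
(simplified from virtio_ring.c, assuming VIRTIO_RING_F_EVENT_IDX):

	/* virtqueue_enable_cb(): interrupt on the next used buffer */
	vring_used_event(&vq->vring) = vq->last_used_idx;

	/* virtqueue_enable_cb_delayed(): interrupt after ~3/4 of the
	 * buffers currently in flight have been used */
	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;

Without the check below, calling the former after the latter drags the
used event backwards and re-arms the interrupt for the very next used
buffer.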

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 3b1f89b..1b3929f 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	u16 last_used_idx;
 
 	START_USE(vq);
-
+	last_used_idx = vq->last_used_idx;
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
 	 * either clear the flags bit or point the event index at the next
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
-	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
+	/* Make sure used event never go backwards */
+	if (!vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx))
+		vring_used_event(&vq->vring) = last_used_idx;
 	END_USE(vq);
 	return last_used_idx;
 }
-- 
1.7.1

* [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
  2014-10-15  7:25 ` [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards Jason Wang
@ 2014-10-15  7:25 ` Jason Wang
  2014-10-15  9:28   ` Michael S. Tsirkin
  2014-10-15  7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet

This patch introduces virtio_enable_cb_avail() to publish the avail
idx as the used event. This can be used when submitting buffers in a
batch, to reduce the number of tx interrupts.
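
The intended call pattern is to arm the interrupt only when a batch
ends (a sketch; the real hookup is in patch 6):

	if (!skb->xmit_more) {		/* final skb in the chain */
		virtqueue_kick(sq->vq);
		/* publish avail->idx as the used event: one interrupt
		 * for the whole batch instead of one per packet */
		if (unlikely(virtqueue_enable_cb_avail(sq->vq)))
			/* raced with the host: buffers already used */
			free_old_xmit_skbs(sq, qsize);
	}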

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
 include/linux/virtio.h       |    2 ++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1b3929f..d67fbf8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 	 * entry. Always do both to keep code simple. */
 	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
 	/* Make sure used event never go backwards */
-	if (!vring_need_event(vring_used_event(&vq->vring),
-			      vq->vring.avail->idx, last_used_idx))
+	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
+	    !vring_need_event(vring_used_event(&vq->vring),
+			      vq->vring.avail->idx, last_used_idx)) {
 		vring_used_event(&vq->vring) = last_used_idx;
+	}
 	END_USE(vq);
 	return last_used_idx;
 }
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	bool ret;
+
+	START_USE(vq);
+	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
+	vring_used_event(&vq->vring) = vq->vring.avail->idx;
+	ret = vring_need_event(vq->vring.avail->idx,
+			       vq->last_used_idx, vq->vring.used->idx);
+	END_USE(vq);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
+
 /**
  * virtqueue_poll - query pending used buffers
  * @vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b46671e..bfaf058 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+bool virtqueue_enable_cb_avail(struct virtqueue *vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
1.7.1

* [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
  2014-10-15  7:25 ` [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards Jason Wang
  2014-10-15  7:25 ` [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail() Jason Wang
@ 2014-10-15  7:25 ` Jason Wang
  2014-10-15  9:36   ` Eric Dumazet
  2014-10-15  9:37   ` Michael S. Tsirkin
  2014-10-15  7:25 ` [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs() Jason Wang
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev; +Cc: davem, eric.dumazet

Accumulate the sent packets and sent bytes in local variables and perform a
single u64_stats_update_begin/end() after.
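
For context, the begin/end pair is the writer side of a u64_stats
seqcount; readers retry until they observe a consistent snapshot,
roughly as below (see include/linux/u64_stats_sync.h):

	unsigned int start;
	u64 tpackets, tbytes;

	do {
		start = u64_stats_fetch_begin(&stats->tx_syncp);
		tpackets = stats->tx_packets;
		tbytes   = stats->tx_bytes;
	} while (u64_stats_fetch_retry(&stats->tx_syncp, start));

So doing one writer critical section per batch instead of one per
packet shrinks the window in which readers have to retry.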

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |   12 ++++++++----
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3d0ce44..a4d56b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int len;
 	struct virtnet_info *vi = sq->vq->vdev->priv;
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
 
 	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
-		u64_stats_update_begin(&stats->tx_syncp);
-		stats->tx_bytes += skb->len;
-		stats->tx_packets++;
-		u64_stats_update_end(&stats->tx_syncp);
+		tx_bytes += skb->len;
+		tx_packets++;
 
 		dev_kfree_skb_any(skb);
 	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1

* [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs()
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
                   ` (2 preceding siblings ...)
  2014-10-15  7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
@ 2014-10-15  7:25 ` Jason Wang
  2014-10-15  7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev
  Cc: davem, eric.dumazet, Jason Wang

This patch makes free_old_xmit_skbs() return the number of packets
sent; this is necessary for calling this function from napi.
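
The return value feeds the usual NAPI contract: a poll routine reports
how much work it did and may only complete when it stayed under
budget. A minimal sketch of the consumer (the real one is in the next
patch):

	static int virtnet_poll_tx(struct napi_struct *napi, int budget)
	{
		struct send_queue *sq = container_of(napi, struct send_queue, napi);
		int sent = free_old_xmit_skbs(sq, budget);

		if (sent < budget)	/* ran out of work, not out of budget */
			napi_complete(napi);
		return sent;
	}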

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a4d56b8..ccf98f9 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -827,7 +827,7 @@ static int virtnet_open(struct net_device *dev)
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static int free_old_xmit_skbs(struct send_queue *sq)
 {
 	struct sk_buff *skb;
 	unsigned int len;
@@ -848,6 +848,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	stats->tx_bytes += tx_bytes;
 	stats->tx_packets =+ tx_packets;
 	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
-- 
1.7.1

* [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
                   ` (3 preceding siblings ...)
  2014-10-15  7:25 ` [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs() Jason Wang
@ 2014-10-15  7:25 ` Jason Wang
  2014-10-15  9:37   ` Eric Dumazet
  2014-10-15 10:18   ` Michael S. Tsirkin
  2014-10-15  7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
  2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
  6 siblings, 2 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev
  Cc: davem, eric.dumazet, Jason Wang

Orphaning the skb in ndo_start_xmit() breaks socket accounting and
packet queuing. This in fact breaks lots of things, such as pktgen and
several TCP optimizations, and it also makes BQL impossible to
implement for virtio-net.

This patch tries to solve this issue by enabling the tx interrupt. To
avoid introducing extra spinlocks, a tx napi is scheduled to free
those packets.

More tx interrupt mitigation methods can be built on top.
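
As an example of what this unblocks (not part of this series): with a
reliable tx completion event, the BQL hooks could be wired up roughly
as follows.

	/* in start_xmit(), after a successful xmit_skb() */
	netdev_tx_sent_queue(txq, skb->len);

	/* in free_old_xmit_skbs(), once per completion batch */
	netdev_tx_completed_queue(txq, tx_packets, tx_bytes);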

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
 1 files changed, 85 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ccf98f9..2afc2e2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -72,6 +72,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static int free_old_xmit_skbs(struct send_queue *sq, int budget)
+{
+	struct sk_buff *skb;
+	unsigned int len;
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
+	u64 tx_bytes = 0, tx_packets = 0;
+
+	while (tx_packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		pr_debug("Sent skb %p\n", skb);
+
+		tx_bytes += skb->len;
+		tx_packets++;
+
+		dev_kfree_skb_any(skb);
+	}
+
+	u64_stats_update_begin(&stats->tx_syncp);
+	stats->tx_bytes += tx_bytes;
+	stats->tx_packets =+ tx_packets;
+	u64_stats_update_end(&stats->tx_syncp);
+
+	return tx_packets;
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct send_queue *sq = &vi->sq[vq2txq(vq)];
 
-	/* Suppress further interrupts. */
-	virtqueue_disable_cb(vq);
-
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi_schedule_prep(&sq->napi)) {
+		__napi_schedule(&sq->napi);
+	}
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -774,7 +801,39 @@ again:
 	return received;
 }
 
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq =
+		container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	unsigned int r, sent = 0;
+
+again:
+	__netif_tx_lock(txq, smp_processor_id());
+	virtqueue_disable_cb(sq->vq);
+	sent += free_old_xmit_skbs(sq, budget - sent);
+
+	if (sent < budget) {
+		r = virtqueue_enable_cb_prepare(sq->vq);
+		napi_complete(napi);
+		__netif_tx_unlock(txq);
+		if (unlikely(virtqueue_poll(sq->vq, r)) &&
+		    napi_schedule_prep(napi)) {
+			virtqueue_disable_cb(sq->vq);
+			__napi_schedule(napi);
+			goto again;
+		}
+	} else {
+		__netif_tx_unlock(txq);
+	}
+
+	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+	return sent;
+}
+
 #ifdef CONFIG_NET_RX_BUSY_POLL
+
 /* must be called with local_bh_disable()d */
 static int virtnet_busy_poll(struct napi_struct *napi)
 {
@@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(&vi->rq[i]);
+		napi_enable(&vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static int free_old_xmit_skbs(struct send_queue *sq)
-{
-	struct sk_buff *skb;
-	unsigned int len;
-	struct virtnet_info *vi = sq->vq->vdev->priv;
-	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
-	u64 tx_bytes = 0, tx_packets = 0;
-
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
-		pr_debug("Sent skb %p\n", skb);
-
-		tx_bytes += skb->len;
-		tx_packets++;
-
-		dev_kfree_skb_any(skb);
-	}
-
-	u64_stats_update_begin(&stats->tx_syncp);
-	stats->tx_bytes += tx_bytes;
-	stats->tx_packets =+ tx_packets;
-	u64_stats_update_end(&stats->tx_syncp);
-
-	return tx_packets;
-}
-
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 {
 	struct skb_vnet_hdr *hdr;
@@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 		sg_set_buf(sq->sg, hdr, hdr_len);
 		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
 	}
+
 	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
 }
 
@@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	struct virtnet_info *vi = netdev_priv(dev);
 	int qnum = skb_get_queue_mapping(skb);
 	struct send_queue *sq = &vi->sq[qnum];
-	int err;
+	int err, qsize = virtqueue_get_vring_size(sq->vq);
 
+	virtqueue_disable_cb(sq->vq);
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	free_old_xmit_skbs(sq, qsize);
 
 	/* Try to transmit */
 	err = xmit_skb(sq, skb);
@@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_OK;
 	}
 
-	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
-
 	/* Apparently nice girls don't return TX_BUSY; stop the queue
 	 * before it gets out of hand.  Naturally, this wastes entries. */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
 		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, qsize);
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
+	} else if (virtqueue_enable_cb(sq->vq)) {
+		free_old_xmit_skbs(sq, qsize);
 	}
 
 	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
@@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 {
 	int i;
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
+	}
 
 	kfree(vi->rq);
 	kfree(vi->sq);
@@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
 		napi_hash_add(&vi->rq[i].napi);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
@@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 			napi_hash_del(&vi->rq[i].napi);
 			netif_napi_del(&vi->rq[i].napi);
+			netif_napi_del(&vi->sq[i].napi);
 		}
 	}
 
@@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
 			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(&vi->rq[i]);
+			napi_enable(&vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
-- 
1.7.1

* [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
                   ` (4 preceding siblings ...)
  2014-10-15  7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
@ 2014-10-15  7:25 ` Jason Wang
  2014-10-15 10:22   ` Michael S. Tsirkin
  2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
  6 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15  7:25 UTC (permalink / raw)
  To: rusty, mst, virtualization, linux-kernel, netdev
  Cc: davem, eric.dumazet, Jason Wang

With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
enables the tx interrupt only for the final skb in the chain, if
needed. This helps to mitigate tx interrupts.
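
The expected pattern for a batch of N skbs handed down by the stack
(illustrative; the core sets xmit_more on all but the last packet of a
batch):

	skb 1 .. N-1: xmit_more set   -> add to the vring, no kick,
	                                 no used event update
	skb N:        xmit_more clear -> virtqueue_kick(), then
	                                 virtqueue_enable_cb_avail()
	                                 publishes used_event = avail->idx,
	                                 so the host raises a single
	                                 interrupt once the whole batch
	                                 has been consumed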

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2afc2e2..5943f3f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 				virtqueue_disable_cb(sq->vq);
 			}
 		}
-	} else if (virtqueue_enable_cb(sq->vq)) {
-		free_old_xmit_skbs(sq, qsize);
 	}
 
-	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
+	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
 		virtqueue_kick(sq->vq);
+		if (sq->vq->num_free >= 2 +MAX_SKB_FRAGS &&
+		    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
+			free_old_xmit_skbs(sq, qsize);
+			virtqueue_disable_cb(sq->vq);
+		}
+	}
 
 	return NETDEV_TX_OK;
 }
-- 
1.7.1

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
  2014-10-15  7:25 ` [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail() Jason Wang
@ 2014-10-15  9:28   ` Michael S. Tsirkin
  2014-10-15 10:19     ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15  9:28 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> This patch introduces virtio_enable_cb_avail() to publish the avail
> idx as the used event. This can be used when submitting buffers in a
> batch, to reduce the number of tx interrupts.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
>  include/linux/virtio.h       |    2 ++
>  2 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1b3929f..d67fbf8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>  	 * entry. Always do both to keep code simple. */
>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>  	/* Make sure used event never go backwards */
> -	if (!vring_need_event(vring_used_event(&vq->vring),
> -			      vq->vring.avail->idx, last_used_idx))
> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> +	    !vring_need_event(vring_used_event(&vq->vring),
> +			      vq->vring.avail->idx, last_used_idx)) {
>  		vring_used_event(&vq->vring) = last_used_idx;
> +	}
>  	END_USE(vq);
>  	return last_used_idx;
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>

I see you are also changing virtqueue_enable_cb_prepare, why?

> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	bool ret;
> +
> +	START_USE(vq);
> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
> +	ret = vring_need_event(vq->vring.avail->idx,
> +			       vq->last_used_idx, vq->vring.used->idx);
> +	END_USE(vq);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> +
>  /**
>   * virtqueue_poll - query pending used buffers
>   * @vq: the struct virtqueue we're talking about.

Could not figure out what this does.
Please add documentation.

> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b46671e..bfaf058 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>  
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>  
> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> +
>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>  
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> -- 
> 1.7.1

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15  7:25 ` [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards Jason Wang
@ 2014-10-15  9:34   ` Michael S. Tsirkin
  2014-10-15 10:13     ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15  9:34 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> This patch checks the new event idx to make sure used event idx never
> goes back. This is used to synchronize the calls between
> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

the implication being that moving event idx back might cause some race
condition?  If yes, please describe the race explicitly.
Is there a bug we need to fix on stable?
Please also explicitly describe a configuration that causes event idx
to go back.

All this info should go in the commit log.

> ---
>  drivers/virtio/virtio_ring.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 3b1f89b..1b3929f 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>  	u16 last_used_idx;
>  
>  	START_USE(vq);
> -
> +	last_used_idx = vq->last_used_idx;
>  	/* We optimistically turn back on interrupts, then check if there was
>  	 * more to do. */
>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>  	 * either clear the flags bit or point the event index at the next
>  	 * entry. Always do both to keep code simple. */
>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> +	/* Make sure used event never go backwards */

s/go/goes/

> +	if (!vring_need_event(vring_used_event(&vq->vring),
> +			      vq->vring.avail->idx, last_used_idx))
> +		vring_used_event(&vq->vring) = last_used_idx;

The result will be that driver will *not* get an interrupt
on the next consumed buffer, which is likely not what driver
intended when it called virtqueue_enable_cb.

Instead, how about we simply document the requirement that drivers either
always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
but not both?


>  	END_USE(vq);
>  	return last_used_idx;
>  }
> -- 
> 1.7.1

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15  7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
@ 2014-10-15  9:36   ` Eric Dumazet
  2014-10-15  9:37   ` Michael S. Tsirkin
  1 sibling, 0 replies; 36+ messages in thread
From: Eric Dumazet @ 2014-10-15  9:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, davem

On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
> Accumulate the sent packets and sent bytes in local variables and perform a
> single u64_stats_update_begin/end() after.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..a4d56b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>  	unsigned int len;
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
>  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
>  
>  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>  		pr_debug("Sent skb %p\n", skb);
>  
> -		u64_stats_update_begin(&stats->tx_syncp);
> -		stats->tx_bytes += skb->len;
> -		stats->tx_packets++;
> -		u64_stats_update_end(&stats->tx_syncp);
> +		tx_bytes += skb->len;
> +		tx_packets++;
>  
>  		dev_kfree_skb_any(skb);
>  	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;

	
	stats->tx_packets += tx_packets;


> +	u64_stats_update_end(&stats->tx_syncp);
>  }
>  
>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15  7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
  2014-10-15  9:36   ` Eric Dumazet
@ 2014-10-15  9:37   ` Michael S. Tsirkin
  2014-10-15  9:49     ` David Laight
  1 sibling, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15  9:37 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> Accumulate the sent packets and sent bytes in local variables and perform a
> single u64_stats_update_begin/end() after.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Not sure how much it's worth but since Eric suggested it ...

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   12 ++++++++----
>  1 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3d0ce44..a4d56b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
>  	unsigned int len;
>  	struct virtnet_info *vi = sq->vq->vdev->priv;
>  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
>  
>  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>  		pr_debug("Sent skb %p\n", skb);
>  
> -		u64_stats_update_begin(&stats->tx_syncp);
> -		stats->tx_bytes += skb->len;
> -		stats->tx_packets++;
> -		u64_stats_update_end(&stats->tx_syncp);
> +		tx_bytes += skb->len;
> +		tx_packets++;
>  
>  		dev_kfree_skb_any(skb);
>  	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;
> +	u64_stats_update_end(&stats->tx_syncp);
>  }
>  
>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> -- 
> 1.7.1

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15  7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
@ 2014-10-15  9:37   ` Eric Dumazet
  2014-10-15 10:21     ` Jason Wang
  2014-10-15 10:18   ` Michael S. Tsirkin
  1 sibling, 1 reply; 36+ messages in thread
From: Eric Dumazet @ 2014-10-15  9:37 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, davem

On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:

...

> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> +	struct sk_buff *skb;
> +	unsigned int len;
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
> +
> +	while (tx_packets < budget &&
> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		pr_debug("Sent skb %p\n", skb);
> +
> +		tx_bytes += skb->len;
> +		tx_packets++;
> +
> +		dev_kfree_skb_any(skb);
> +	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;

	
	stats->tx_packets += tx_packets;

> +	u64_stats_update_end(&stats->tx_syncp);
> +
> +	return tx_packets;
> +}

* RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15  9:37   ` Michael S. Tsirkin
@ 2014-10-15  9:49     ` David Laight
  2014-10-15 10:48       ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2014-10-15  9:49 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', Jason Wang
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

From: Michael S. Tsirkin
> On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > Accumulate the sent packets and sent bytes in local variables and perform a
> > single u64_stats_update_begin/end() after.
> >
> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> 
> Not sure how much it's worth but since Eric suggested it ...

Probably depends on the actual cost of u64_stats_update_begin/end
against the likely extra saving of the tx_bytes and tx_packets
values onto the stack across the call to dev_kfree_skb_any().
(Which depends on the number of caller saved registers.)

> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> > ---
> >  drivers/net/virtio_net.c |   12 ++++++++----
> >  1 files changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 3d0ce44..a4d56b8 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> >  	unsigned int len;
> >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > +	u64 tx_bytes = 0, tx_packets = 0;

tx_packets need only be 'unsigned int'.
The same is almost certainly true of tx_bytes.

	David

> >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >  		pr_debug("Sent skb %p\n", skb);
> >
> > -		u64_stats_update_begin(&stats->tx_syncp);
> > -		stats->tx_bytes += skb->len;
> > -		stats->tx_packets++;
> > -		u64_stats_update_end(&stats->tx_syncp);
> > +		tx_bytes += skb->len;
> > +		tx_packets++;
> >
> >  		dev_kfree_skb_any(skb);
> >  	}
> > +
> > +	u64_stats_update_begin(&stats->tx_syncp);
> > +	stats->tx_bytes += tx_bytes;
> > +	stats->tx_packets =+ tx_packets;
> > +	u64_stats_update_end(&stats->tx_syncp);
> >  }
> >
> >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > --
> > 1.7.1

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15  9:34   ` Michael S. Tsirkin
@ 2014-10-15 10:13     ` Jason Wang
  2014-10-15 10:32       ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:13 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>> This patch checks the new event idx to make sure used event idx never
>> goes back. This is used to synchronize the calls between
>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> the implication being that moving event idx back might cause some race
> condition?  

This will cause a race condition when the tx interrupt is enabled.
Consider the following case:

1) tx napi was scheduled
2) start_xmit() calls virtqueue_enable_cb_delayed() and disables the cb
[used event is vq->last_used_idx + 3/4 of the pending bufs]
3) tx napi enables the callback via virtqueue_enable_cb() [used event is
vq->last_used_idx]

After step 3, the used event has moved backwards, and an unnecessary tx
interrupt is triggered.
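
To make it concrete (hypothetical indices): say last_used_idx = 10 and
avail->idx = 14, so 4 buffers are pending.

	step 2: enable_cb_delayed() publishes used_event = 10 + 3 = 13
	step 3: enable_cb() rewrites used_event = 10

	host uses one buffer (old = 10, new = 11):
	  used_event = 13: vring_need_event(13, 11, 10) -> false, no irq
	  used_event = 10: vring_need_event(10, 11, 10) -> true, irq fires

i.e. the interrupt arrives three buffers earlier than what
enable_cb_delayed() asked for.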
> If yes, please describe the race explicitly.
> Is there a bug we need to fix on stable?

Looks not; the current code does not have such a race condition.
> Please also explicitly describe a configuration that causes event idx
> to go back.
>
> All this info should go in the commit log.

Will do this.
>> ---
>>  drivers/virtio/virtio_ring.c |    7 +++++--
>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index 3b1f89b..1b3929f 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>  	u16 last_used_idx;
>>  
>>  	START_USE(vq);
>> -
>> +	last_used_idx = vq->last_used_idx;
>>  	/* We optimistically turn back on interrupts, then check if there was
>>  	 * more to do. */
>>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>>  	 * either clear the flags bit or point the event index at the next
>>  	 * entry. Always do both to keep code simple. */
>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>> +	/* Make sure used event never go backwards */
> s/go/goes/
>
>> +	if (!vring_need_event(vring_used_event(&vq->vring),
>> +			      vq->vring.avail->idx, last_used_idx))
>> +		vring_used_event(&vq->vring) = last_used_idx;
> The result will be that driver will *not* get an interrupt
> on the next consumed buffer, which is likely not what driver
> intended when it called virtqueue_enable_cb.

This will only happen when we want to delay the interrupt for the next
few consumed buffers (i.e. virtqueue_enable_cb_delayed() was called).
In the other case, vq->last_used_idx should be ahead of the previous
used event. Do you see any other case?
>
> Instead, how about we simply document the requirement that drivers either
> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> but not both?

We need to call them both when the tx interrupt is enabled, I believe.
>
>>  	END_USE(vq);
>>  	return last_used_idx;
>>  }
>> -- 
>> 1.7.1

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15  7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
  2014-10-15  9:37   ` Eric Dumazet
@ 2014-10-15 10:18   ` Michael S. Tsirkin
  2014-10-15 10:25     ` Jason Wang
  1 sibling, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:18 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
> Orphaning the skb in ndo_start_xmit() breaks socket accounting and
> packet queuing. This in fact breaks lots of things, such as pktgen
> and several TCP optimizations, and it also makes BQL impossible to
> implement for virtio-net.
>
> This patch tries to solve this issue by enabling the tx interrupt. To
> avoid introducing extra spinlocks, a tx napi is scheduled to free
> those packets.
>
> More tx interrupt mitigation methods can be built on top.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
>  1 files changed, 85 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ccf98f9..2afc2e2 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -72,6 +72,8 @@ struct send_queue {
>  
>  	/* Name of the send queue: output.$index */
>  	char name[40];
> +
> +	struct napi_struct napi;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>  	return p;
>  }
>  
> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> +{
> +	struct sk_buff *skb;
> +	unsigned int len;
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> +	u64 tx_bytes = 0, tx_packets = 0;
> +
> +	while (tx_packets < budget &&
> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		pr_debug("Sent skb %p\n", skb);
> +
> +		tx_bytes += skb->len;
> +		tx_packets++;
> +
> +		dev_kfree_skb_any(skb);
> +	}
> +
> +	u64_stats_update_begin(&stats->tx_syncp);
> +	stats->tx_bytes += tx_bytes;
> +	stats->tx_packets =+ tx_packets;
> +	u64_stats_update_end(&stats->tx_syncp);
> +
> +	return tx_packets;
> +}
> +
>  static void skb_xmit_done(struct virtqueue *vq)
>  {
>  	struct virtnet_info *vi = vq->vdev->priv;
> +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>  
> -	/* Suppress further interrupts. */
> -	virtqueue_disable_cb(vq);
> -
> -	/* We were probably waiting for more output buffers. */
> -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> +	if (napi_schedule_prep(&sq->napi)) {
> +		__napi_schedule(&sq->napi);
> +	}
>  }
>  
>  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> @@ -774,7 +801,39 @@ again:
>  	return received;
>  }
>  
> +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> +{
> +	struct send_queue *sq =
> +		container_of(napi, struct send_queue, napi);
> +	struct virtnet_info *vi = sq->vq->vdev->priv;
> +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> +	unsigned int r, sent = 0;
> +
> +again:
> +	__netif_tx_lock(txq, smp_processor_id());
> +	virtqueue_disable_cb(sq->vq);
> +	sent += free_old_xmit_skbs(sq, budget - sent);
> +
> +	if (sent < budget) {
> +		r = virtqueue_enable_cb_prepare(sq->vq);
> +		napi_complete(napi);
> +		__netif_tx_unlock(txq);
> +		if (unlikely(virtqueue_poll(sq->vq, r)) &&


So you are enabling callback on the next packet,
which is almost sure to cause an interrupt storm
on the guest.


I think it's a bad idea, this is why I used
enable_cb_delayed in my patch.
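
(Something along these lines, i.e. the re-enable path in
virtnet_poll_tx() switched to the delayed variant; untested sketch:)

	if (sent < budget) {
		napi_complete(napi);
		/* ask for an interrupt only after ~3/4 of the pending
		 * buffers are used, not on the very next one */
		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq)) &&
		    napi_schedule_prep(napi)) {
			virtqueue_disable_cb(sq->vq);
			__napi_schedule(napi);
			goto again;
		}
	}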




> +		    napi_schedule_prep(napi)) {
> +			virtqueue_disable_cb(sq->vq);
> +			__napi_schedule(napi);
> +			goto again;
> +		}
> +	} else {
> +		__netif_tx_unlock(txq);
> +	}
> +
> +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> +	return sent;
> +}
> +
>  #ifdef CONFIG_NET_RX_BUSY_POLL
> +
>  /* must be called with local_bh_disable()d */
>  static int virtnet_busy_poll(struct napi_struct *napi)
>  {
> @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  		virtnet_napi_enable(&vi->rq[i]);
> +		napi_enable(&vi->sq[i].napi);
>  	}
>  
>  	return 0;
>  }
>  
> -static int free_old_xmit_skbs(struct send_queue *sq)
> -{
> -	struct sk_buff *skb;
> -	unsigned int len;
> -	struct virtnet_info *vi = sq->vq->vdev->priv;
> -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> -	u64 tx_bytes = 0, tx_packets = 0;
> -
> -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> -		pr_debug("Sent skb %p\n", skb);
> -
> -		tx_bytes += skb->len;
> -		tx_packets++;
> -
> -		dev_kfree_skb_any(skb);
> -	}
> -
> -	u64_stats_update_begin(&stats->tx_syncp);
> -	stats->tx_bytes += tx_bytes;
> -	stats->tx_packets =+ tx_packets;
> -	u64_stats_update_end(&stats->tx_syncp);
> -
> -	return tx_packets;
> -}
> -

So you end up moving it all anyway, why bother splitting out
minor changes in previous patches?


>  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  {
>  	struct skb_vnet_hdr *hdr;
> @@ -917,6 +952,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  		sg_set_buf(sq->sg, hdr, hdr_len);
>  		num_sg = skb_to_sgvec(skb, sq->sg + 1, 0, skb->len) + 1;
>  	}
> +
>  	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
>  }
>  
> @@ -925,10 +961,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	int qnum = skb_get_queue_mapping(skb);
>  	struct send_queue *sq = &vi->sq[qnum];
> -	int err;
> +	int err, qsize = virtqueue_get_vring_size(sq->vq);
>  
> +	virtqueue_disable_cb(sq->vq);
>  	/* Free up any pending old buffers before queueing new ones. */
> -	free_old_xmit_skbs(sq);
> +	free_old_xmit_skbs(sq, qsize);
>  
>  	/* Try to transmit */
>  	err = xmit_skb(sq, skb);
> @@ -944,22 +981,20 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  		return NETDEV_TX_OK;
>  	}
>  
> -	/* Don't wait up for transmitted skbs to be freed. */
> -	skb_orphan(skb);
> -	nf_reset(skb);
> -
>  	/* Apparently nice girls don't return TX_BUSY; stop the queue
>  	 * before it gets out of hand.  Naturally, this wastes entries. */
>  	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
>  		netif_stop_subqueue(dev, qnum);
>  		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
>  			/* More just got used, free them then recheck. */
> -			free_old_xmit_skbs(sq);
> +			free_old_xmit_skbs(sq, qsize);
>  			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
>  				netif_start_subqueue(dev, qnum);
>  				virtqueue_disable_cb(sq->vq);
>  			}
>  		}
> +	} else if (virtqueue_enable_cb(sq->vq)) {
> +		free_old_xmit_skbs(sq, qsize);
>  	}
>  
>  	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> @@ -1141,8 +1176,10 @@ static int virtnet_close(struct net_device *dev)
>  	/* Make sure refill_work doesn't re-enable napi! */
>  	cancel_delayed_work_sync(&vi->refill);
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		napi_disable(&vi->rq[i].napi);
> +		napi_disable(&vi->sq[i].napi);
> +	}
>  
>  	return 0;
>  }
> @@ -1461,8 +1498,10 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  {
>  	int i;
>  
> -	for (i = 0; i < vi->max_queue_pairs; i++)
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		netif_napi_del(&vi->rq[i].napi);
> +		netif_napi_del(&vi->sq[i].napi);
> +	}
>  
>  	kfree(vi->rq);
>  	kfree(vi->sq);
> @@ -1616,6 +1655,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
>  		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
>  			       napi_weight);
>  		napi_hash_add(&vi->rq[i].napi);
> +		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
> +			       napi_weight);
>  
>  		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
>  		ewma_init(&vi->rq[i].mrg_avg_pkt_len, 1, RECEIVE_AVG_WEIGHT);
> @@ -1920,8 +1961,10 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  	if (netif_running(vi->dev)) {
>  		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			napi_disable(&vi->rq[i].napi);
> +			napi_disable(&vi->sq[i].napi);
>  			napi_hash_del(&vi->rq[i].napi);
>  			netif_napi_del(&vi->rq[i].napi);
> +			netif_napi_del(&vi->sq[i].napi);
>  		}
>  	}
>  
> @@ -1946,8 +1989,10 @@ static int virtnet_restore(struct virtio_device *vdev)
>  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>  				schedule_delayed_work(&vi->refill, 0);
>  
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> +		for (i = 0; i < vi->max_queue_pairs; i++) {
>  			virtnet_napi_enable(&vi->rq[i]);
> +			napi_enable(&vi->sq[i].napi);
> +		}
>  	}
>  
>  	netif_device_attach(vi->dev);
> -- 
> 1.7.1

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
  2014-10-15  9:28   ` Michael S. Tsirkin
@ 2014-10-15 10:19     ` Jason Wang
  2014-10-15 10:41       ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:19 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
>> This patch introduces virtio_enable_cb_avail() to publish the avail
>> idx as the used event. This can be used when submitting buffers in a
>> batch, to reduce the number of tx interrupts.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
>>  include/linux/virtio.h       |    2 ++
>>  2 files changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>> index 1b3929f..d67fbf8 100644
>> --- a/drivers/virtio/virtio_ring.c
>> +++ b/drivers/virtio/virtio_ring.c
>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>  	 * entry. Always do both to keep code simple. */
>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>  	/* Make sure used event never go backwards */
>> -	if (!vring_need_event(vring_used_event(&vq->vring),
>> -			      vq->vring.avail->idx, last_used_idx))
>> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
>> +	    !vring_need_event(vring_used_event(&vq->vring),
>> +			      vq->vring.avail->idx, last_used_idx)) {
>>  		vring_used_event(&vq->vring) = last_used_idx;
>> +	}
>>  	END_USE(vq);
>>  	return last_used_idx;
>>  }
>>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>>
> I see you are also changing virtqueue_enable_cb_prepare, why?

This is also needed to prevent it from moving the used event backwards.
This may happen when we handle the tx napi after we publish the avail
idx as the used event (i.e. virtqueue_enable_cb_avail() was called).
>
>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
>> +{
>> +	struct vring_virtqueue *vq = to_vvq(_vq);
>> +	bool ret;
>> +
>> +	START_USE(vq);
>> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
>> +	ret = vring_need_event(vq->vring.avail->idx,
>> +			       vq->last_used_idx, vq->vring.used->idx);
>> +	END_USE(vq);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
>> +
>>  /**
>>   * virtqueue_poll - query pending used buffers
>>   * @vq: the struct virtqueue we're talking about.
> Could not figure out what this does.
> Please add documentation.
>

Sure, does something like the below explain what this function does?

/**
 * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
 * @vq: the struct virtqueue we're talking about.
 *
 * This re-enables callbacks but hints to the other side to delay
 * interrupts until all of the available buffers have been processed;
 * it returns "false" if there is at least one pending buffer in the
 * queue, to detect a possible race between the driver checking for
 * more work, and enabling callbacks.
 *
 * Caller must ensure we don't call this with other virtqueue
 * operations at the same time (except where noted).
 */

>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>> index b46671e..bfaf058 100644
>> --- a/include/linux/virtio.h
>> +++ b/include/linux/virtio.h
>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>>  
>>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>>  
>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
>> +
>>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>>  
>>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>> -- 
>> 1.7.1

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15  9:37   ` Eric Dumazet
@ 2014-10-15 10:21     ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:21 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: mst, netdev, linux-kernel, virtualization, davem

On 10/15/2014 05:37 PM, Eric Dumazet wrote:
> On Wed, 2014-10-15 at 15:25 +0800, Jason Wang wrote:
>
> ...
>
>> +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> +{
>> +	struct sk_buff *skb;
>> +	unsigned int len;
>> +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> +	u64 tx_bytes = 0, tx_packets = 0;
>> +
>> +	while (tx_packets < budget &&
>> +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		pr_debug("Sent skb %p\n", skb);
>> +
>> +		tx_bytes += skb->len;
>> +		tx_packets++;
>> +
>> +		dev_kfree_skb_any(skb);
>> +	}
>> +
>> +	u64_stats_update_begin(&stats->tx_syncp);
>> +	stats->tx_bytes += tx_bytes;
>> +	stats->tx_packets =+ tx_packets;
> 	
> 	stats->tx_packets += tx_packets;

My bad, will correct it in next version.

Thanks

* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
  2014-10-15  7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
@ 2014-10-15 10:22   ` Michael S. Tsirkin
  2014-10-15 10:31     ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:22 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
> With the help of xmit_more and virtqueue_enable_cb_avail(), this
> patch enables the tx interrupt only for the final skb in the chain,
> if needed. This helps to mitigate tx interrupts.
> 
> Cc: Rusty Russell <rusty@rustcorp.com.au>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Jason Wang <jasowang@redhat.com>

I think you should split xmit_more stuff out.



> ---
>  drivers/net/virtio_net.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2afc2e2..5943f3f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -993,12 +993,16 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
>  				virtqueue_disable_cb(sq->vq);
>  			}
>  		}
> -	} else if (virtqueue_enable_cb(sq->vq)) {
> -		free_old_xmit_skbs(sq, qsize);
>  	}
>  
> -	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more)
> +	if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
>  		virtqueue_kick(sq->vq);
> +		if (sq->vq->num_free >= 2 +MAX_SKB_FRAGS &&
> +		    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
> +			free_old_xmit_skbs(sq, qsize);
> +			virtqueue_disable_cb(sq->vq);
> +		}
> +	}
>  
>  	return NETDEV_TX_OK;
>  }
> -- 
> 1.7.1

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15 10:18   ` Michael S. Tsirkin
@ 2014-10-15 10:25     ` Jason Wang
  2014-10-15 10:43       ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
>> > Orphaning the skb in ndo_start_xmit() breaks socket accounting
>> > and packet queuing. This in fact breaks lots of things, such as
>> > pktgen and several TCP optimizations, and it also makes BQL
>> > impossible to implement for virtio-net.
>> >
>> > This patch tries to solve this issue by enabling the tx interrupt.
>> > To avoid introducing extra spinlocks, a tx napi is scheduled to
>> > free those packets.
>> >
>> > More tx interrupt mitigation methods can be built on top.
>> > 
>> > Cc: Rusty Russell <rusty@rustcorp.com.au>
>> > Cc: Michael S. Tsirkin <mst@redhat.com>
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---
>> >  drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
>> >  1 files changed, 85 insertions(+), 40 deletions(-)
>> > 
>> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> > index ccf98f9..2afc2e2 100644
>> > --- a/drivers/net/virtio_net.c
>> > +++ b/drivers/net/virtio_net.c
>> > @@ -72,6 +72,8 @@ struct send_queue {
>> >  
>> >  	/* Name of the send queue: output.$index */
>> >  	char name[40];
>> > +
>> > +	struct napi_struct napi;
>> >  };
>> >  
>> >  /* Internal representation of a receive virtqueue */
>> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>> >  	return p;
>> >  }
>> >  
>> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>> > +{
>> > +	struct sk_buff *skb;
>> > +	unsigned int len;
>> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> > +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> > +	u64 tx_bytes = 0, tx_packets = 0;
>> > +
>> > +	while (tx_packets < budget &&
>> > +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> > +		pr_debug("Sent skb %p\n", skb);
>> > +
>> > +		tx_bytes += skb->len;
>> > +		tx_packets++;
>> > +
>> > +		dev_kfree_skb_any(skb);
>> > +	}
>> > +
>> > +	u64_stats_update_begin(&stats->tx_syncp);
>> > +	stats->tx_bytes += tx_bytes;
>> > +	stats->tx_packets =+ tx_packets;
>> > +	u64_stats_update_end(&stats->tx_syncp);
>> > +
>> > +	return tx_packets;
>> > +}
>> > +
>> >  static void skb_xmit_done(struct virtqueue *vq)
>> >  {
>> >  	struct virtnet_info *vi = vq->vdev->priv;
>> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>> >  
>> > -	/* Suppress further interrupts. */
>> > -	virtqueue_disable_cb(vq);
>> > -
>> > -	/* We were probably waiting for more output buffers. */
>> > -	netif_wake_subqueue(vi->dev, vq2txq(vq));
>> > +	if (napi_schedule_prep(&sq->napi)) {
>> > +		__napi_schedule(&sq->napi);
>> > +	}
>> >  }
>> >  
>> >  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
>> > @@ -774,7 +801,39 @@ again:
>> >  	return received;
>> >  }
>> >  
>> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>> > +{
>> > +	struct send_queue *sq =
>> > +		container_of(napi, struct send_queue, napi);
>> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
>> > +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
>> > +	unsigned int r, sent = 0;
>> > +
>> > +again:
>> > +	__netif_tx_lock(txq, smp_processor_id());
>> > +	virtqueue_disable_cb(sq->vq);
>> > +	sent += free_old_xmit_skbs(sq, budget - sent);
>> > +
>> > +	if (sent < budget) {
>> > +		r = virtqueue_enable_cb_prepare(sq->vq);
>> > +		napi_complete(napi);
>> > +		__netif_tx_unlock(txq);
>> > +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
> So you are enabling callback on the next packet,
> which is almost sure to cause an interrupt storm
> on the guest.
>
>
> I think it's a bad idea, this is why I used
> enable_cb_delayed in my patch.

Right, will do this, but may also need to make sure used event never
goes back since we may call virtqueue_enable_cb_avail().
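
(For context, a sketch of what virtqueue_enable_cb_delayed() publishes,
per drivers/virtio/virtio_ring.c, which is why it avoids the storm:

	/* Ask for an interrupt only after ~3/4 of the currently
	 * outstanding buffers have been consumed, not on the next one. */
	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;

whereas plain virtqueue_enable_cb() parks the used event at
vq->last_used_idx itself.)
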
>
>
>> > +		    napi_schedule_prep(napi)) {
>> > +			virtqueue_disable_cb(sq->vq);
>> > +			__napi_schedule(napi);
>> > +			goto again;
>> > +		}
>> > +	} else {
>> > +		__netif_tx_unlock(txq);
>> > +	}
>> > +
>> > +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
>> > +	return sent;
>> > +}
>> > +
>> >  #ifdef CONFIG_NET_RX_BUSY_POLL
>> > +
>> >  /* must be called with local_bh_disable()d */
>> >  static int virtnet_busy_poll(struct napi_struct *napi)
>> >  {
>> > @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
>> >  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
>> >  				schedule_delayed_work(&vi->refill, 0);
>> >  		virtnet_napi_enable(&vi->rq[i]);
>> > +		napi_enable(&vi->sq[i].napi);
>> >  	}
>> >  
>> >  	return 0;
>> >  }
>> >  
>> > -static int free_old_xmit_skbs(struct send_queue *sq)
>> > -{
>> > -	struct sk_buff *skb;
>> > -	unsigned int len;
>> > -	struct virtnet_info *vi = sq->vq->vdev->priv;
>> > -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>> > -	u64 tx_bytes = 0, tx_packets = 0;
>> > -
>> > -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> > -		pr_debug("Sent skb %p\n", skb);
>> > -
>> > -		tx_bytes += skb->len;
>> > -		tx_packets++;
>> > -
>> > -		dev_kfree_skb_any(skb);
>> > -	}
>> > -
>> > -	u64_stats_update_begin(&stats->tx_syncp);
>> > -	stats->tx_bytes += tx_bytes;
>> > -	stats->tx_packets =+ tx_packets;
>> > -	u64_stats_update_end(&stats->tx_syncp);
>> > -
>> > -	return tx_packets;
>> > -}
>> > -
> So you end up moving it all anyway, why bother splitting out
> minor changes in previous patches?

To make review easier, but if you think this in fact complicates
things, I will pack them into a single patch.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
  2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
                   ` (5 preceding siblings ...)
  2014-10-15  7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
@ 2014-10-15 10:25 ` Michael S. Tsirkin
  2014-10-15 11:14   ` Jason Wang
  6 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:25 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
> According to David, proper accounting and queueing (at all levels, not
> just TCP sockets) is more important than trying to skim a bunch of
> cycles by avoiding TX interrupts.

He also mentioned we should find other ways to batch.

> Having an event to free the SKB is
> absolutely essential for the stack to operate correctly.
> 
> This series tries to enable tx interrupt for virtio-net. The idea is
> simple: enable tx interrupt and schedule a tx napi to free old xmit
> skbs.
> 
> Several notes:
> - Tx interrupt storm avoidance when queue is about to be full is
>   kept.

But queue is typically *not* full. More important to avoid interrupt
storms in that case IMO.

> Since we may enable callbacks in both ndo_start_xmit() and tx
>   napi, patch 1 adds a check to make sure used event never go
>   back. This will let the napi not enable the callbacks wrongly after
>   delayed callbacks was used.

So why not just use delayed callbacks?


> - For bulk dequeuing, there's no need to enable tx interrupt for each
>   packet. The last patch only enable tx interrupt for the final skb in
>   the chain through xmit_more and a new helper to publish current avail
>   idx as used event.
> 
> This series fixes several issues of original rfc pointed out by Michael.

Could you list the issues, for ease of review?


> 
> Smoking test is done, and will do complete netperf test. Send the
> seires for early review.
> 
> Thanks
> 
> Jason Wang (6):
>   virtio: make sure used event never go backwards
>   virtio: introduce virtio_enable_cb_avail()
>   virtio-net: small optimization on free_old_xmit_skbs()
>   virtio-net: return the number of packets sent in free_old_xmit_skbs()
>   virtio-net: enable tx interrupt
>   virtio-net: enable tx interrupt only for the final skb in the chain
> 
>  drivers/net/virtio_net.c     |  125 ++++++++++++++++++++++++++++++------------
>  drivers/virtio/virtio_ring.c |   25 ++++++++-
>  include/linux/virtio.h       |    2 +
>  3 files changed, 115 insertions(+), 37 deletions(-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
  2014-10-15 10:22   ` Michael S. Tsirkin
@ 2014-10-15 10:31     ` Jason Wang
  2014-10-15 10:46       ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
>> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
>> enable tx interrupt only for the final skb in the chain if
>> needed. This will help to mitigate tx interrupts.
>>
>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>> Cc: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> I think you should split xmit_more stuff out.

Not much difference, but if you prefer I will do this.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15 10:13     ` Jason Wang
@ 2014-10-15 10:32       ` Michael S. Tsirkin
  2014-10-15 10:44         ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:32 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> >> This patch checks the new event idx to make sure used event idx never
> >> goes back. This is used to synchronize the calls between
> >> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > the implication being that moving event idx back might cause some race
> > condition?  
> 
> This will cause race condition when tx interrupt is enabled. Consider
> the following cases
> 
> 1) tx napi was scheduled
> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
> event is vq->last_used_idx + 3/4 pending bufs]
> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
> vq->last_used_idx ]
>  
> After step 3, used event was moved back, unnecessary tx interrupt was
> triggered.

Well unnecessary interrupts are safe.
With your patch caller of virtqueue_enable_cb will not get an
interrupt on the next buffer which is not safe.

If you don't want an interrupt on the next buffer, don't
call virtqueue_enable_cb.
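
A minimal sketch of the contract callers rely on -- process() and the
drain label are placeholders; virtqueue_enable_cb() returns false when
buffers are already pending:

	virtqueue_disable_cb(vq);
drain:
	while ((buf = virtqueue_get_buf(vq, &len)))
		process(buf);			/* placeholder consumer */
	if (!virtqueue_enable_cb(vq)) {
		/* Raced with the device: buffers arrived meanwhile. */
		virtqueue_disable_cb(vq);
		goto drain;
	}
	/* From here on, the next used buffer must raise an interrupt. */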


> > If yes but please describe the race explicitly.
> > Is there a bug we need to fix on stable?
> 
> Looks not, current code does not have such race condition.
> > Please also explicitly describe a configuration that causes event idx
> > to go back.
> >
> > All this info should go in the commit log.
> 
> Will do this.
> >> ---
> >>  drivers/virtio/virtio_ring.c |    7 +++++--
> >>  1 files changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index 3b1f89b..1b3929f 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>  	u16 last_used_idx;
> >>  
> >>  	START_USE(vq);
> >> -
> >> +	last_used_idx = vq->last_used_idx;
> >>  	/* We optimistically turn back on interrupts, then check if there was
> >>  	 * more to do. */
> >>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> >>  	 * either clear the flags bit or point the event index at the next
> >>  	 * entry. Always do both to keep code simple. */
> >>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> >> +	/* Make sure used event never go backwards */
> > s/go/goes/
> >
> >> +	if (!vring_need_event(vring_used_event(&vq->vring),
> >> +			      vq->vring.avail->idx, last_used_idx))
> >> +		vring_used_event(&vq->vring) = last_used_idx;
> > The result will be that driver will *not* get an interrupt
> > on the next consumed buffer, which is likely not what driver
> > intended when it called virtqueue_enable_cb.
> 
> This will only happen when we want to delay the interrupt for next few
> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
> other case, vq->last_used_idx should be ahead of previous used event. Do
> you see any other case?

Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
event index is not updated in virtqueue_enable_cb, driver will not get
an interrupt on the next buffer.
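
For reference, this is the test both sides use (from
include/uapi/linux/virtio_ring.h); an interrupt is due only when the
new used idx steps past the published event:

	static inline int vring_need_event(__u16 event_idx, __u16 new_idx,
					   __u16 old)
	{
		return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
	}

Worked example: with 16 buffers outstanding, enable_cb_delayed() parks
the event at old + 12, so a step from old to old + 1 yields
(__u16)(1 - 12 - 1) = 65524, which is not < 1: no interrupt. Rewinding
the event to old (what plain enable_cb does) makes the same step fire.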


> >
> > Instead, how about we simply document the requirement that drivers either
> > always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> > but not both?
> 
> We need call them both when tx interrupt is enabled I believe.

Can you pls reply to my patch and document issues you see?

> >
> >>  	END_USE(vq);
> >>  	return last_used_idx;
> >>  }
> >> -- 
> >> 1.7.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
  2014-10-15 10:19     ` Jason Wang
@ 2014-10-15 10:41       ` Michael S. Tsirkin
  2014-10-15 10:58         ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:41 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> >> This patch introduces virtio_enable_cb_avail() to publish avail idx
> >> and used event. This could be used by batched buffer submitting to
> >> reduce the number of tx interrupts.
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> ---
> >>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
> >>  include/linux/virtio.h       |    2 ++
> >>  2 files changed, 22 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >> index 1b3929f..d67fbf8 100644
> >> --- a/drivers/virtio/virtio_ring.c
> >> +++ b/drivers/virtio/virtio_ring.c
> >> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>  	 * entry. Always do both to keep code simple. */
> >>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>  	/* Make sure used event never go backwards */
> >> -	if (!vring_need_event(vring_used_event(&vq->vring),
> >> -			      vq->vring.avail->idx, last_used_idx))
> >> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> >> +	    !vring_need_event(vring_used_event(&vq->vring),
> >> +			      vq->vring.avail->idx, last_used_idx)) {
> >>  		vring_used_event(&vq->vring) = last_used_idx;
> >> +	}
> >>  	END_USE(vq);
> >>  	return last_used_idx;
> >>  }
> >>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
> >>
> > I see you are also changing virtqueue_enable_cb_prepare, why?
> 
> This is also used to prevent it from moving the used event backwards.
> This may happen when we handle tx napi after we publish avail idx as
> used event (virtqueue_enable_cb_avail() was called).

So it's wrong exactly in the same way.

But also, please document this stuff, don't put
unrelated changes in a patch called "introduce
virtqueue_enable_cb_avail".


> >
> >> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> >> +{
> >> +	struct vring_virtqueue *vq = to_vvq(_vq);
> >> +	bool ret;
> >> +
> >> +	START_USE(vq);
> >> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
> >> +	ret = vring_need_event(vq->vring.avail->idx,
> >> +			       vq->last_used_idx, vq->vring.used->idx);
> >> +	END_USE(vq);
> >> +
> >> +	return ret;
> >> +}
> >> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> >> +
> >>  /**
> >>   * virtqueue_poll - query pending used buffers
> >>   * @vq: the struct virtqueue we're talking about.
> > Could not figure out what this does.
> > Please add documentation.
> >
> 
> Sure, does something like below explain what this function does?
> 
> /**
>  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
>  * @vq: the struct virtqueue we're talking about.
>  *
>  * This re-enables callbacks but hints to the other side to delay
>  * interrupts until all of the available buffers have been processed;


So this is like virtqueue_enable_cb_delayed but even more
aggressive?
I think it's too aggressive: it's better to wake up the guest
after you are through most of the buffers, but not all,
so guest and host can work in parallel.
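
In event-idx terms, a sketch of where each helper parks the used event
(the device interrupts once its used idx crosses it):

	vring_used_event(&vq->vring) = vq->last_used_idx;	 /* enable_cb: next buffer */
	vring_used_event(&vq->vring) = vq->last_used_idx + bufs; /* enable_cb_delayed: ~3/4 of pending */
	vring_used_event(&vq->vring) = vq->vring.avail->idx;	 /* enable_cb_avail: all submitted */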


>  * it returns "false" if there is at least one pending buffer in the queue,
>  * to detect a possible race between the driver checking for more work,
>  * and enabling callbacks.
>  *
>  * Caller must ensure we don't call this with other virtqueue
>  * operations at the same time (except where noted).
>  */
> 
> >> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> >> index b46671e..bfaf058 100644
> >> --- a/include/linux/virtio.h
> >> +++ b/include/linux/virtio.h
> >> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >>  
> >>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >>  
> >> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> >> +
> >>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >>  
> >>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >> -- 
> >> 1.7.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15 10:25     ` Jason Wang
@ 2014-10-15 10:43       ` Michael S. Tsirkin
  2014-10-15 11:00         ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 06:25:25PM +0800, Jason Wang wrote:
> On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
> >> > Orphan skb in ndo_start_xmit() breaks socket accounting and packet
> >> > queuing. This in fact breaks lots of things such as pktgen and several
> >> > TCP optimizations. And also make BQL can't be implemented for
> >> > virtio-net.
> >> > 
> >> > This patch tries to solve this issue by enabling tx interrupt. To
> >> > avoid introducing extra spinlocks, a tx napi was scheduled to free
> >> > those packets.
> >> > 
> >> > More tx interrupt mitigation method could be used on top.
> >> > 
> >> > Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> > Cc: Michael S. Tsirkin <mst@redhat.com>
> >> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> > ---
> >> >  drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
> >> >  1 files changed, 85 insertions(+), 40 deletions(-)
> >> > 
> >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> > index ccf98f9..2afc2e2 100644
> >> > --- a/drivers/net/virtio_net.c
> >> > +++ b/drivers/net/virtio_net.c
> >> > @@ -72,6 +72,8 @@ struct send_queue {
> >> >  
> >> >  	/* Name of the send queue: output.$index */
> >> >  	char name[40];
> >> > +
> >> > +	struct napi_struct napi;
> >> >  };
> >> >  
> >> >  /* Internal representation of a receive virtqueue */
> >> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
> >> >  	return p;
> >> >  }
> >> >  
> >> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
> >> > +{
> >> > +	struct sk_buff *skb;
> >> > +	unsigned int len;
> >> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> >> > +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >> > +	u64 tx_bytes = 0, tx_packets = 0;
> >> > +
> >> > +	while (tx_packets < budget &&
> >> > +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> > +		pr_debug("Sent skb %p\n", skb);
> >> > +
> >> > +		tx_bytes += skb->len;
> >> > +		tx_packets++;
> >> > +
> >> > +		dev_kfree_skb_any(skb);
> >> > +	}
> >> > +
> >> > +	u64_stats_update_begin(&stats->tx_syncp);
> >> > +	stats->tx_bytes += tx_bytes;
> >> > +	stats->tx_packets =+ tx_packets;
> >> > +	u64_stats_update_end(&stats->tx_syncp);
> >> > +
> >> > +	return tx_packets;
> >> > +}
> >> > +
> >> >  static void skb_xmit_done(struct virtqueue *vq)
> >> >  {
> >> >  	struct virtnet_info *vi = vq->vdev->priv;
> >> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
> >> >  
> >> > -	/* Suppress further interrupts. */
> >> > -	virtqueue_disable_cb(vq);
> >> > -
> >> > -	/* We were probably waiting for more output buffers. */
> >> > -	netif_wake_subqueue(vi->dev, vq2txq(vq));
> >> > +	if (napi_schedule_prep(&sq->napi)) {
> >> > +		__napi_schedule(&sq->napi);
> >> > +	}
> >> >  }
> >> >  
> >> >  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
> >> > @@ -774,7 +801,39 @@ again:
> >> >  	return received;
> >> >  }
> >> >  
> >> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
> >> > +{
> >> > +	struct send_queue *sq =
> >> > +		container_of(napi, struct send_queue, napi);
> >> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
> >> > +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
> >> > +	unsigned int r, sent = 0;
> >> > +
> >> > +again:
> >> > +	__netif_tx_lock(txq, smp_processor_id());
> >> > +	virtqueue_disable_cb(sq->vq);
> >> > +	sent += free_old_xmit_skbs(sq, budget - sent);
> >> > +
> >> > +	if (sent < budget) {
> >> > +		r = virtqueue_enable_cb_prepare(sq->vq);
> >> > +		napi_complete(napi);
> >> > +		__netif_tx_unlock(txq);
> >> > +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
> > So you are enabling callback on the next packet,
> > which is almost sure to cause an interrupt storm
> > on the guest.
> >
> >
> > I think it's a bad idea, this is why I used
> > enable_cb_delayed in my patch.
> 
> Right, will do this, but may also need to make sure used event never
> goes back since we may call virtqueue_enable_cb_avail().

That's why my patch always calls virtqueue_enable_cb_delayed.
So no need for hacks.

Maybe you can review my patch and comment?




> >
> >
> >> > +		    napi_schedule_prep(napi)) {
> >> > +			virtqueue_disable_cb(sq->vq);
> >> > +			__napi_schedule(napi);
> >> > +			goto again;
> >> > +		}
> >> > +	} else {
> >> > +		__netif_tx_unlock(txq);
> >> > +	}
> >> > +
> >> > +	netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> >> > +	return sent;
> >> > +}
> >> > +
> >> >  #ifdef CONFIG_NET_RX_BUSY_POLL
> >> > +
> >> >  /* must be called with local_bh_disable()d */
> >> >  static int virtnet_busy_poll(struct napi_struct *napi)
> >> >  {
> >> > @@ -822,36 +881,12 @@ static int virtnet_open(struct net_device *dev)
> >> >  			if (!try_fill_recv(&vi->rq[i], GFP_KERNEL))
> >> >  				schedule_delayed_work(&vi->refill, 0);
> >> >  		virtnet_napi_enable(&vi->rq[i]);
> >> > +		napi_enable(&vi->sq[i].napi);
> >> >  	}
> >> >  
> >> >  	return 0;
> >> >  }
> >> >  
> >> > -static int free_old_xmit_skbs(struct send_queue *sq)
> >> > -{
> >> > -	struct sk_buff *skb;
> >> > -	unsigned int len;
> >> > -	struct virtnet_info *vi = sq->vq->vdev->priv;
> >> > -	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >> > -	u64 tx_bytes = 0, tx_packets = 0;
> >> > -
> >> > -	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> > -		pr_debug("Sent skb %p\n", skb);
> >> > -
> >> > -		tx_bytes += skb->len;
> >> > -		tx_packets++;
> >> > -
> >> > -		dev_kfree_skb_any(skb);
> >> > -	}
> >> > -
> >> > -	u64_stats_update_begin(&stats->tx_syncp);
> >> > -	stats->tx_bytes += tx_bytes;
> >> > -	stats->tx_packets =+ tx_packets;
> >> > -	u64_stats_update_end(&stats->tx_syncp);
> >> > -
> >> > -	return tx_packets;
> >> > -}
> >> > -
> > So you end up moving it all anyway, why bother splitting out
> > minor changes in previous patches?
> 
> To make review easier, but if you think this in fact complicates
> things, I will pack them into a single patch.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15 10:32       ` Michael S. Tsirkin
@ 2014-10-15 10:44         ` Jason Wang
  2014-10-15 11:38           ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:44 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
>> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>>>> This patch checks the new event idx to make sure used event idx never
>>>> goes back. This is used to synchronize the calls between
>>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>>>
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>> the implication being that moving event idx back might cause some race
>>> condition?  
>> This will cause race condition when tx interrupt is enabled. Consider
>> the following cases
>>
>> 1) tx napi was scheduled
>> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
>> event is vq->last_used_idx + 3/4 pending bufs]
>> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
>> vq->last_used_idx ]
>>  
>> After step 3, used event was moved back, unnecessary tx interrupt was
>> triggered.
> Well unnecessary interrupts are safe.

But that is what we want to reduce.
> With your patch caller of virtqueue_enable_cb will not get an
> interrupt on the next buffer which is not safe.
>
> If you don't want an interrupt on the next buffer, don't
> call virtqueue_enable_cb.

So something like this patch should be done somewhere else, in the
virtio core. Virtio-net cannot do this since it does not have knowledge
of the event index.
>
>>> If yes but please describe the race explicitly.
>>> Is there a bug we need to fix on stable?
>> Looks not, current code does not have such race condition.
>>> Please also explicitly describe a configuration that causes event idx
>>> to go back.
>>>
>>> All this info should go in the commit log.
>> Will do this.
>>>> ---
>>>>  drivers/virtio/virtio_ring.c |    7 +++++--
>>>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>> index 3b1f89b..1b3929f 100644
>>>> --- a/drivers/virtio/virtio_ring.c
>>>> +++ b/drivers/virtio/virtio_ring.c
>>>> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>>>  	u16 last_used_idx;
>>>>  
>>>>  	START_USE(vq);
>>>> -
>>>> +	last_used_idx = vq->last_used_idx;
>>>>  	/* We optimistically turn back on interrupts, then check if there was
>>>>  	 * more to do. */
>>>>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>>>>  	 * either clear the flags bit or point the event index at the next
>>>>  	 * entry. Always do both to keep code simple. */
>>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>>> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>>>> +	/* Make sure used event never go backwards */
>>> s/go/goes/
>>>
>>>> +	if (!vring_need_event(vring_used_event(&vq->vring),
>>>> +			      vq->vring.avail->idx, last_used_idx))
>>>> +		vring_used_event(&vq->vring) = last_used_idx;
>>> The result will be that driver will *not* get an interrupt
>>> on the next consumed buffer, which is likely not what driver
>>> intended when it called virtqueue_enable_cb.
>> This will only happen when we want to delay the interrupt for next few
>> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
>> other case, vq->last_used_idx should be ahead of previous used event. Do
>> you see any other case?
> Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
> event index is not updated in virtqueue_enable_cb, driver will not get
> an interrupt on the next buffer.

This is just what we want, I think. The interrupt was not lost but fired
after 3/4 of the pending buffers were consumed. Do you see any real
issue with this?
>
>>> Instead, how about we simply document the requirement that drivers either
>>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
>>> but not both?
>> We need call them both when tx interrupt is enabled I believe.
> Can you pls reply to my patch and document issues you see?
>

In the previous reply you said you're using
virtqueue_enable_cb_delayed(), so there is no race in your patch.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain
  2014-10-15 10:31     ` Jason Wang
@ 2014-10-15 10:46       ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 06:31:19PM +0800, Jason Wang wrote:
> On 10/15/2014 06:22 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:30PM +0800, Jason Wang wrote:
> >> With the help of xmit_more and virtqueue_enable_cb_avail(), this patch
> >> enable tx interrupt only for the final skb in the chain if
> >> needed. This will help to mitigate tx interrupts.
> >>
> >> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >> Cc: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > I think you should split xmit_more stuff out.
> 
> Not much difference, but if you prefer I will do this.

In fact I did it exactly like this already :)

-- 
MST

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15  9:49     ` David Laight
@ 2014-10-15 10:48       ` Michael S. Tsirkin
  2014-10-15 10:51         ` David Laight
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 10:48 UTC (permalink / raw)
  To: David Laight; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> From: Of Michael S. Tsirkin
> > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > single u64_stats_update_begin/end() after.
> > >
> > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > 
> > Not sure how much it's worth but since Eric suggested it ...
> 
> Probably depends on the actual cost of u64_stats_update_begin/end
> against the likely extra saving of the tx_bytes and tx_packets
> values onto the stack across the call to dev_kfree_skb_any().
> (Which depends on the number of caller saved registers.)

Yea, some benchmark results would be nice to see.
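
For what it's worth, a sketch of what is actually being batched away
here (per include/linux/u64_stats_sync.h): on 64-bit the begin/end
helpers compile to nothing, so the saving, if any, is on 32-bit, where
each pair is a seqcount bump and batching turns N bumps per queue
drain into one:

	/* 32-bit only; empty inline functions on 64-bit. */
	write_seqcount_begin(&syncp->seq);	/* u64_stats_update_begin */
	stats->tx_bytes += tx_bytes;
	stats->tx_packets += tx_packets;
	write_seqcount_end(&syncp->seq);	/* u64_stats_update_end */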


> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > > ---
> > >  drivers/net/virtio_net.c |   12 ++++++++----
> > >  1 files changed, 8 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 3d0ce44..a4d56b8 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > >  	unsigned int len;
> > >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> > >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > +	u64 tx_bytes = 0, tx_packets = 0;
> 
> tx_packets need only be 'unsigned int'.
> The same is almost certainly true of tx_bytes.
> 
> 	David
> 
> > >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > >  		pr_debug("Sent skb %p\n", skb);
> > >
> > > -		u64_stats_update_begin(&stats->tx_syncp);
> > > -		stats->tx_bytes += skb->len;
> > > -		stats->tx_packets++;
> > > -		u64_stats_update_end(&stats->tx_syncp);
> > > +		tx_bytes += skb->len;
> > > +		tx_packets++;
> > >
> > >  		dev_kfree_skb_any(skb);
> > >  	}
> > > +
> > > +	u64_stats_update_begin(&stats->tx_syncp);
> > > +	stats->tx_bytes += tx_bytes;
> > > +	stats->tx_packets =+ tx_packets;
> > > +	u64_stats_update_end(&stats->tx_syncp);
> > >  }
> > >
> > >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > --
> > > 1.7.1
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* RE: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15 10:48       ` Michael S. Tsirkin
@ 2014-10-15 10:51         ` David Laight
  2014-10-15 12:00           ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: David Laight @ 2014-10-15 10:51 UTC (permalink / raw)
  To: 'Michael S. Tsirkin'
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

From: Michael S. Tsirkin
> On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> > From: Of Michael S. Tsirkin
> > > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > > single u64_stats_update_begin/end() after.
> > > >
> > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > >
> > > Not sure how much it's worth but since Eric suggested it ...
> >
> > Probably depends on the actual cost of u64_stats_update_begin/end
> > against the likely extra saving of the tx_bytes and tx_packets
> > values onto the stack across the call to dev_kfree_skb_any().
> > (Which depends on the number of caller saved registers.)
> 
> Yea, some benchmark results would be nice to see.

If there are likely to be multiple skbs on the queue, the fastest
code would probably do one 'virtqueue_get_all()' that returned
a linked list of buffers, then follow the list to get the stats,
and follow it again to free the skbs.

	David
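
A hypothetical sketch of that idea -- virtqueue_get_all() does not
exist in the tree, so its name and the skb->next chaining are
assumptions:

	static void free_old_xmit_skbs_bulk(struct send_queue *sq)
	{
		struct virtnet_info *vi = sq->vq->vdev->priv;
		struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
		unsigned int tx_bytes = 0, tx_packets = 0;
		struct sk_buff *skb, *next;

		/* Assumed helper: detach every used buffer in one call,
		 * returning the skbs chained through skb->next. */
		skb = virtqueue_get_all(sq->vq);

		/* First pass: accumulate stats. */
		for (next = skb; next; next = next->next) {
			tx_bytes += next->len;
			tx_packets++;
		}

		u64_stats_update_begin(&stats->tx_syncp);
		stats->tx_bytes += tx_bytes;
		stats->tx_packets += tx_packets;
		u64_stats_update_end(&stats->tx_syncp);

		/* Second pass: free. */
		while (skb) {
			next = skb->next;
			dev_kfree_skb_any(skb);
			skb = next;
		}
	}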

> > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > >
> > > > ---
> > > >  drivers/net/virtio_net.c |   12 ++++++++----
> > > >  1 files changed, 8 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 3d0ce44..a4d56b8 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > >  	unsigned int len;
> > > >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> > > >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > > +	u64 tx_bytes = 0, tx_packets = 0;
> >
> > tx_packets need only be 'unsigned int'.
> > The same is almost certainly true of tx_bytes.
> >
> > 	David
> >
> > > >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > >  		pr_debug("Sent skb %p\n", skb);
> > > >
> > > > -		u64_stats_update_begin(&stats->tx_syncp);
> > > > -		stats->tx_bytes += skb->len;
> > > > -		stats->tx_packets++;
> > > > -		u64_stats_update_end(&stats->tx_syncp);
> > > > +		tx_bytes += skb->len;
> > > > +		tx_packets++;
> > > >
> > > >  		dev_kfree_skb_any(skb);
> > > >  	}
> > > > +
> > > > +	u64_stats_update_begin(&stats->tx_syncp);
> > > > +	stats->tx_bytes += tx_bytes;
> > > > +	stats->tx_packets =+ tx_packets;
> > > > +	u64_stats_update_end(&stats->tx_syncp);
> > > >  }
> > > >
> > > >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > > --
> > > > 1.7.1
> >

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
  2014-10-15 10:41       ` Michael S. Tsirkin
@ 2014-10-15 10:58         ` Jason Wang
  2014-10-15 11:43           ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 10:58 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
>> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
>>>> This patch introduces virtio_enable_cb_avail() to publish avail idx
>>>> and used event. This could be used by batched buffer submitting to
>>>> reduce the number of tx interrupts.
>>>>
>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>> ---
>>>>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
>>>>  include/linux/virtio.h       |    2 ++
>>>>  2 files changed, 22 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>> index 1b3929f..d67fbf8 100644
>>>> --- a/drivers/virtio/virtio_ring.c
>>>> +++ b/drivers/virtio/virtio_ring.c
>>>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>>>  	 * entry. Always do both to keep code simple. */
>>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>>>  	/* Make sure used event never go backwards */
>>>> -	if (!vring_need_event(vring_used_event(&vq->vring),
>>>> -			      vq->vring.avail->idx, last_used_idx))
>>>> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
>>>> +	    !vring_need_event(vring_used_event(&vq->vring),
>>>> +			      vq->vring.avail->idx, last_used_idx)) {
>>>>  		vring_used_event(&vq->vring) = last_used_idx;
>>>> +	}
>>>>  	END_USE(vq);
>>>>  	return last_used_idx;
>>>>  }
>>>>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
>>>>
>>> I see you are also changing virtqueue_enable_cb_prepare, why?
>> This is also used to prevent it from moving the used event backwards.
>> This may happen when we handle tx napi after we publish avail idx as
>> used event (virtqueue_enable_cb_avail() was called).
> So it's wrong exactly in the same way.
>
> But also, please document this stuff, don't put
> unrelated changes in a patch called "introduce
> virtqueue_enable_cb_avail".
>
>
>>>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
>>>> +{
>>>> +	struct vring_virtqueue *vq = to_vvq(_vq);
>>>> +	bool ret;
>>>> +
>>>> +	START_USE(vq);
>>>> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>>> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
>>>> +	ret = vring_need_event(vq->vring.avail->idx,
>>>> +			       vq->last_used_idx, vq->vring.used->idx);
>>>> +	END_USE(vq);
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
>>>> +
>>>>  /**
>>>>   * virtqueue_poll - query pending used buffers
>>>>   * @vq: the struct virtqueue we're talking about.
>>> Could not figure out what this does.
>>> Please add documentation.
>>>
>> Sure, does something like below explain what this function does?
>>
>> /**
>>  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
>>  * @vq: the struct virtqueue we're talking about.
>>  *
>>  * This re-enables callbacks but hints to the other side to delay
>>  * interrupts until all of the available buffers have been processed;
>
> So this is like virtqueue_enable_cb_delayed but even more
> aggressive?
> I think it's too aggressive: it's better to wake up the guest
> after you are through most of the buffers, but not all,
> so guest and host can work in parallel.

Note that:

- it was only used when there are still a few free slots left (more
than 2 + MAX_SKB_FRAGS)
- my patch keeps the free_old_xmit_skbs() at the beginning of
start_xmit(), so tx skb reclaiming does not depend entirely on the tx
interrupt. If more packets come, we'd expect some of them to be freed in
ndo_start_xmit(). If not, we will eventually trigger
virtqueue_enable_cb_delayed().

So probably not as aggressive as it looks (see the sketch below). I
will benchmark this.
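
A condensed sketch of the resulting transmit path (patches 5-6; error
handling and the queue-stop logic are elided):

	static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
	{
		struct virtnet_info *vi = netdev_priv(dev);
		int qnum = skb_get_queue_mapping(skb);
		struct send_queue *sq = &vi->sq[qnum];
		int qsize = virtqueue_get_vring_size(sq->vq);

		free_old_xmit_skbs(sq, qsize);	/* opportunistic reclaim */
		xmit_skb(sq, skb);

		if (__netif_subqueue_stopped(dev, qnum) || !skb->xmit_more) {
			virtqueue_kick(sq->vq);
			/* Arm the interrupt once, for the tail of the batch. */
			if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS &&
			    unlikely(virtqueue_enable_cb_avail(sq->vq))) {
				free_old_xmit_skbs(sq, qsize);
				virtqueue_disable_cb(sq->vq);
			}
		}
		return NETDEV_TX_OK;
	}
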
>
>
>>  * it returns "false" if there is at least one pending buffer in the queue,
>>  * to detect a possible race between the driver checking for more work,
>>  * and enabling callbacks.
>>  *
>>  * Caller must ensure we don't call this with other virtqueue
>>  * operations at the same time (except where noted).
>>  */
>>
>>>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
>>>> index b46671e..bfaf058 100644
>>>> --- a/include/linux/virtio.h
>>>> +++ b/include/linux/virtio.h
>>>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>>>>  
>>>>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>>>>  
>>>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
>>>> +
>>>>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>>>>  
>>>>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>>>> -- 
>>>> 1.7.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt
  2014-10-15 10:43       ` Michael S. Tsirkin
@ 2014-10-15 11:00         ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-15 11:00 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 06:43 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:25:25PM +0800, Jason Wang wrote:
>> > On 10/15/2014 06:18 PM, Michael S. Tsirkin wrote:
>>> > > On Wed, Oct 15, 2014 at 03:25:29PM +0800, Jason Wang wrote:
>>>>> > >> > Orphan skb in ndo_start_xmit() breaks socket accounting and packet
>>>>> > >> > queuing. This in fact breaks lots of things such as pktgen and several
>>>>> > >> > TCP optimizations. And also make BQL can't be implemented for
>>>>> > >> > virtio-net.
>>>>> > >> > 
>>>>> > >> > This patch tries to solve this issue by enabling tx interrupt. To
>>>>> > >> > avoid introducing extra spinlocks, a tx napi was scheduled to free
>>>>> > >> > those packets.
>>>>> > >> > 
>>>>> > >> > More tx interrupt mitigation method could be used on top.
>>>>> > >> > 
>>>>> > >> > Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>> > >> > Cc: Michael S. Tsirkin <mst@redhat.com>
>>>>> > >> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> > >> > ---
>>>>> > >> >  drivers/net/virtio_net.c |  125 +++++++++++++++++++++++++++++++---------------
>>>>> > >> >  1 files changed, 85 insertions(+), 40 deletions(-)
>>>>> > >> > 
>>>>> > >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>> > >> > index ccf98f9..2afc2e2 100644
>>>>> > >> > --- a/drivers/net/virtio_net.c
>>>>> > >> > +++ b/drivers/net/virtio_net.c
>>>>> > >> > @@ -72,6 +72,8 @@ struct send_queue {
>>>>> > >> >  
>>>>> > >> >  	/* Name of the send queue: output.$index */
>>>>> > >> >  	char name[40];
>>>>> > >> > +
>>>>> > >> > +	struct napi_struct napi;
>>>>> > >> >  };
>>>>> > >> >  
>>>>> > >> >  /* Internal representation of a receive virtqueue */
>>>>> > >> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
>>>>> > >> >  	return p;
>>>>> > >> >  }
>>>>> > >> >  
>>>>> > >> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget)
>>>>> > >> > +{
>>>>> > >> > +	struct sk_buff *skb;
>>>>> > >> > +	unsigned int len;
>>>>> > >> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
>>>>> > >> > +	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>>>>> > >> > +	u64 tx_bytes = 0, tx_packets = 0;
>>>>> > >> > +
>>>>> > >> > +	while (tx_packets < budget &&
>>>>> > >> > +	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>>>>> > >> > +		pr_debug("Sent skb %p\n", skb);
>>>>> > >> > +
>>>>> > >> > +		tx_bytes += skb->len;
>>>>> > >> > +		tx_packets++;
>>>>> > >> > +
>>>>> > >> > +		dev_kfree_skb_any(skb);
>>>>> > >> > +	}
>>>>> > >> > +
>>>>> > >> > +	u64_stats_update_begin(&stats->tx_syncp);
>>>>> > >> > +	stats->tx_bytes += tx_bytes;
>>>>> > >> > +	stats->tx_packets =+ tx_packets;
>>>>> > >> > +	u64_stats_update_end(&stats->tx_syncp);
>>>>> > >> > +
>>>>> > >> > +	return tx_packets;
>>>>> > >> > +}
>>>>> > >> > +
>>>>> > >> >  static void skb_xmit_done(struct virtqueue *vq)
>>>>> > >> >  {
>>>>> > >> >  	struct virtnet_info *vi = vq->vdev->priv;
>>>>> > >> > +	struct send_queue *sq = &vi->sq[vq2txq(vq)];
>>>>> > >> >  
>>>>> > >> > -	/* Suppress further interrupts. */
>>>>> > >> > -	virtqueue_disable_cb(vq);
>>>>> > >> > -
>>>>> > >> > -	/* We were probably waiting for more output buffers. */
>>>>> > >> > -	netif_wake_subqueue(vi->dev, vq2txq(vq));
>>>>> > >> > +	if (napi_schedule_prep(&sq->napi)) {
>>>>> > >> > +		__napi_schedule(&sq->napi);
>>>>> > >> > +	}
>>>>> > >> >  }
>>>>> > >> >  
>>>>> > >> >  static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
>>>>> > >> > @@ -774,7 +801,39 @@ again:
>>>>> > >> >  	return received;
>>>>> > >> >  }
>>>>> > >> >  
>>>>> > >> > +static int virtnet_poll_tx(struct napi_struct *napi, int budget)
>>>>> > >> > +{
>>>>> > >> > +	struct send_queue *sq =
>>>>> > >> > +		container_of(napi, struct send_queue, napi);
>>>>> > >> > +	struct virtnet_info *vi = sq->vq->vdev->priv;
>>>>> > >> > +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
>>>>> > >> > +	unsigned int r, sent = 0;
>>>>> > >> > +
>>>>> > >> > +again:
>>>>> > >> > +	__netif_tx_lock(txq, smp_processor_id());
>>>>> > >> > +	virtqueue_disable_cb(sq->vq);
>>>>> > >> > +	sent += free_old_xmit_skbs(sq, budget - sent);
>>>>> > >> > +
>>>>> > >> > +	if (sent < budget) {
>>>>> > >> > +		r = virtqueue_enable_cb_prepare(sq->vq);
>>>>> > >> > +		napi_complete(napi);
>>>>> > >> > +		__netif_tx_unlock(txq);
>>>>> > >> > +		if (unlikely(virtqueue_poll(sq->vq, r)) &&
>>> > > So you are enabling callback on the next packet,
>>> > > which is almost sure to cause an interrupt storm
>>> > > on the guest.
>>> > >
>>> > >
>>> > > I think it's a bad idea, this is why I used
>>> > > enable_cb_delayed in my patch.
>> > 
>> > Right, will do this, but may also need to make sure used event never
>> > goes back since we may call virtqueue_enable_cb_avail().
> That's why my patch always calls virtqueue_enable_cb_delayed.
> So no need for hacks.
>
> Maybe you can review my patch and comment?

I think I missed the virtqueue_enable_cb_delayed() in your patch when I
was reviewing it. Will do.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
  2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
@ 2014-10-15 11:14   ` Jason Wang
  2014-10-15 11:58     ` Michael S. Tsirkin
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2014-10-15 11:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
>> According to David, proper accounting and queueing (at all levels, not
>> just TCP sockets) is more important than trying to skim a bunch of
>> cycles by avoiding TX interrupts.
> He also mentioned we should find other ways to batch
>

Right.
>> Having an event to free the SKB is
>> absolutely essential for the stack to operate correctly.
>>
>> This series tries to enable tx interrupt for virtio-net. The idea is
>> simple: enable tx interrupt and schedule a tx napi to free old xmit
>> skbs.
>>
>> Several notes:
>> - Tx interrupt storm avoidance when queue is about to be full is
>>   kept.
> But queue is typically *not* full. More important to avoid interrupt
> storms in that case IMO.

Yes.
>> Since we may enable callbacks in both ndo_start_xmit() and tx
>>   napi, patch 1 adds a check to make sure used event never go
>>   back. This will let the napi not enable the callbacks wrongly after
>>   delayed callbacks was used.
> So why not just use delayed callbacks?

This means the tx interrupts are coalesced in a somewhat adaptive way.
Need benchmarks to see the effect.
>
>> - For bulk dequeuing, there's no need to enable tx interrupt for each
>>   packet. The last patch only enable tx interrupt for the final skb in
>>   the chain through xmit_more and a new helper to publish current avail
>>   idx as used event.
>>
>> This series fixes several issues of original rfc pointed out by Michael.
> Could you list the issues, for ease of review?

Probably just one:

- Move the virtqueue_disable_cb() from skb_xmit_done() into
virtnet_poll_tx() under the tx lock, as sketched below.
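
A sketch of what the move leaves behind in the interrupt callback
(mirroring patch 5/6): it only kicks NAPI, and virtqueue_disable_cb()
now runs in virtnet_poll_tx() with the tx lock held:

	static void skb_xmit_done(struct virtqueue *vq)
	{
		struct virtnet_info *vi = vq->vdev->priv;
		struct send_queue *sq = &vi->sq[vq2txq(vq)];

		if (napi_schedule_prep(&sq->napi))
			__napi_schedule(&sq->napi);
	}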

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15 10:44         ` Jason Wang
@ 2014-10-15 11:38           ` Michael S. Tsirkin
  2014-10-17  5:04             ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 11:38 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote:
> On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
> >>>> This patch checks the new event idx to make sure used event idx never
> >>>> goes back. This is used to synchronize the calls between
> >>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>> the implication being that moving event idx back might cause some race
> >>> condition?  
> >> This will cause race condition when tx interrupt is enabled. Consider
> >> the following cases
> >>
> >> 1) tx napi was scheduled
> >> 2) start_xmit() call virtqueue_enable_cb_delayed() and disable cb, [used
> >> event is vq->last_used_idx + 3/4 pending bufs]
> >> 3) tx napi enable the callback by virtqueue_enable_cb() [ used event is
> >> vq->last_used_idx ]
> >>  
> >> After step 3, used event was moved back, unnecessary tx interrupt was
> >> triggered.
> > Well unnecessary interrupts are safe.
> 
> But that is what we want to reduce.

It's all about correctness. I don't think mixing enable_cb
and enable_cb_delayed makes sense, let's just make
virtio behave correctly if that happens, no need to
optimize for that.


> > With your patch caller of virtqueue_enable_cb will not get an
> > interrupt on the next buffer which is not safe.
> >
> > If you don't want an interrupt on the next buffer, don't
> > call virtqueue_enable_cb.
> 
> So something like this patch should be done somewhere else, in the
> virtio core. Virtio-net cannot do this since it does not have knowledge
> of the event index.

Take a look at my patch - no calls to enable_cb, only
enable_cb_delayed, so we should be fine.

> >
> >>> If yes but please describe the race explicitly.
> >>> Is there a bug we need to fix on stable?
> >> Looks not, current code does not have such race condition.
> >>> Please also explicitly describe a configuration that causes event idx
> >>> to go back.
> >>>
> >>> All this info should go in the commit log.
> >> Will do this.
> >>>> ---
> >>>>  drivers/virtio/virtio_ring.c |    7 +++++--
> >>>>  1 files changed, 5 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>> index 3b1f89b..1b3929f 100644
> >>>> --- a/drivers/virtio/virtio_ring.c
> >>>> +++ b/drivers/virtio/virtio_ring.c
> >>>> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>>>  	u16 last_used_idx;
> >>>>  
> >>>>  	START_USE(vq);
> >>>> -
> >>>> +	last_used_idx = vq->last_used_idx;
> >>>>  	/* We optimistically turn back on interrupts, then check if there was
> >>>>  	 * more to do. */
> >>>>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> >>>>  	 * either clear the flags bit or point the event index at the next
> >>>>  	 * entry. Always do both to keep code simple. */
> >>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
> >>>> +	/* Make sure used event never go backwards */
> >>> s/go/goes/
> >>>
> >>>> +	if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> +			      vq->vring.avail->idx, last_used_idx))
> >>>> +		vring_used_event(&vq->vring) = last_used_idx;
> >>> The result will be that driver will *not* get an interrupt
> >>> on the next consumed buffer, which is likely not what driver
> >>> intended when it called virtqueue_enable_cb.
> >> This will only happen when we want to delay the interrupt for next few
> >> consumed buffers (virtqueue_enable_cb_delayed() was called). For the
> >> other case, vq->last_used_idx should be ahead of previous used event. Do
> >> you see any other case?
> > Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
> > event index is not updated in virtqueue_enable_cb, driver will not get
> > an interrupt on the next buffer.
> 
> This is just what we want, I think. The interrupt was not lost but fired
> after 3/4 of the pending buffers were consumed. Do you see any real
> issue with this?

Yes, this violates the API. For example, the device might never consume
the rest of the buffers.
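
Spelled out as a sequence (a sketch of the failure mode, not real
driver code):

	virtqueue_enable_cb_delayed(vq); /* event parked at last_used + 3/4 * pending */
	/* ... later the driver decides it wants the very next buffer ... */
	virtqueue_enable_cb(vq);	 /* with the patch: event NOT moved back */
	/*
	 * The device consumes one buffer and goes idle: its used idx never
	 * reaches the parked event, no interrupt fires, and the caller --
	 * trusting virtqueue_enable_cb() -- never polls, so the buffer
	 * sits there indefinitely.
	 */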


> >
> >>> Instead, how about we simply document the requirement that drivers either
> >>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
> >>> but not both?
> >> We need call them both when tx interrupt is enabled I believe.
> > Can you pls reply to my patch and document issues you see?
> >
> 
> In the previous reply you said you're using
> virtqueue_enable_cb_delayed(), so there is no race in your patch.

OK so you think my patch is also correct, but that yours gives better
efficiency?

-- 
MST

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail()
  2014-10-15 10:58         ` Jason Wang
@ 2014-10-15 11:43           ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 11:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 06:58:15PM +0800, Jason Wang wrote:
> On 10/15/2014 06:41 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 06:19:15PM +0800, Jason Wang wrote:
> >> On 10/15/2014 05:28 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Oct 15, 2014 at 03:25:26PM +0800, Jason Wang wrote:
> >>>> This patch introduces virtio_enable_cb_avail() to publish avail idx
> >>>> and used event. This could be used by batched buffer submitting to
> >>>> reduce the number of tx interrupts.
> >>>>
> >>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
> >>>> Cc: Michael S. Tsirkin <mst@redhat.com>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >>>> ---
> >>>>  drivers/virtio/virtio_ring.c |   22 ++++++++++++++++++++--
> >>>>  include/linux/virtio.h       |    2 ++
> >>>>  2 files changed, 22 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> >>>> index 1b3929f..d67fbf8 100644
> >>>> --- a/drivers/virtio/virtio_ring.c
> >>>> +++ b/drivers/virtio/virtio_ring.c
> >>>> @@ -567,14 +567,32 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >>>>  	 * entry. Always do both to keep code simple. */
> >>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>>  	/* Make sure used event never go backwards */
> >>>> -	if (!vring_need_event(vring_used_event(&vq->vring),
> >>>> -			      vq->vring.avail->idx, last_used_idx))
> >>>> +	if (vq->vring.avail->idx != vring_used_event(&vq->vring) &&
> >>>> +	    !vring_need_event(vring_used_event(&vq->vring),
> >>>> +			      vq->vring.avail->idx, last_used_idx)) {
> >>>>  		vring_used_event(&vq->vring) = last_used_idx;
> >>>> +	}
> >>>>  	END_USE(vq);
> >>>>  	return last_used_idx;
> >>>>  }
> >>>>  EXPORT_SYMBOL_GPL(virtqueue_enable_cb_prepare);
> >>>>
> >>> I see you are also changing virtqueue_enable_cb_prepare, why?
> >> This is also used to prevent it from moving the used event backwards.
> >> This may happen when we handle tx napi after we publish the avail idx as
> >> the used event (virtqueue_enable_cb_avail() was called).
> > So it's wrong exactly in the same way.
> >
> > But also, please document this stuff, don't put
> > unrelated changes in a patch called "introduce
> > virtqueue_enable_cb_avail".
> >
> >
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *_vq)
> >>>> +{
> >>>> +	struct vring_virtqueue *vq = to_vvq(_vq);
> >>>> +	bool ret;
> >>>> +
> >>>> +	START_USE(vq);
> >>>> +	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
> >>>> +	vring_used_event(&vq->vring) = vq->vring.avail->idx;
> >>>> +	ret = vring_need_event(vq->vring.avail->idx,
> >>>> +			       vq->last_used_idx, vq->vring.used->idx);
> >>>> +	END_USE(vq);
> >>>> +
> >>>> +	return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(virtqueue_enable_cb_avail);
> >>>> +
> >>>>  /**
> >>>>   * virtqueue_poll - query pending used buffers
> >>>>   * @vq: the struct virtqueue we're talking about.
> >>> Could not figure out what this does.
> >>> Please add documentation.
> >>>
> >> Sure, does something like below explain what this function does?
> >>
> >> /**
> >>  * virtqueue_enable_cb_avail - restart callbacks after disable_cb.
> >>  * @vq: the struct virtqueue we're talking about.
> >>  *
> >>  * This re-enables callbacks but hints to the other side to delay
> >>  * interrupts until all of the available buffers have been processed;
> >
> > So this is like virtqueue_enable_cb_delayed but even more
> > aggressive?
> > I think it's too aggressive: it's better to wake up the guest
> > after you are through most of the buffers, but not all,
> > so guest and host can work in parallel.
> 
> Note that:
> 
> - it is only used when the number of free slots is still greater than
> 2 + MAX_SKB_FRAGS
> - my patch keeps the free_old_xmit_skbs() at the beginning of
> start_xmit(), so tx skb reclaiming does not depend entirely on the tx
> interrupt. If more packets come, we'd expect some of them to be freed in
> ndo_start_xmit(). If not, we may eventually trigger
> virtqueue_enable_cb_delayed().
> 
> So it is probably not as aggressive as it looks. I will benchmark this.

Mine too:
        } else if (virtqueue_enable_cb_delayed(sq->vq)) {
                free_old_xmit_skbs(txq, sq, qsize);
        }
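
For context, the corresponding ring-full path in the then-current
mainline driver read roughly as below (drivers/net/virtio_net.c as of
3.17); both fragments attack the same interrupt-storm case:

	/* Ring nearly full: stop the queue, re-arm a delayed callback,
	 * and if completions raced in meanwhile, reclaim and restart. */
	if (sq->vq->num_free < 2 + MAX_SKB_FRAGS) {
		netif_stop_subqueue(dev, qnum);
		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
			/* More just got used, free them then recheck. */
			free_old_xmit_skbs(sq);
			if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS) {
				netif_start_subqueue(dev, qnum);
				virtqueue_disable_cb(sq->vq);
			}
		}
	}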



> >
> >
> >>  * it returns "false" if there is at least one pending buffer in the queue,
> >>  * to detect a possible race between the driver checking for more work,
> >>  * and enabling callbacks.
> >>  *
> >>  * Caller must ensure we don't call this with other virtqueue
> >>  * operations at the same time (except where noted).
> >>  */
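
A minimal sketch of how the cover letter describes patch 6/6 using this
helper -- hypothetical code following the driver's naming, not the
posted patch: publish avail->idx as the used event only for the final
skb of an xmit_more batch, so the host raises a single tx interrupt for
the whole batch.

	/* Hypothetical tail of the xmit path; error handling omitted. */
	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
	if (skb->xmit_more)
		virtqueue_disable_cb(sq->vq);	/* batch not finished yet */
	else
		virtqueue_enable_cb_avail(sq->vq); /* one interrupt for all */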
> >>
> >>>> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> >>>> index b46671e..bfaf058 100644
> >>>> --- a/include/linux/virtio.h
> >>>> +++ b/include/linux/virtio.h
> >>>> @@ -65,6 +65,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >>>>  
> >>>>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >>>>  
> >>>> +bool virtqueue_enable_cb_avail(struct virtqueue *vq);
> >>>> +
> >>>>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >>>>  
> >>>>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >>>> -- 
> >>>> 1.7.1

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net
  2014-10-15 11:14   ` Jason Wang
@ 2014-10-15 11:58     ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 11:58 UTC (permalink / raw)
  To: Jason Wang; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 07:14:14PM +0800, Jason Wang wrote:
> On 10/15/2014 06:25 PM, Michael S. Tsirkin wrote:
> > On Wed, Oct 15, 2014 at 03:25:24PM +0800, Jason Wang wrote:
> >> According to David, proper accounting and queueing (at all levels, not
> >> just TCP sockets) is more important than trying to skim a bunch of
> >> cycles by avoiding TX interrupts.
> > He also mentioned we should find other ways to batch
> >
> 
> Right.
> >> Having an event to free the SKB is
> >> absolutely essential for the stack to operate correctly.
> >>
> >> This series tries to enable tx interrupt for virtio-net. The idea is
> >> simple: enable tx interrupt and schedule a tx napi to free old xmit
> >> skbs.
> >>
> >> Several notes:
> >> - Tx interrupt storm avoidance when queue is about to be full is
> >>   kept.
> > But queue is typically *not* full. More important to avoid interrupt
> > storms in that case IMO.
> 
> Yes.
> >> Since we may enable callbacks in both ndo_start_xmit() and tx
> >>   napi, patch 1 adds a check to make sure used event never go
> >>   back. This will let the napi not enable the callbacks wrongly after
> >>   delayed callbacks was used.
> > So why not just use delayed callbacks?
> 
> This means the tx interrupts are coalesced in a somewhat adaptive way.
> We need a benchmark to see its effect.

I think it's a minimal change, and does not need new APIs.
If that's not optimal, we can do smarter things on top.
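
The adaptive coalescing Jason refers to comes from the threshold
virtqueue_enable_cb_delayed() publishes (drivers/virtio/virtio_ring.c):

	/* Ask for an interrupt only after ~3/4 of the buffers that are
	 * currently outstanding have been consumed. */
	bufs = (u16)(vq->vring.avail->idx - vq->last_used_idx) * 3 / 4;
	vring_used_event(&vq->vring) = vq->last_used_idx + bufs;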

> >
> >> - For bulk dequeuing, there's no need to enable tx interrupt for each
> >>   packet. The last patch only enable tx interrupt for the final skb in
> >>   the chain through xmit_more and a new helper to publish current avail
> >>   idx as used event.
> >>
> >> This series fixes several issues of original rfc pointed out by Michael.
> > Could you list the issues, for ease of review?
> 
> Probably just one:
> 
> - Move the virtqueue_disable_cb() from skb_xmit_done() into
> virtnet_poll_tx() under tx lock.

I think I did this already, I'll recheck.
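
A hypothetical sketch of a tx napi poll routine with that fix applied --
the struct layout and helper names here are assumptions, not the posted
patch: callbacks are disabled and re-enabled only under the tx lock, so
the two operations cannot race with each other.

	static int virtnet_poll_tx(struct napi_struct *napi, int budget)
	{
		struct send_queue *sq =
			container_of(napi, struct send_queue, napi);
		struct virtnet_info *vi = sq->vq->vdev->priv;
		struct netdev_queue *txq =
			netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));

		__netif_tx_lock(txq, smp_processor_id());
		virtqueue_disable_cb(sq->vq);
		free_old_xmit_skbs(sq);
		if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
			netif_tx_wake_queue(txq);
		napi_complete(napi);
		/* Re-arm; if completions raced in, poll again. */
		if (!virtqueue_enable_cb_delayed(sq->vq))
			napi_schedule(napi);
		__netif_tx_unlock(txq);

		return 0;
	}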

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs()
  2014-10-15 10:51         ` David Laight
@ 2014-10-15 12:00           ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2014-10-15 12:00 UTC (permalink / raw)
  To: David Laight; +Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On Wed, Oct 15, 2014 at 10:51:32AM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > On Wed, Oct 15, 2014 at 09:49:01AM +0000, David Laight wrote:
> > > From: Of Michael S. Tsirkin
> > > > On Wed, Oct 15, 2014 at 03:25:27PM +0800, Jason Wang wrote:
> > > > > Accumulate the sent packets and sent bytes in local variables and perform a
> > > > > single u64_stats_update_begin/end() after.
> > > > >
> > > > > Cc: Rusty Russell <rusty@rustcorp.com.au>
> > > > > Cc: Michael S. Tsirkin <mst@redhat.com>
> > > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > >
> > > > Not sure how much it's worth but since Eric suggested it ...
> > >
> > > Probably depends on the actual cost of u64_stats_update_begin/end
> > > against the likely extra saving of the tx_bytes and tx_packets
> > > values onto the stack across the call to dev_kfree_skb_any().
> > > (Which depends on the number of caller saved registers.)
> > 
> > Yea, some benchmark results would be nice to see.
> 
> If there are likely to be multiple skbs on the queue, the fastest
> code would probably do one 'virtqueue_get_all()' that returned
> a linked list of buffers, then follow the list to get the stats,
> and follow it again to free the skbs.
> 
> 	David

Interesting. Each time we tried playing with linked lists in the past,
it simply destroyed performance.
Maybe this case is different, but I have my doubts.
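
virtqueue_get_all() does not exist in the kernel; purely as a
hypothetical sketch of David's idea, the extra passes could chain the
dequeued skbs and walk the chain once for stats and once to free:

	struct sk_buff *skb, *head = NULL;
	unsigned int len, tx_packets = 0;
	u64 tx_bytes = 0;

	/* Drain pass: this loop is what a hypothetical
	 * virtqueue_get_all() would replace with a single call. */
	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
		skb->next = head;
		head = skb;
	}
	/* Stats pass. */
	for (skb = head; skb; skb = skb->next) {
		tx_bytes += skb->len;
		tx_packets++;
	}
	/* Free pass. */
	while ((skb = head) != NULL) {
		head = skb->next;
		skb->next = NULL;
		dev_kfree_skb_any(skb);
	}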

> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c |   12 ++++++++----
> > > > >  1 files changed, 8 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 3d0ce44..a4d56b8 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -833,17 +833,21 @@ static void free_old_xmit_skbs(struct send_queue *sq)
> > > > >  	unsigned int len;
> > > > >  	struct virtnet_info *vi = sq->vq->vdev->priv;
> > > > >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> > > > > +	u64 tx_bytes = 0, tx_packets = 0;
> > >
> > > tx_packets need only be 'unsigned int'.
> > > The same is almost certainly true of tx_bytes.
> > >
> > > 	David
> > >
> > > > >  	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > >  		pr_debug("Sent skb %p\n", skb);
> > > > >
> > > > > -		u64_stats_update_begin(&stats->tx_syncp);
> > > > > -		stats->tx_bytes += skb->len;
> > > > > -		stats->tx_packets++;
> > > > > -		u64_stats_update_end(&stats->tx_syncp);
> > > > > +		tx_bytes += skb->len;
> > > > > +		tx_packets++;
> > > > >
> > > > >  		dev_kfree_skb_any(skb);
> > > > >  	}
> > > > > +
> > > > > +	u64_stats_update_begin(&stats->tx_syncp);
> > > > > +	stats->tx_bytes += tx_bytes;
> > > > > +	stats->tx_packets += tx_packets;
> > > > > +	u64_stats_update_end(&stats->tx_syncp);
> > > > >  }
> > > > >
> > > > >  static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> > > > > --
> > > > > 1.7.1
> > >
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards
  2014-10-15 11:38           ` Michael S. Tsirkin
@ 2014-10-17  5:04             ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2014-10-17  5:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: eric.dumazet, netdev, linux-kernel, virtualization, davem

On 10/15/2014 07:38 PM, Michael S. Tsirkin wrote:
> On Wed, Oct 15, 2014 at 06:44:41PM +0800, Jason Wang wrote:
>> On 10/15/2014 06:32 PM, Michael S. Tsirkin wrote:
>>> On Wed, Oct 15, 2014 at 06:13:19PM +0800, Jason Wang wrote:
>>>> On 10/15/2014 05:34 PM, Michael S. Tsirkin wrote:
>>>>> On Wed, Oct 15, 2014 at 03:25:25PM +0800, Jason Wang wrote:
>>>>>> This patch checks the new event idx to make sure used event idx never
>>>>>> goes back. This is used to synchronize the calls between
>>>>>> virtqueue_enable_cb_delayed() and virtqueue_enable_cb().
>>>>>>
>>>>>> Cc: Rusty Russell <rusty@rustcorp.com.au>
>>>>>> Cc: Michael S. Tsirkin <mst@redhat.com>
>>>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>>>>> the implication being that moving event idx back might cause some race
>>>>> condition?  
>>>> This will cause a race condition when tx interrupt is enabled. Consider
>>>> the following case:
>>>>
>>>> 1) tx napi was scheduled
>>>> 2) start_xmit() calls virtqueue_enable_cb_delayed() and disables the cb
>>>> [used event is vq->last_used_idx + 3/4 of the pending bufs]
>>>> 3) tx napi enables the callback via virtqueue_enable_cb() [used event is
>>>> vq->last_used_idx]
>>>>
>>>> After step 3, the used event was moved back and an unnecessary tx
>>>> interrupt was triggered.
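
To put hypothetical numbers on that scenario (the 3/4 factor matches
what virtqueue_enable_cb_delayed() publishes):

	/* Say last_used_idx = 10 and avail->idx = 20, so 10 bufs pending.
	 * Step 2 publishes used_event = 10 + (20 - 10) * 3 / 4 = 17.
	 * Step 3 rewrites used_event = 10, so the very next used buffer
	 * (new_idx = 11) triggers an interrupt:
	 *   vring_need_event(10, 11, 10) -> true   (fires at once)
	 *   vring_need_event(17, 11, 10) -> false  (held until idx 18)
	 */
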
>>> Well unnecessary interrupts are safe.
>> But that is what we want to reduce.
> It's all about correctness. I don't think mixing enable_cb
> and enable_cb_delayed makes sense, let's just make
> virtio behave correctly if that happens, no need to
> optimize for that.

Then, as you said, we need to document this, or add a WARN_ON() or BUG()
in case both of the two are used.
>
>
>>> With your patch, the caller of virtqueue_enable_cb will not get an
>>> interrupt on the next buffer, which is not safe.
>>>
>>> If you don't want an interrupt on the next buffer, don't
>>> call virtqueue_enable_cb.
>> So something like this patch should be done somewhere else in the virtio
>> core. Virtio-net cannot do this since it does not have knowledge of the
>> event index.
> Take a look at my patch - no calls to enable_cb, only
> enable_cb_delayed, so we should be fine.
>
>>>>> If yes but please describe the race explicitly.
>>>>> Is there a bug we need to fix on stable?
>>>> Apparently not; the current code does not have such a race condition.
>>>>> Please also explicitly describe a configuration that causes event idx
>>>>> to go back.
>>>>>
>>>>> All this info should go in the commit log.
>>>> Will do this.
>>>>>> ---
>>>>>>  drivers/virtio/virtio_ring.c |    7 +++++--
>>>>>>  1 files changed, 5 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>>>>> index 3b1f89b..1b3929f 100644
>>>>>> --- a/drivers/virtio/virtio_ring.c
>>>>>> +++ b/drivers/virtio/virtio_ring.c
>>>>>> @@ -559,14 +559,17 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>>>>>>  	u16 last_used_idx;
>>>>>>  
>>>>>>  	START_USE(vq);
>>>>>> -
>>>>>> +	last_used_idx = vq->last_used_idx;
>>>>>>  	/* We optimistically turn back on interrupts, then check if there was
>>>>>>  	 * more to do. */
>>>>>>  	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
>>>>>>  	 * either clear the flags bit or point the event index at the next
>>>>>>  	 * entry. Always do both to keep code simple. */
>>>>>>  	vq->vring.avail->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
>>>>>> -	vring_used_event(&vq->vring) = last_used_idx = vq->last_used_idx;
>>>>>> +	/* Make sure used event never go backwards */
>>>>> s/go/goes/
>>>>>
>>>>>> +	if (!vring_need_event(vring_used_event(&vq->vring),
>>>>>> +			      vq->vring.avail->idx, last_used_idx))
>>>>>> +		vring_used_event(&vq->vring) = last_used_idx;
>>>>> The result will be that driver will *not* get an interrupt
>>>>> on the next consumed buffer, which is likely not what driver
>>>>> intended when it called virtqueue_enable_cb.
>>>> This will only happen when we want to delay the interrupt for the next
>>>> few consumed buffers (virtqueue_enable_cb_delayed() was called). For the
>>>> other case, vq->last_used_idx should be ahead of the previous used
>>>> event. Do you see any other case?
>>> Call virtqueue_enable_cb_delayed, later call virtqueue_enable_cb.  If
>>> event index is not updated in virtqueue_enable_cb, driver will not get
>>> an interrupt on the next buffer.
>> This is just what we want, I think. The interrupt was not lost but fired
>> after 3/4 of the pending buffers were consumed. Do you see any real issue
>> with this?
> Yes, this violates the API. For example, the device might never
> consume the rest of the buffers.

Then it would be a bug in the device, which is out of the guest's control.
If not, the device might also never consume the remaining 3/4 of the
buffers.
>
>>>>> Instead, how about we simply document the requirement that drivers either
>>>>> always call virtqueue_enable_cb_delayed or virtqueue_enable_cb
>>>>> but not both?
>>>> We need to call them both when tx interrupt is enabled, I believe.
>>> Can you pls reply to my patch and document issues you see?
>>>
>> In the previous reply you said you're using
>> virtqueue_enable_cb_delayed(), so there is no race in your patch.
> OK so you think my patch is also correct, but that yours gives better
> efficiency?
>

We need some benchmarks to see the difference, I think.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2014-10-17  5:04 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-15  7:25 [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Jason Wang
2014-10-15  7:25 ` [RFC PATCH net-next 1/6] virtio: make sure used event never go backwards Jason Wang
2014-10-15  9:34   ` Michael S. Tsirkin
2014-10-15 10:13     ` Jason Wang
2014-10-15 10:32       ` Michael S. Tsirkin
2014-10-15 10:44         ` Jason Wang
2014-10-15 11:38           ` Michael S. Tsirkin
2014-10-17  5:04             ` Jason Wang
2014-10-15  7:25 ` [RFC PATCH net-next 2/6] virtio: introduce virtio_enable_cb_avail() Jason Wang
2014-10-15  9:28   ` Michael S. Tsirkin
2014-10-15 10:19     ` Jason Wang
2014-10-15 10:41       ` Michael S. Tsirkin
2014-10-15 10:58         ` Jason Wang
2014-10-15 11:43           ` Michael S. Tsirkin
2014-10-15  7:25 ` [RFC PATCH net-next 3/6] virtio-net: small optimization on free_old_xmit_skbs() Jason Wang
2014-10-15  9:36   ` Eric Dumazet
2014-10-15  9:37   ` Michael S. Tsirkin
2014-10-15  9:49     ` David Laight
2014-10-15 10:48       ` Michael S. Tsirkin
2014-10-15 10:51         ` David Laight
2014-10-15 12:00           ` Michael S. Tsirkin
2014-10-15  7:25 ` [RFC PATCH net-next 4/6] virtio-net: return the number of packets sent in free_old_xmit_skbs() Jason Wang
2014-10-15  7:25 ` [RFC PATCH net-next 5/6] virtio-net: enable tx interrupt Jason Wang
2014-10-15  9:37   ` Eric Dumazet
2014-10-15 10:21     ` Jason Wang
2014-10-15 10:18   ` Michael S. Tsirkin
2014-10-15 10:25     ` Jason Wang
2014-10-15 10:43       ` Michael S. Tsirkin
2014-10-15 11:00         ` Jason Wang
2014-10-15  7:25 ` [RFC PATCH net-next 6/6] virtio-net: enable tx interrupt only for the final skb in the chain Jason Wang
2014-10-15 10:22   ` Michael S. Tsirkin
2014-10-15 10:31     ` Jason Wang
2014-10-15 10:46       ` Michael S. Tsirkin
2014-10-15 10:25 ` [RFC PATCH net-next 0/6] Always use tx interrupt for virtio-net Michael S. Tsirkin
2014-10-15 11:14   ` Jason Wang
2014-10-15 11:58     ` Michael S. Tsirkin
