* [PATCH net-next RFC 0/4] virtio-net tx napi
@ 2017-03-03 14:39 Willem de Bruijn
  2017-03-03 14:39 ` [PATCH net-next RFC 1/4] virtio-net: napi helper functions Willem de Bruijn
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-03 14:39 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Add napi for virtio-net transmit completion processing. Based on
previous patchsets by Jason Wang:

  [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net
  http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html

This patchset is not ready for submission yet, but it is time for
another checkpoint. Among other things, it requires more testing with
more diverse workloads.


Before commit b0c39dbdc204 ("virtio_net: don't free buffers in xmit
ring") the virtio-net driver would free transmitted packets on
transmission of new packets in ndo_start_xmit and, to catch the edge
case when no new packet is sent, also in a timer at 10HZ.

A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls
due to a low free descriptor count. It does not address stalls due to
a low socket SO_SNDBUF. Increasing the timer frequency decreases that
stall time, but increases the interrupt rate and, thus, cycle count.

Currently, with no timer, packets are freed only at ndo_start_xmit.
Latency of consume_skb is now unbounded. To avoid a deadlock if a socket
reaches its SO_SNDBUF limit, packets are orphaned on tx. This breaks TCP small
queues.

Reenable TCP small queues by removing the orphan. Instead of using a
timer, convert the driver to regular tx napi. This does not have the
unresolved stall issue and does not have any frequency to tune.

By keeping interrupts enabled by default, napi increases the tx
interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if
one is already unacknowledged, which makes this more feasible today.
Combine that with two optimizations that bring interrupt rate
back in line with the existing code:

Interrupt coalescing delays interrupts until a number of events
accrue or a timer fires.

Tx completion cleaning on rx interrupts elides most explicit tx
interrupts by relying on the fact that many rx interrupts fire.

Tested by running {1, 10, 100} TCP_STREAM and TCP_RR tests from a
guest to a server on the host, on an x86_64 Haswell. The guest
runs 4 vCPUs pinned to 4 cores. vhost and the test server are
pinned to a core each.

All results are the median of 5 runs, with variance well < 10%.
Used neper (github.com/google/neper) as the test process. Tests used
experimental_zcopytx=0. This is likely no longer needed.

Napi increases single stream throughput, but increases cycle cost
across the board. Interrupt moderation ("+vhost") reverts both, though
not fully. For this workload with ACKs in the return path, the
last optimization ("at-rx") is more effective. For UDP this is
likely not true.

             upstream     napi   +vhost   +at-rx +v+at-rx
Stream:
  1x:
  Mbps          30182    38782    30106    38002    32842
  Gcycles         405      499      386      403      417        

  10x:
  Mbps          40441    40575    41638    40260    41299
  Gcycles         438      545      430      416      416

  100x:
  Mbps          34049    34697    34763    34637    34259
  Gcycles         441      545      433      415      422

Latency (us):
  1x:
  p50              24       24       24       21       24
  p99              27       27       27       26       27
  Gcycles         299      430      432      312      297

  10x:
  p50              30       31       31       42       31
  p99              40       46       48       52       42
  Gcycles         347      423      471      322      463

  100x:
  p50             155      151      163      306      161
  p99             337      329      352      361      349
  Gcycles         340      421      463      306      441


The lower throughput at 100x vs 10x can be (at least in part)
explained by looking at bytes per packet sent (nstat). This likely
also explains the lower throughput of 1x for some variants.

upstream:

 N=1   bytes/pkt=16581
 N=10  bytes/pkt=61513
 N=100 bytes/pkt=51558

at_rx:

 N=1   bytes/pkt=65204
 N=10  bytes/pkt=65148
 N=100 bytes/pkt=56840

For this experiment, vhost uses thresholds of 64 frames and 64 usecs.
Configuring this from the guest requires additional patches to qemu.
Temporary patch:

  @@ -846,9 +845,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
          vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
          vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
 
  -       vqs[VHOST_NET_VQ_TX]->max_coalesce_ktime = ktime_set(0, 64 * NSEC_PER_USEC);
  -       vqs[VHOST_NET_VQ_TX]->max_coalesce_frames = 64;
  -
          f->private_data = n;
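
For reference, a minimal userspace sketch of setting the same thresholds
through the new ioctl added in patch 3/4 (hypothetical usage: queue
selection and error handling are glossed over, and the helper name is
made up):

  #include <sys/ioctl.h>
  #include <linux/vhost.h>

  /* Sketch: program 64 frames / 64 usecs tx coalescing on an open vhost fd,
   * matching the hard-coded defaults in the temporary patch above.
   */
  static int vhost_set_tx_coalesce(int vhost_fd)
  {
          struct vhost_vring_coalesce c = {
                  .max_coalesce_usecs  = 64,
                  .max_coalesce_frames = 64,
          };

          return ioctl(vhost_fd, VHOST_SET_VRING_COALESCE, &c);
  }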

TODO
 - restart timer if trylock failed and lock not held by handle_tx
   - start timer only at end of handle_tx and kill at start
 - make napi_tx configurable
 - increase test coverage
   - 4KB TCP_RR
   - UDP
   - multithreaded sender
   - with experimental_zcopytx


Willem de Bruijn (4):
  virtio-net: napi helper functions
  virtio-net: transmit napi
  vhost: interrupt coalescing support
  virtio-net: clean tx descriptors from rx napi

 drivers/net/virtio_net.c   | 157 +++++++++++++++++++++++++++++++++------------
 drivers/vhost/vhost.c      |  74 ++++++++++++++++++++-
 drivers/vhost/vhost.h      |  12 ++++
 include/uapi/linux/vhost.h |  11 ++++
 4 files changed, 211 insertions(+), 43 deletions(-)

-- 
2.12.0.rc1.440.g5b76565f74-goog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 1/4] virtio-net: napi helper functions
  2017-03-03 14:39 [PATCH net-next RFC 0/4] virtio-net tx napi Willem de Bruijn
@ 2017-03-03 14:39 ` Willem de Bruijn
  2017-03-03 14:39 ` [PATCH net-next RFC 2/4] virtio-net: transmit napi Willem de Bruijn
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-03 14:39 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Prepare virtio-net for tx napi by converting existing napi code to
use helper functions. This also deduplicates some logic.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c | 65 ++++++++++++++++++++++++++----------------------
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bf95016f442a..8c21e9a4adc7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -239,6 +239,26 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask)
 	return p;
 }
 
+static void virtqueue_napi_schedule(struct napi_struct *napi,
+				    struct virtqueue *vq)
+{
+	if (napi_schedule_prep(napi)) {
+		virtqueue_disable_cb(vq);
+		__napi_schedule(napi);
+	}
+}
+
+static void virtqueue_napi_complete(struct napi_struct *napi,
+				    struct virtqueue *vq, int processed)
+{
+	int opaque;
+
+	opaque = virtqueue_enable_cb_prepare(vq);
+	if (napi_complete_done(napi, processed) &&
+	    unlikely(virtqueue_poll(vq, opaque)))
+		virtqueue_napi_schedule(napi, vq);
+}
+
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
@@ -936,27 +956,20 @@ static void skb_recv_done(struct virtqueue *rvq)
 	struct virtnet_info *vi = rvq->vdev->priv;
 	struct receive_queue *rq = &vi->rq[vq2rxq(rvq)];
 
-	/* Schedule NAPI, Suppress further interrupts if successful. */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rvq);
-		__napi_schedule(&rq->napi);
-	}
+	virtqueue_napi_schedule(&rq->napi, rvq);
 }
 
-static void virtnet_napi_enable(struct receive_queue *rq)
+static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
-	napi_enable(&rq->napi);
+	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
-	 * won't get another interrupt, so process any outstanding packets
-	 * now.  virtnet_poll wants re-enable the queue, so we disable here.
-	 * We synchronize against interrupts via NAPI_STATE_SCHED */
-	if (napi_schedule_prep(&rq->napi)) {
-		virtqueue_disable_cb(rq->vq);
-		local_bh_disable();
-		__napi_schedule(&rq->napi);
-		local_bh_enable();
-	}
+	 * won't get another interrupt, so process any outstanding packets now.
+	 * Call local_bh_enable after to trigger softIRQ processing.
+	 */
+	local_bh_disable();
+	virtqueue_napi_schedule(napi, vq);
+	local_bh_enable();
 }
 
 static void refill_work(struct work_struct *work)
@@ -971,7 +984,7 @@ static void refill_work(struct work_struct *work)
 
 		napi_disable(&rq->napi);
 		still_empty = !try_fill_recv(vi, rq, GFP_KERNEL);
-		virtnet_napi_enable(rq);
+		virtnet_napi_enable(rq->vq, &rq->napi);
 
 		/* In theory, this can happen: if we don't get any buffers in
 		 * we will *never* try to fill again.
@@ -1011,21 +1024,13 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
 		container_of(napi, struct receive_queue, napi);
-	unsigned int r, received;
+	unsigned int received;
 
 	received = virtnet_receive(rq, budget);
 
 	/* Out of packets? */
-	if (received < budget) {
-		r = virtqueue_enable_cb_prepare(rq->vq);
-		if (napi_complete_done(napi, received)) {
-			if (unlikely(virtqueue_poll(rq->vq, r)) &&
-			    napi_schedule_prep(napi)) {
-				virtqueue_disable_cb(rq->vq);
-				__napi_schedule(napi);
-			}
-		}
-	}
+	if (received < budget)
+		virtqueue_napi_complete(napi, rq->vq, received);
 
 	return received;
 }
@@ -1040,7 +1045,7 @@ static int virtnet_open(struct net_device *dev)
 			/* Make sure we have some buffers: if oom use wq. */
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
-		virtnet_napi_enable(&vi->rq[i]);
+		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	return 0;
@@ -1737,7 +1742,7 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 				schedule_delayed_work(&vi->refill, 0);
 
 		for (i = 0; i < vi->max_queue_pairs; i++)
-			virtnet_napi_enable(&vi->rq[i]);
+			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
 	}
 
 	netif_device_attach(vi->dev);
-- 
2.12.0.rc1.440.g5b76565f74-goog

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 2/4] virtio-net: transmit napi
  2017-03-03 14:39 [PATCH net-next RFC 0/4] virtio-net tx napi Willem de Bruijn
  2017-03-03 14:39 ` [PATCH net-next RFC 1/4] virtio-net: napi helper functions Willem de Bruijn
@ 2017-03-03 14:39 ` Willem de Bruijn
  2017-03-06  9:21   ` Jason Wang
  2017-03-03 14:39 ` [PATCH net-next RFC 3/4] vhost: interrupt coalescing support Willem de Bruijn
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-03 14:39 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Convert virtio-net to a standard napi tx completion path. This enables
better TCP pacing using TCP small queues and increases single stream
throughput.

The virtio-net driver currently cleans tx descriptors on transmission
of new packets in ndo_start_xmit. Latency depends on new traffic, so
is unbounded. To avoid deadlock when a socket reaches its snd limit,
packets are orphaned on transmission. This breaks socket backpressure,
including TSQ.

Napi increases the number of interrupts generated compared to the
current model, which keeps interrupts disabled as long as the ring
has enough free descriptors. Keep tx napi optional for now. Follow-on
patches will reduce the interrupt cost.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8c21e9a4adc7..9a9031640179 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -33,6 +33,8 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
+static int napi_tx_weight = NAPI_POLL_WEIGHT;
+
 static bool csum = true, gso = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
@@ -86,6 +88,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -262,12 +266,16 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct napi_struct *napi = &vi->sq[vq2txq(vq)].napi;
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(vq);
 
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi->weight)
+		virtqueue_napi_schedule(napi, vq);
+	else
+		/* We were probably waiting for more output buffers. */
+		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -961,6 +969,9 @@ static void skb_recv_done(struct virtqueue *rvq)
 
 static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
+	if (!napi->weight)
+		return;
+
 	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
@@ -1046,12 +1057,13 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+		virtnet_napi_enable(vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 {
 	struct sk_buff *skb;
 	unsigned int len;
@@ -1060,7 +1072,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
 
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
 		bytes += skb->len;
@@ -1073,12 +1086,35 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	 * happens when called speculatively from start_xmit.
 	 */
 	if (!packets)
-		return;
+		return 0;
 
 	u64_stats_update_begin(&stats->tx_syncp);
 	stats->tx_bytes += bytes;
 	stats->tx_packets += packets;
 	u64_stats_update_end(&stats->tx_syncp);
+
+	return packets;
+}
+
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq = container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	bool complete = false;
+
+	__netif_tx_lock(txq, smp_processor_id());
+	if (free_old_xmit_skbs(sq, budget) < budget)
+		complete = true;
+	__netif_tx_unlock(txq);
+
+	if (complete)
+		virtqueue_napi_complete(napi, sq->vq, 0);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+
+	return complete ? 0 : budget;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
@@ -1130,9 +1166,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	if (!use_napi)
+		free_old_xmit_skbs(sq, napi_weight);
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
@@ -1152,8 +1190,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
+	if (!use_napi) {
+		skb_orphan(skb);
+		nf_reset(skb);
+	}
 
 	/* If running out of space, stop queue to avoid getting packets that we
 	 * are then unable to transmit.
@@ -1167,9 +1207,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
-		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
+		if (!use_napi &&
+		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, virtqueue_get_vring_size(sq->vq));
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
@@ -1371,8 +1412,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1719,6 +1762,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
 		for (i = 0; i < vi->max_queue_pairs; i++)
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
 	}
 }
 
@@ -1741,8 +1785,10 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+			virtnet_napi_enable(vi->sq[i].vq, &vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
@@ -1947,6 +1993,7 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_hash_del(&vi->rq[i].napi);
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
 	}
 
 	/* We called napi_hash_del() before netif_napi_del(),
@@ -2132,6 +2179,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		vi->rq[i].pages = NULL;
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_tx_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_pkt_len_init(&vi->rq[i].mrg_avg_pkt_len);
-- 
2.12.0.rc1.440.g5b76565f74-goog

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 3/4] vhost: interrupt coalescing support
  2017-03-03 14:39 [PATCH net-next RFC 0/4] virtio-net tx napi Willem de Bruijn
  2017-03-03 14:39 ` [PATCH net-next RFC 1/4] virtio-net: napi helper functions Willem de Bruijn
  2017-03-03 14:39 ` [PATCH net-next RFC 2/4] virtio-net: transmit napi Willem de Bruijn
@ 2017-03-03 14:39 ` Willem de Bruijn
  2017-03-06  9:28   ` Jason Wang
  2017-03-03 14:39 ` [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi Willem de Bruijn
  2017-03-06 19:44 ` [PATCH net-next RFC 0/4] virtio-net tx napi Michael S. Tsirkin
  4 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-03 14:39 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Implement standard interrupt coalescing for vhost based drivers,
delaying interrupts up to a maximum latency or number of events.

Interrupt moderation is customary for network devices, where it
is specified as ethtool -C $DEV rx-frames N rx-usecs M. Add the
same variables to vhost and integrate them into vhost_notify. Add a
timer to track the time bound.

The feature is configured from the guest over PCI. Add new control
operations VHOST_SET_VRING_COALESCE and VHOST_GET_VRING_COALESCE to
communicate these features between the hypervisor and vhost.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/vhost/vhost.c      | 76 +++++++++++++++++++++++++++++++++++++++++++++-
 drivers/vhost/vhost.h      | 12 ++++++++
 include/uapi/linux/vhost.h | 11 +++++++
 3 files changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 4269e621e254..55711f5ce2ae 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -312,6 +312,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vq->busyloop_timeout = 0;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
+	vq->max_coalesce_ktime = 0;
+	vq->max_coalesce_frames = 0;
+	vq->coalesce_frames = 0;
 }
 
 static int vhost_worker(void *data)
@@ -394,6 +397,22 @@ static void vhost_dev_free_iovecs(struct vhost_dev *dev)
 		vhost_vq_free_iovecs(dev->vqs[i]);
 }
 
+void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq);
+static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
+{
+	struct vhost_virtqueue *vq =
+		container_of(timer, struct vhost_virtqueue, ctimer);
+
+	if (mutex_trylock(&vq->mutex)) {
+		vq->coalesce_frames = vq->max_coalesce_frames;
+		vhost_signal(vq->dev, vq);
+		mutex_unlock(&vq->mutex);
+	}
+
+	/* TODO: restart if lock failed and not held by handle_tx */
+	return HRTIMER_NORESTART;
+}
+
 void vhost_dev_init(struct vhost_dev *dev,
 		    struct vhost_virtqueue **vqs, int nvqs)
 {
@@ -423,6 +442,8 @@ void vhost_dev_init(struct vhost_dev *dev,
 		vq->heads = NULL;
 		vq->dev = dev;
 		mutex_init(&vq->mutex);
+		hrtimer_init(&vq->ctimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		vq->ctimer.function = vhost_coalesce_timer;
 		vhost_vq_reset(dev, vq);
 		if (vq->handle_kick)
 			vhost_poll_init(&vq->poll, vq->handle_kick,
@@ -608,6 +629,7 @@ void vhost_dev_cleanup(struct vhost_dev *dev, bool locked)
 	int i;
 
 	for (i = 0; i < dev->nvqs; ++i) {
+		hrtimer_cancel(&dev->vqs[i]->ctimer);
 		if (dev->vqs[i]->error_ctx)
 			eventfd_ctx_put(dev->vqs[i]->error_ctx);
 		if (dev->vqs[i]->error)
@@ -1279,6 +1301,7 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 	struct vhost_vring_state s;
 	struct vhost_vring_file f;
 	struct vhost_vring_addr a;
+	struct vhost_vring_coalesce c;
 	u32 idx;
 	long r;
 
@@ -1335,6 +1358,30 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 		if (copy_to_user(argp, &s, sizeof s))
 			r = -EFAULT;
 		break;
+	case VHOST_SET_VRING_COALESCE:
+		if (copy_from_user(&c, argp, sizeof(c))) {
+			r = -EFAULT;
+			break;
+		}
+
+		if ((c.max_coalesce_frames && !c.max_coalesce_usecs) ||
+		    (c.max_coalesce_usecs && !c.max_coalesce_frames) ||
+		    (c.max_coalesce_usecs > 10000) ||
+		    (c.max_coalesce_frames > 1024)) {
+			r = -EINVAL;
+			break;
+		}
+
+		vq->max_coalesce_ktime = ns_to_ktime(c.max_coalesce_usecs *
+						     NSEC_PER_USEC);
+		vq->max_coalesce_frames = c.max_coalesce_frames;
+		break;
+	case VHOST_GET_VRING_COALESCE:
+		c.max_coalesce_usecs = ktime_to_us(vq->max_coalesce_ktime);
+		c.max_coalesce_frames = vq->max_coalesce_frames;
+		if (copy_to_user(argp, &c, sizeof(c)))
+			r = -EFAULT;
+		break;
 	case VHOST_SET_VRING_ADDR:
 		if (copy_from_user(&a, argp, sizeof a)) {
 			r = -EFAULT;
@@ -1418,6 +1465,11 @@ long vhost_vring_ioctl(struct vhost_dev *d, int ioctl, void __user *argp)
 			break;
 		}
 		if (eventfp != vq->call) {
+			/* do not update while timer is active */
+			if (hrtimer_active(&vq->ctimer)) {
+				hrtimer_cancel(&vq->ctimer);
+				vhost_signal(vq->dev, vq);
+			}
 			filep = vq->call;
 			ctx = vq->call_ctx;
 			vq->call = eventfp;
@@ -2112,6 +2164,10 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	 * signals at least once in 2^16 and remove this. */
 	if (unlikely((u16)(new - vq->signalled_used) < (u16)(new - old)))
 		vq->signalled_used_valid = false;
+
+	if (vq->max_coalesce_frames)
+		vq->coalesce_frames += count;
+
 	return 0;
 }
 
@@ -2152,6 +2208,14 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
+static void vhost_coalesce_reset(struct vhost_virtqueue *vq)
+{
+	if (vq->max_coalesce_frames) {
+		vq->coalesce_frames = 0;
+		hrtimer_try_to_cancel(&vq->ctimer);
+	}
+}
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	__u16 old, new;
@@ -2162,6 +2226,14 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	    unlikely(vq->avail_idx == vq->last_avail_idx))
 		return true;
 
+	if (vq->coalesce_frames < vq->max_coalesce_frames) {
+		if (!hrtimer_active(&vq->ctimer))
+			hrtimer_start(&vq->ctimer, vq->max_coalesce_ktime,
+				      HRTIMER_MODE_REL);
+		return false;
+	}
+	vhost_coalesce_reset(vq);
+
 	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
 		__virtio16 flags;
 		/* Flush out used index updates. This is paired
@@ -2208,8 +2280,10 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	/* Signal the Guest tell them we used something up. */
-	if (vq->call_ctx && vhost_notify(dev, vq))
+	if (vq->call_ctx && vhost_notify(dev, vq)) {
 		eventfd_signal(vq->call_ctx, 1);
+		vhost_coalesce_reset(vq);
+	}
 }
 EXPORT_SYMBOL_GPL(vhost_signal);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index a9cbbb148f46..cee25f376812 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -119,6 +119,18 @@ struct vhost_virtqueue {
 	/* Last used index value we have signalled on */
 	bool signalled_used_valid;
 
+	/* Maximum time to wait before generating an interrupt */
+	ktime_t max_coalesce_ktime;
+
+	/* Maximum number of pending frames before generating an interrupt */
+	u32 max_coalesce_frames;
+
+	/* The number of frames pending an interrupt */
+	u32 coalesce_frames;
+
+	/* Timer used to trigger a coalesced interrupt */
+	struct hrtimer ctimer;
+
 	/* Log writes to used structure. */
 	bool log_used;
 	u64 log_addr;
diff --git a/include/uapi/linux/vhost.h b/include/uapi/linux/vhost.h
index 60180c0b5dc6..a0f9e2b1545a 100644
--- a/include/uapi/linux/vhost.h
+++ b/include/uapi/linux/vhost.h
@@ -27,6 +27,11 @@ struct vhost_vring_file {
 
 };
 
+struct vhost_vring_coalesce {
+	__u32 max_coalesce_usecs;
+	__u32 max_coalesce_frames;
+};
+
 struct vhost_vring_addr {
 	unsigned int index;
 	/* Option flags. */
@@ -143,6 +148,12 @@ struct vhost_memory {
 #define VHOST_SET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x13, struct vhost_vring_state)
 #define VHOST_GET_VRING_ENDIAN _IOW(VHOST_VIRTIO, 0x14, struct vhost_vring_state)
 
+/* Set coalescing parameters for the ring. */
+#define VHOST_SET_VRING_COALESCE _IOW(VHOST_VIRTIO, 0x15, \
+				      struct vhost_vring_coalesce)
+/* Get coalescing parameters for the ring. */
+#define VHOST_GET_VRING_COALESCE _IOW(VHOST_VIRTIO, 0x16, \
+				      struct vhost_vring_coalesce)
 /* The following ioctls use eventfd file descriptors to signal and poll
  * for events. */
 
-- 
2.12.0.rc1.440.g5b76565f74-goog

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi
  2017-03-03 14:39 [PATCH net-next RFC 0/4] virtio-net tx napi Willem de Bruijn
                   ` (2 preceding siblings ...)
  2017-03-03 14:39 ` [PATCH net-next RFC 3/4] vhost: interrupt coalescing support Willem de Bruijn
@ 2017-03-03 14:39 ` Willem de Bruijn
  2017-03-06  9:34   ` Jason Wang
  2017-03-06 19:44 ` [PATCH net-next RFC 0/4] virtio-net tx napi Michael S. Tsirkin
  4 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-03 14:39 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Amortize the cost of virtual interrupts by doing both rx and tx work
on reception of a receive interrupt. Together with VIRTIO_F_EVENT_IDX and
vhost interrupt moderation, this suppresses most explicit tx
completion interrupts for bidirectional workloads.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9a9031640179..21c575127d50 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1031,6 +1031,23 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
 	return received;
 }
 
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget);
+
+static void virtnet_poll_cleantx(struct receive_queue *rq)
+{
+	struct virtnet_info *vi = rq->vq->vdev->priv;
+	unsigned int index = vq2rxq(rq->vq);
+	struct send_queue *sq = &vi->sq[index];
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
+
+	__netif_tx_lock(txq, smp_processor_id());
+	free_old_xmit_skbs(sq, sq->napi.weight);
+	__netif_tx_unlock(txq);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+}
+
 static int virtnet_poll(struct napi_struct *napi, int budget)
 {
 	struct receive_queue *rq =
@@ -1039,6 +1056,8 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 
 	received = virtnet_receive(rq, budget);
 
+	virtnet_poll_cleantx(rq);
+
 	/* Out of packets? */
 	if (received < budget)
 		virtqueue_napi_complete(napi, rq->vq, received);
-- 
2.12.0.rc1.440.g5b76565f74-goog

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 2/4] virtio-net: transmit napi
  2017-03-03 14:39 ` [PATCH net-next RFC 2/4] virtio-net: transmit napi Willem de Bruijn
@ 2017-03-06  9:21   ` Jason Wang
  2017-03-06 17:50     ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2017-03-06  9:21 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: mst, Willem de Bruijn



On 2017-03-03 22:39, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Convert virtio-net to a standard napi tx completion path. This enables
> better TCP pacing using TCP small queues and increases single stream
> throughput.
>
> The virtio-net driver currently cleans tx descriptors on transmission
> of new packets in ndo_start_xmit. Latency depends on new traffic, so
> is unbounded. To avoid deadlock when a socket reaches its snd limit,
> packets are orphaned on tranmission. This breaks socket backpressure,
> including TSQ.
>
> Napi increases the number of interrupts generated compared to the
> current model, which keeps interrupts disabled as long as the ring
> has enough free descriptors. Keep tx napi optional for now. Follow-on
> patches will reduce the interrupt cost.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>   drivers/net/virtio_net.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 61 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8c21e9a4adc7..9a9031640179 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -33,6 +33,8 @@
>   static int napi_weight = NAPI_POLL_WEIGHT;
>   module_param(napi_weight, int, 0444);
>   
> +static int napi_tx_weight = NAPI_POLL_WEIGHT;
> +

Maybe we should use module_param for this? Or in the future, use 
tx-frames-irq for a per-device configuration.

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 3/4] vhost: interrupt coalescing support
  2017-03-03 14:39 ` [PATCH net-next RFC 3/4] vhost: interrupt coalescing support Willem de Bruijn
@ 2017-03-06  9:28   ` Jason Wang
  2017-03-06 17:31     ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2017-03-06  9:28 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: mst, Willem de Bruijn



On 2017-03-03 22:39, Willem de Bruijn wrote:
> +void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq);
> +static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
> +{
> +	struct vhost_virtqueue *vq =
> +		container_of(timer, struct vhost_virtqueue, ctimer);
> +
> +	if (mutex_trylock(&vq->mutex)) {
> +		vq->coalesce_frames = vq->max_coalesce_frames;
> +		vhost_signal(vq->dev, vq);
> +		mutex_unlock(&vq->mutex);
> +	}
> +
> +	/* TODO: restart if lock failed and not held by handle_tx */
> +	return HRTIMER_NORESTART;
> +}
> +

Then we may lose an interrupt forever if there is no new tx request? I
believe we need e.g. vhost_poll_queue() here.

Thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi
  2017-03-03 14:39 ` [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi Willem de Bruijn
@ 2017-03-06  9:34   ` Jason Wang
  2017-03-06 17:43     ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Jason Wang @ 2017-03-06  9:34 UTC (permalink / raw)
  To: Willem de Bruijn, netdev; +Cc: mst, Willem de Bruijn



On 2017-03-03 22:39, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
>
> Amortize the cost of virtual interrupts by doing both rx and tx work
> on reception of a receive interrupt. Together VIRTIO_F_EVENT_IDX and
> vhost interrupt moderation, this suppresses most explicit tx
> completion interrupts for bidirectional workloads.
>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>   drivers/net/virtio_net.c | 19 +++++++++++++++++++
>   1 file changed, 19 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9a9031640179..21c575127d50 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1031,6 +1031,23 @@ static int virtnet_receive(struct receive_queue *rq, int budget)
>   	return received;
>   }
>   
> +static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget);
> +
> +static void virtnet_poll_cleantx(struct receive_queue *rq)
> +{
> +	struct virtnet_info *vi = rq->vq->vdev->priv;
> +	unsigned int index = vq2rxq(rq->vq);
> +	struct send_queue *sq = &vi->sq[index];
> +	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
> +
> +	__netif_tx_lock(txq, smp_processor_id());
> +	free_old_xmit_skbs(sq, sq->napi.weight);
> +	__netif_tx_unlock(txq);

Should we check the tx napi weight here? Or was this treated as an
independent optimization?

> +
> +	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
> +		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
> +}
> +
>   static int virtnet_poll(struct napi_struct *napi, int budget)
>   {
>   	struct receive_queue *rq =
> @@ -1039,6 +1056,8 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>   
>   	received = virtnet_receive(rq, budget);
>   
> +	virtnet_poll_cleantx(rq);
> +

Better to do this before virtnet_receive(), considering refill may
allocate memory for rx buffers.

Btw, if this proves to be more efficient, in the future we may
consider:

1) use a single interrupt for both rx and tx
2) use a single napi to handle both rx and tx

Thanks

>   	/* Out of packets? */
>   	if (received < budget)
>   		virtqueue_napi_complete(napi, rq->vq, received);

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 3/4] vhost: interrupt coalescing support
  2017-03-06  9:28   ` Jason Wang
@ 2017-03-06 17:31     ` Willem de Bruijn
  2017-03-08  3:25       ` Jason Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-06 17:31 UTC (permalink / raw)
  To: Jason Wang; +Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn

On Mon, Mar 6, 2017 at 4:28 AM, Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2017-03-03 22:39, Willem de Bruijn wrote:
>>
>> +void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq);
>> +static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
>> +{
>> +       struct vhost_virtqueue *vq =
>> +               container_of(timer, struct vhost_virtqueue, ctimer);
>> +
>> +       if (mutex_trylock(&vq->mutex)) {
>> +               vq->coalesce_frames = vq->max_coalesce_frames;
>> +               vhost_signal(vq->dev, vq);
>> +               mutex_unlock(&vq->mutex);
>> +       }
>> +
>> +       /* TODO: restart if lock failed and not held by handle_tx */
>> +       return HRTIMER_NORESTART;
>> +}
>> +
>
>
> Then we may lose an interrupt forever if no new tx request? I believe we
> need e.g vhost_poll_queue() here.

Absolutely, I need to fix this. The common case for failing to grab
the lock is competition with handle_tx. With careful coding we can
probably avoid scheduling another run with vhost_poll_queue in
the common case.

Your patch v7 cancels the pending hrtimer at the start of handle_tx.
I need to reintroduce that, and also only schedule a timer at the end
of handle_tx, not immediately when vq->coalesce_frames becomes
non-zero.
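
For reference, a naive sketch of the timer callback with such a fallback
(it requeues on any contention rather than only when needed, which is the
common case to avoid; assumes vhost_poll_queue() can be called from this
context; untested):

  static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
  {
          struct vhost_virtqueue *vq =
                  container_of(timer, struct vhost_virtqueue, ctimer);

          if (mutex_trylock(&vq->mutex)) {
                  vq->coalesce_frames = vq->max_coalesce_frames;
                  vhost_signal(vq->dev, vq);
                  mutex_unlock(&vq->mutex);
          } else {
                  /* Lock contended, likely by handle_tx: requeue the handler
                   * so the pending interrupt is not lost if no new tx arrives.
                   */
                  vhost_poll_queue(&vq->poll);
          }

          return HRTIMER_NORESTART;
  }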

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi
  2017-03-06  9:34   ` Jason Wang
@ 2017-03-06 17:43     ` Willem de Bruijn
  2017-03-06 18:04       ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-06 17:43 UTC (permalink / raw)
  To: Jason Wang; +Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn

>> +static void virtnet_poll_cleantx(struct receive_queue *rq)
>> +{
>> +       struct virtnet_info *vi = rq->vq->vdev->priv;
>> +       unsigned int index = vq2rxq(rq->vq);
>> +       struct send_queue *sq = &vi->sq[index];
>> +       struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
>> +
>> +       __netif_tx_lock(txq, smp_processor_id());
>> +       free_old_xmit_skbs(sq, sq->napi.weight);
>> +       __netif_tx_unlock(txq);
>
>
> Should we check tx napi weight here? Or this was treated as an independent
> optimization?

Good point. This was not intended to run in no-napi mode as is.
With interrupts disabled most of the time in that mode, I don't
expect it to be worthwhile using in that case. I'll add the check
for sq->napi.weight != 0.
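
Roughly, the helper from patch 4/4 with that guard added would look like
this (sketch only, not yet what is in this series):

  static void virtnet_poll_cleantx(struct receive_queue *rq)
  {
          struct virtnet_info *vi = rq->vq->vdev->priv;
          unsigned int index = vq2rxq(rq->vq);
          struct send_queue *sq = &vi->sq[index];
          struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);

          /* Only clean tx from the rx napi handler in tx napi mode. */
          if (!sq->napi.weight)
                  return;

          __netif_tx_lock(txq, smp_processor_id());
          free_old_xmit_skbs(sq, sq->napi.weight);
          __netif_tx_unlock(txq);

          if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
                  netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
  }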

>> +
>> +       if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
>> +               netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
>> +}
>> +
>>   static int virtnet_poll(struct napi_struct *napi, int budget)
>>   {
>>         struct receive_queue *rq =
>> @@ -1039,6 +1056,8 @@ static int virtnet_poll(struct napi_struct *napi,
>> int budget)
>>         received = virtnet_receive(rq, budget);
>>   +     virtnet_poll_cleantx(rq);
>> +
>
>
> Better to do the before virtnet_receive() consider refill may allocate
> memory for rx buffers.

Will do.

> Btw, if this is proved to be more efficient. In the future we may consider
> to:
>
> 1) use a single interrupt for both rx and tx
> 2) use a single napi to handle both rx and tx

Agreed, I think that's sensible.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 2/4] virtio-net: transmit napi
  2017-03-06  9:21   ` Jason Wang
@ 2017-03-06 17:50     ` Willem de Bruijn
  2017-03-06 18:55       ` David Miller
  0 siblings, 1 reply; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-06 17:50 UTC (permalink / raw)
  To: Jason Wang; +Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn

>>   drivers/net/virtio_net.c | 73
>> ++++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 61 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8c21e9a4adc7..9a9031640179 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -33,6 +33,8 @@
>>   static int napi_weight = NAPI_POLL_WEIGHT;
>>   module_param(napi_weight, int, 0444);
>>   +static int napi_tx_weight = NAPI_POLL_WEIGHT;
>> +
>
>
> Maybe we should use module_param for this? Or in the future, use
> tx-frames-irq for a per-device configuration.

This option should eventually just go away, and napi tx become the
standard mode.

In the short term, while we evaluate it on varied workloads, a
module_param sounds good to me. In general that is frowned
upon, as it leads to different configuration interfaces for each
device driver. But that should not be a concern in this limited
case.
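
A minimal sketch of what such a knob could look like, next to the
existing napi_weight parameter (name and permissions are placeholders):

  /* Sketch: opt-in toggle for tx napi, read-only at runtime like napi_weight. */
  static bool napi_tx;
  module_param(napi_tx, bool, 0444);

  /* In virtnet_alloc_queues(), give the tx napi instance weight only when
   * the knob is set; weight 0 keeps the existing non-napi tx path.
   */
  netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
                 napi_tx ? napi_tx_weight : 0);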

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi
  2017-03-06 17:43     ` Willem de Bruijn
@ 2017-03-06 18:04       ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-06 18:04 UTC (permalink / raw)
  To: Jason Wang; +Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn

On Mon, Mar 6, 2017 at 12:43 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>>> +static void virtnet_poll_cleantx(struct receive_queue *rq)
>>> +{
>>> +       struct virtnet_info *vi = rq->vq->vdev->priv;
>>> +       unsigned int index = vq2rxq(rq->vq);
>>> +       struct send_queue *sq = &vi->sq[index];
>>> +       struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, index);
>>> +
>>> +       __netif_tx_lock(txq, smp_processor_id());
>>> +       free_old_xmit_skbs(sq, sq->napi.weight);
>>> +       __netif_tx_unlock(txq);
>>
>>
>> Should we check tx napi weight here? Or this was treated as an independent
>> optimization?
>
> Good point. This was not intended to run in no-napi mode as is.
> With interrupts disabled most of the time in that mode, I don't
> expect it to be worthwhile using in that case. I'll add the check
> for sq->napi.weight != 0.

I'm wrong here. Rx interrupts are not disabled, of course. It is
probably worth benchmarking, then.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 2/4] virtio-net: transmit napi
  2017-03-06 17:50     ` Willem de Bruijn
@ 2017-03-06 18:55       ` David Miller
  2017-03-06 19:33         ` Michael S. Tsirkin
  0 siblings, 1 reply; 18+ messages in thread
From: David Miller @ 2017-03-06 18:55 UTC (permalink / raw)
  To: willemdebruijn.kernel; +Cc: jasowang, netdev, mst, willemb

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 6 Mar 2017 12:50:19 -0500

>>>   drivers/net/virtio_net.c | 73
>>> ++++++++++++++++++++++++++++++++++++++++--------
>>>   1 file changed, 61 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 8c21e9a4adc7..9a9031640179 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -33,6 +33,8 @@
>>>   static int napi_weight = NAPI_POLL_WEIGHT;
>>>   module_param(napi_weight, int, 0444);
>>>   +static int napi_tx_weight = NAPI_POLL_WEIGHT;
>>> +
>>
>>
>> Maybe we should use module_param for this? Or in the future, use
>> tx-frames-irq for a per-device configuration.
> 
> This option should eventually just go away, and napi tx become the
> standard mode.
> 
> In the short term, while we evaluate it on varied workloads, a
> module_param sounds good to me. In general that is frowned
> upon, as it leads to different configuration interfaces for each
> device driver. But that should not be a concern in this limited
> case.

In any event, do we really need a TX weight at all?

I guess you tried this, but why does it not work to just do
all TX work unconditionally in a NAPI poll pass?  This is how
we encourage all NIC drivers to handle this.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 2/4] virtio-net: transmit napi
  2017-03-06 18:55       ` David Miller
@ 2017-03-06 19:33         ` Michael S. Tsirkin
  2017-03-06 19:43           ` Willem de Bruijn
  0 siblings, 1 reply; 18+ messages in thread
From: Michael S. Tsirkin @ 2017-03-06 19:33 UTC (permalink / raw)
  To: David Miller; +Cc: willemdebruijn.kernel, jasowang, netdev, willemb

On Mon, Mar 06, 2017 at 10:55:22AM -0800, David Miller wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Mon, 6 Mar 2017 12:50:19 -0500
> 
> >>>   drivers/net/virtio_net.c | 73
> >>> ++++++++++++++++++++++++++++++++++++++++--------
> >>>   1 file changed, 61 insertions(+), 12 deletions(-)
> >>>
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 8c21e9a4adc7..9a9031640179 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -33,6 +33,8 @@
> >>>   static int napi_weight = NAPI_POLL_WEIGHT;
> >>>   module_param(napi_weight, int, 0444);
> >>>   +static int napi_tx_weight = NAPI_POLL_WEIGHT;
> >>> +
> >>
> >>
> >> Maybe we should use module_param for this? Or in the future, use
> >> tx-frames-irq for a per-device configuration.
> > 
> > This option should eventually just go away, and napi tx become the
> > standard mode.
> > 
> > In the short term, while we evaluate it on varied workloads, a
> > module_param sounds good to me. In general that is frowned
> > upon, as it leads to different configuration interfaces for each
> > device driver. But that should not be a concern in this limited
> > case.
> 
> In any event, do we really need a TX weight at all?
> 
> I guess you tried this, but why doesn't it not work to just do
> all TX work unconditionally in a NAPI poll pass?  This is how
> we encourage all NIC drivers to handle this.

This seems to be more or less what this driver does already.
So I suspect it can just ignore the weight.

-- 
MST

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 2/4] virtio-net: transmit napi
  2017-03-06 19:33         ` Michael S. Tsirkin
@ 2017-03-06 19:43           ` Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-06 19:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: David Miller, Jason Wang, Network Development, Willem de Bruijn

On Mon, Mar 6, 2017 at 2:33 PM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Mon, Mar 06, 2017 at 10:55:22AM -0800, David Miller wrote:
>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>> Date: Mon, 6 Mar 2017 12:50:19 -0500
>>
>> >>>   drivers/net/virtio_net.c | 73
>> >>> ++++++++++++++++++++++++++++++++++++++++--------
>> >>>   1 file changed, 61 insertions(+), 12 deletions(-)
>> >>>
>> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> >>> index 8c21e9a4adc7..9a9031640179 100644
>> >>> --- a/drivers/net/virtio_net.c
>> >>> +++ b/drivers/net/virtio_net.c
>> >>> @@ -33,6 +33,8 @@
>> >>>   static int napi_weight = NAPI_POLL_WEIGHT;
>> >>>   module_param(napi_weight, int, 0444);
>> >>>   +static int napi_tx_weight = NAPI_POLL_WEIGHT;
>> >>> +
>> >>
>> >>
>> >> Maybe we should use module_param for this? Or in the future, use
>> >> tx-frames-irq for a per-device configuration.
>> >
>> > This option should eventually just go away, and napi tx become the
>> > standard mode.
>> >
>> > In the short term, while we evaluate it on varied workloads, a
>> > module_param sounds good to me. In general that is frowned
>> > upon, as it leads to different configuration interfaces for each
>> > device driver. But that should not be a concern in this limited
>> > case.
>>
>> In any event, do we really need a TX weight at all?
>>
>> I guess you tried this, but why doesn't it not work to just do
>> all TX work unconditionally in a NAPI poll pass?  This is how
>> we encourage all NIC drivers to handle this.
>
> This seems to be more or less what this driver does already.
> So I suspect it can just ignore the weight.

Okay. Then we still need a boolean to toggle tx napi until we're
sure that the old path can be deprecated.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 0/4] virtio-net tx napi
  2017-03-03 14:39 [PATCH net-next RFC 0/4] virtio-net tx napi Willem de Bruijn
                   ` (3 preceding siblings ...)
  2017-03-03 14:39 ` [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi Willem de Bruijn
@ 2017-03-06 19:44 ` Michael S. Tsirkin
  4 siblings, 0 replies; 18+ messages in thread
From: Michael S. Tsirkin @ 2017-03-06 19:44 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: netdev, jasowang, Willem de Bruijn

On Fri, Mar 03, 2017 at 09:39:05AM -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Add napi for virtio-net transmit completion processing. Based on
> previous patchsets by Jason Wang:
> 
>   [RFC V7 PATCH 0/7] enable tx interrupts for virtio-net
>   http://lkml.iu.edu/hypermail/linux/kernel/1505.3/00245.html
> 
> This patchset is not ready for submission yet, but it is time for
> another checkpoint. Among others, it requires more testing with
> more diverse workloads.
> 
> 
> Before commit b0c39dbdc204 ("virtio_net: don't free buffers in xmit
> ring") the virtio-net driver would free transmitted packets on
> transmission of new packets in ndo_start_xmit and, to catch the edge
> case when no new packet is sent, also in a timer at 10HZ.
> 
> A timer can cause long stalls. VIRTIO_F_NOTIFY_ON_EMPTY avoids stalls
> due to low free descriptor count. It does not address a stalls due to
> low socket SO_SNDBUF. Increasing timer frequency decreases that stall
> time, but increases interrupt rate and, thus, cycle count.
> 
> Currently, with no timer, packets are freed only at ndo_start_xmit.
> Latency of consume_skb is now unbounded. To avoid a deadlock if a sock
> reaches SO_SNDBUF, packets are orphaned on tx. This breaks TCP small
> queues.
> 
> Reenable TCP small queues by removing the orphan. Instead of using a
> timer, convert the driver to regular tx napi. This does not have the
> unresolved stall issue and does not have any frequency to tune.
> 
> By keeping interrupts enabled by default, napi increases tx
> interrupt rate. VIRTIO_F_EVENT_IDX avoids sending an interrupt if
> one is already unacknowledged, so makes this more feasible today.
> Combine that with two optimizations that bring interrupt rate
> back in line with the existing code:
> 
> Interrupt coalescing delays interrupts until a number of events
> accrue or a timer fires.
> 
> Tx completion cleaning on rx interrupts elides most explicit tx
> interrupts by relying on the fact that many rx interrupts fire.
> 
> Tested by running {1, 10, 100} TCP_STREAM and TCP_RR tests from a
> guest to a server on the host, on an x86_64 Haswell. The guest
> runs 4 vCPUs pinned to 4 cores. vhost and the test server are
> pinned to a core each.
> 
> All results are the median of 5 runs, with variance well < 10%.
> Used neper (github.com/google/neper) as test process. Tests used
> experimental_zcopy=0. This is likely no longer needed.
> 
> Napi increases single stream throughput, but increases cycle cost
> across the board. Interrupt moderation ("+vhost") reverts both, if
> not fully. For this workload with ACKs in the return path, the
> last optimization ("at-rx") is more effective. For UDP this is
> likely not true.

I am inclined to say coalescing is more problematic because under light
load it causes timers to fire on the host, causing exits if any VM is
running on the same core. Some people might have spare cores, etc.,
but generally I think at-rx might be less disruptive.

UDP testing would be required to determine how effective it is;
the current numbers look nice.

>              upstream     napi   +vhost   +at-rx +v+at-rx
> Stream:
>   1x:
>   Mbps          30182    38782    30106    38002    32842
>   Gcycles         405      499      386      403      417        
> 
>   10x:
>   Mbps          40441    40575    41638    40260    41299
>   Gcycles         438      545      430      416      416
> 
>   100x:
>   Mbps          34049    34697    34763    34637    34259
>   Gcycles         441      545      433      415      422
> 
> Latency (us):
>   1x:
>   p50              24       24       24       21       24
>   p99              27       27       27       26       27
>   Gcycles         299      430      432      312      297
> 
> 10x:
>   p50              30       31       31       42       31
>   p99              40       46       48       52       42
>   Gcycles         347      423      471      322      463
> 
> 100x:
>   p50             155      151      163      306      161
>   p99             337      329      352      361      349
>   Gcycles         340      421      463      306      441
> 
> 
> Lower throughput at 100x vs 10x can be (at least in part)
> explained by looking at bytes per packet sent (nstat). It likely
> also explains the lower throughput of 1x for some variants.
> 
> upstream:
> 
>  N=1   bytes/pkt=16581
>  N=10  bytes/pkt=61513
>  N=100 bytes/pkt=51558
> 
> at_rx:
> 
>  N=1   bytes/pkt=65204
>  N=10  bytes/pkt=65148
>  N=100 bytes/pkt=56840
> 
> For this experiment, vhost has 64 frames and usecs thresholds.
> Configuring this from the guest requires additional patches to qemu.
> Temporary patch:
> 
>   @@ -846,9 +845,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
>           vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev);
>           vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev);
>  
>   -       vqs[VHOST_NET_VQ_TX]->max_coalesce_ktime = ktime_set(0, 64 * NSEC_PER_USEC);
>   -       vqs[VHOST_NET_VQ_TX]->max_coalesce_frames = 64;
>   -
>           f->private_data = n;
> 
> TODO
>  - restart timer if trylock failed and lock not held by hande_tx
>    - start timer only at end of handle_tx and kill at start
>  - make napi_tx configurable
>  - increase test coverage
>    - 4KB TCP_RR
>    - UDP
>    - multithreaded sender
>    - with experimental_zcopytx
> 
> 
> Willem de Bruijn (4):
>   virtio-net: napi helper functions
>   virtio-net: transmit napi
>   vhost: interrupt coalescing support
>   virtio-net: clean tx descriptors from rx napi
> 
>  drivers/net/virtio_net.c   | 157 +++++++++++++++++++++++++++++++++------------
>  drivers/vhost/vhost.c      |  74 ++++++++++++++++++++-
>  drivers/vhost/vhost.h      |  12 ++++
>  include/uapi/linux/vhost.h |  11 ++++
>  4 files changed, 211 insertions(+), 43 deletions(-)
> 
> -- 
> 2.12.0.rc1.440.g5b76565f74-goog

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH net-next RFC 3/4] vhost: interrupt coalescing support
  2017-03-06 17:31     ` Willem de Bruijn
@ 2017-03-08  3:25       ` Jason Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Jason Wang @ 2017-03-08  3:25 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: Network Development, Michael S. Tsirkin, Willem de Bruijn



On 2017-03-07 01:31, Willem de Bruijn wrote:
> On Mon, Mar 6, 2017 at 4:28 AM, Jason Wang <jasowang@redhat.com> wrote:
>>
>> On 2017-03-03 22:39, Willem de Bruijn wrote:
>>> +void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq);
>>> +static enum hrtimer_restart vhost_coalesce_timer(struct hrtimer *timer)
>>> +{
>>> +       struct vhost_virtqueue *vq =
>>> +               container_of(timer, struct vhost_virtqueue, ctimer);
>>> +
>>> +       if (mutex_trylock(&vq->mutex)) {
>>> +               vq->coalesce_frames = vq->max_coalesce_frames;
>>> +               vhost_signal(vq->dev, vq);
>>> +               mutex_unlock(&vq->mutex);
>>> +       }
>>> +
>>> +       /* TODO: restart if lock failed and not held by handle_tx */
>>> +       return HRTIMER_NORESTART;
>>> +}
>>> +
>>
>> Then we may lose an interrupt forever if no new tx request? I believe we
>> need e.g vhost_poll_queue() here.
> Absolutely, I need to fix this. The common case for failing to grab
> the lock is competition with handle_tx. With careful coding we can
> probably avoid scheduling another run with vhost_poll_queue in
> the common case.

Yes, probably add some checking after releasing the mutex in
handle_tx().

Thanks

>
> Your patch v7 cancels the pending hrtimer at the start of handle_tx.
> I need to reintroduce that, and also only schedule a timer at the end
> of handle_tx, not immediately when vq->coalesce_frames becomes
> non-zero.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH net-next RFC 2/4] virtio-net: transmit napi
@ 2017-03-03 15:23 Willem de Bruijn
  0 siblings, 0 replies; 18+ messages in thread
From: Willem de Bruijn @ 2017-03-03 15:23 UTC (permalink / raw)
  To: netdev; +Cc: jasowang, mst, Willem de Bruijn

From: Willem de Bruijn <willemb@google.com>

Convert virtio-net to a standard napi tx completion path. This enables
better TCP pacing using TCP small queues and increases single stream
throughput.

The virtio-net driver currently cleans tx descriptors on transmission
of new packets in ndo_start_xmit. Latency depends on new traffic, so
is unbounded. To avoid deadlock when a socket reaches its snd limit,
packets are orphaned on transmission. This breaks socket backpressure,
including TSQ.

Napi increases the number of interrupts generated compared to the
current model, which keeps interrupts disabled as long as the ring
has enough free descriptors. Keep tx napi optional for now. Follow-on
patches will reduce the interrupt cost.

Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/net/virtio_net.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 61 insertions(+), 12 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8c21e9a4adc7..9a9031640179 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -33,6 +33,8 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
+static int napi_tx_weight = NAPI_POLL_WEIGHT;
+
 static bool csum = true, gso = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
@@ -86,6 +88,8 @@ struct send_queue {
 
 	/* Name of the send queue: output.$index */
 	char name[40];
+
+	struct napi_struct napi;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -262,12 +266,16 @@ static void virtqueue_napi_complete(struct napi_struct *napi,
 static void skb_xmit_done(struct virtqueue *vq)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
+	struct napi_struct *napi = &vi->sq[vq2txq(vq)].napi;
 
 	/* Suppress further interrupts. */
 	virtqueue_disable_cb(vq);
 
-	/* We were probably waiting for more output buffers. */
-	netif_wake_subqueue(vi->dev, vq2txq(vq));
+	if (napi->weight)
+		virtqueue_napi_schedule(napi, vq);
+	else
+		/* We were probably waiting for more output buffers. */
+		netif_wake_subqueue(vi->dev, vq2txq(vq));
 }
 
 static unsigned int mergeable_ctx_to_buf_truesize(unsigned long mrg_ctx)
@@ -961,6 +969,9 @@ static void skb_recv_done(struct virtqueue *rvq)
 
 static void virtnet_napi_enable(struct virtqueue *vq, struct napi_struct *napi)
 {
+	if (!napi->weight)
+		return;
+
 	napi_enable(napi);
 
 	/* If all buffers were filled by other side before we napi_enabled, we
@@ -1046,12 +1057,13 @@ static int virtnet_open(struct net_device *dev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 		virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+		virtnet_napi_enable(vi->sq[i].vq, &vi->sq[i].napi);
 	}
 
 	return 0;
 }
 
-static void free_old_xmit_skbs(struct send_queue *sq)
+static unsigned int free_old_xmit_skbs(struct send_queue *sq, int budget)
 {
 	struct sk_buff *skb;
 	unsigned int len;
@@ -1060,7 +1072,8 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	unsigned int packets = 0;
 	unsigned int bytes = 0;
 
-	while ((skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+	while (packets < budget &&
+	       (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		pr_debug("Sent skb %p\n", skb);
 
 		bytes += skb->len;
@@ -1073,12 +1086,35 @@ static void free_old_xmit_skbs(struct send_queue *sq)
 	 * happens when called speculatively from start_xmit.
 	 */
 	if (!packets)
-		return;
+		return 0;
 
 	u64_stats_update_begin(&stats->tx_syncp);
 	stats->tx_bytes += bytes;
 	stats->tx_packets += packets;
 	u64_stats_update_end(&stats->tx_syncp);
+
+	return packets;
+}
+
+static int virtnet_poll_tx(struct napi_struct *napi, int budget)
+{
+	struct send_queue *sq = container_of(napi, struct send_queue, napi);
+	struct virtnet_info *vi = sq->vq->vdev->priv;
+	struct netdev_queue *txq = netdev_get_tx_queue(vi->dev, vq2txq(sq->vq));
+	bool complete = false;
+
+	__netif_tx_lock(txq, smp_processor_id());
+	if (free_old_xmit_skbs(sq, budget) < budget)
+		complete = true;
+	__netif_tx_unlock(txq);
+
+	if (complete)
+		virtqueue_napi_complete(napi, sq->vq, 0);
+
+	if (sq->vq->num_free >= 2 + MAX_SKB_FRAGS)
+		netif_wake_subqueue(vi->dev, vq2txq(sq->vq));
+
+	return complete ? 0 : budget;
 }
 
 static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
@@ -1130,9 +1166,11 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	int err;
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
 	bool kick = !skb->xmit_more;
+	bool use_napi = sq->napi.weight;
 
 	/* Free up any pending old buffers before queueing new ones. */
-	free_old_xmit_skbs(sq);
+	if (!use_napi)
+		free_old_xmit_skbs(sq, napi_weight);
 
 	/* timestamp packet in software */
 	skb_tx_timestamp(skb);
@@ -1152,8 +1190,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	}
 
 	/* Don't wait up for transmitted skbs to be freed. */
-	skb_orphan(skb);
-	nf_reset(skb);
+	if (!use_napi) {
+		skb_orphan(skb);
+		nf_reset(skb);
+	}
 
 	/* If running out of space, stop queue to avoid getting packets that we
 	 * are then unable to transmit.
@@ -1167,9 +1207,10 @@ static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
 	 */
 	if (sq->vq->num_free < 2+MAX_SKB_FRAGS) {
 		netif_stop_subqueue(dev, qnum);
-		if (unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
+		if (!use_napi &&
+		    unlikely(!virtqueue_enable_cb_delayed(sq->vq))) {
 			/* More just got used, free them then recheck. */
-			free_old_xmit_skbs(sq);
+			free_old_xmit_skbs(sq, virtqueue_get_vring_size(sq->vq));
 			if (sq->vq->num_free >= 2+MAX_SKB_FRAGS) {
 				netif_start_subqueue(dev, qnum);
 				virtqueue_disable_cb(sq->vq);
@@ -1371,8 +1412,10 @@ static int virtnet_close(struct net_device *dev)
 	/* Make sure refill_work doesn't re-enable napi! */
 	cancel_delayed_work_sync(&vi->refill);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
+	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_disable(&vi->rq[i].napi);
+		napi_disable(&vi->sq[i].napi);
+	}
 
 	return 0;
 }
@@ -1719,6 +1762,8 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 	if (netif_running(vi->dev)) {
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			napi_disable(&vi->rq[i].napi);
+			napi_disable(&vi->sq[i].napi);
+		}
 	}
 }
 
@@ -1741,8 +1785,10 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
 				schedule_delayed_work(&vi->refill, 0);
 
-		for (i = 0; i < vi->max_queue_pairs; i++)
+		for (i = 0; i < vi->max_queue_pairs; i++) {
 			virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+			virtnet_napi_enable(vi->sq[i].vq, &vi->sq[i].napi);
+		}
 	}
 
 	netif_device_attach(vi->dev);
@@ -1947,6 +1993,7 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		napi_hash_del(&vi->rq[i].napi);
 		netif_napi_del(&vi->rq[i].napi);
+		netif_napi_del(&vi->sq[i].napi);
 	}
 
 	/* We called napi_hash_del() before netif_napi_del(),
@@ -2132,6 +2179,8 @@ static int virtnet_alloc_queues(struct virtnet_info *vi)
 		vi->rq[i].pages = NULL;
 		netif_napi_add(vi->dev, &vi->rq[i].napi, virtnet_poll,
 			       napi_weight);
+		netif_napi_add(vi->dev, &vi->sq[i].napi, virtnet_poll_tx,
+			       napi_tx_weight);
 
 		sg_init_table(vi->rq[i].sg, ARRAY_SIZE(vi->rq[i].sg));
 		ewma_pkt_len_init(&vi->rq[i].mrg_avg_pkt_len);
-- 
2.12.0.rc1.440.g5b76565f74-goog



Thread overview: 18+ messages
2017-03-03 14:39 [PATCH net-next RFC 0/4] virtio-net tx napi Willem de Bruijn
2017-03-03 14:39 ` [PATCH net-next RFC 1/4] virtio-net: napi helper functions Willem de Bruijn
2017-03-03 14:39 ` [PATCH net-next RFC 2/4] virtio-net: transmit napi Willem de Bruijn
2017-03-06  9:21   ` Jason Wang
2017-03-06 17:50     ` Willem de Bruijn
2017-03-06 18:55       ` David Miller
2017-03-06 19:33         ` Michael S. Tsirkin
2017-03-06 19:43           ` Willem de Bruijn
2017-03-03 14:39 ` [PATCH net-next RFC 3/4] vhost: interrupt coalescing support Willem de Bruijn
2017-03-06  9:28   ` Jason Wang
2017-03-06 17:31     ` Willem de Bruijn
2017-03-08  3:25       ` Jason Wang
2017-03-03 14:39 ` [PATCH net-next RFC 4/4] virtio-net: clean tx descriptors from rx napi Willem de Bruijn
2017-03-06  9:34   ` Jason Wang
2017-03-06 17:43     ` Willem de Bruijn
2017-03-06 18:04       ` Willem de Bruijn
2017-03-06 19:44 ` [PATCH net-next RFC 0/4] virtio-net tx napi Michael S. Tsirkin
2017-03-03 15:23 [PATCH net-next RFC 2/4] virtio-net: transmit napi Willem de Bruijn
