All of lore.kernel.org
 help / color / mirror / Atom feed
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: netdev@vger.kernel.org
Cc: davem@davemloft.net, mst@redhat.com, jasowang@redhat.com,
	den@klaipeden.com, virtualization@lists.linux-foundation.org,
	Willem de Bruijn <willemb@google.com>
Subject: [PATCH net-next] vhost_net: do not stall on zerocopy depletion
Date: Wed, 27 Sep 2017 20:25:56 -0400	[thread overview]
Message-ID: <20170928002556.41240-1-willemdebruijn.kernel@gmail.com> (raw)

From: Willem de Bruijn <willemb@google.com>

Vhost-net has a hard limit on the number of zerocopy skbs in flight.
When reached, transmission stalls. Stalls cause latency, as well as
head-of-line blocking of other flows that do not use zerocopy.

Instead of stalling, revert to copy-based transmission.

Tested by sending two udp flows from guest to host, one with payload
of VHOST_GOODCOPY_LEN, the other too small for zerocopy (1B). The
large flow is redirected to a netem instance with 1MBps rate limit
and deep 1000 entry queue.

  modprobe ifb
  ip link set dev ifb0 up
  tc qdisc add dev ifb0 root netem limit 1000 rate 1MBit

  tc qdisc add dev tap0 ingress
  tc filter add dev tap0 parent ffff: protocol ip \
      u32 match ip dport 8000 0xffff \
      action mirred egress redirect dev ifb0

Before the delay, both flows process around 80K pps. With the delay,
before this patch, both process around 400. After this patch, the
large flow is still rate limited, while the small reverts to its
original rate. See also discussion in the first link, below.

The limit in vhost_exceeds_maxpend must be carefully chosen. When
vq->num >> 1, the flows remain correlated. This value happens to
correspond to VHOST_MAX_PENDING for vq->num == 256. Allow smaller
fractions and ensure correctness also for much smaller values of
vq->num, by testing the min() of both explicitly. See also the
discussion in the second link below.

Link:http://lkml.kernel.org/r/CAF=yD-+Wk9sc9dXMUq1+x_hh=3ThTXa6BnZkygP3tgVpjbp93g@mail.gmail.com
Link:http://lkml.kernel.org/r/20170819064129.27272-1-den@klaipeden.com
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 drivers/vhost/net.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 58585ec8699e..50758602ae9d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -436,8 +436,8 @@ static bool vhost_exceeds_maxpend(struct vhost_net *net)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 
-	return (nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV
-		== nvq->done_idx;
+	return (nvq->upend_idx + UIO_MAXIOV - nvq->done_idx) % UIO_MAXIOV >
+	       min(VHOST_MAX_PEND, vq->num >> 2);
 }
 
 /* Expects to be always run from workqueue - which acts as
@@ -480,12 +480,6 @@ static void handle_tx(struct vhost_net *net)
 		if (zcopy)
 			vhost_zerocopy_signal_used(net, vq);
 
-		/* If more outstanding DMAs, queue the work.
-		 * Handle upend_idx wrap around
-		 */
-		if (unlikely(vhost_exceeds_maxpend(net)))
-			break;
-
 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
 						ARRAY_SIZE(vq->iov),
 						&out, &in);
@@ -509,6 +503,7 @@ static void handle_tx(struct vhost_net *net)
 		len = iov_length(vq->iov, out);
 		iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
 		iov_iter_advance(&msg.msg_iter, hdr_size);
+
 		/* Sanity check */
 		if (!msg_data_left(&msg)) {
 			vq_err(vq, "Unexpected header len for TX: "
@@ -519,8 +514,7 @@ static void handle_tx(struct vhost_net *net)
 		len = msg_data_left(&msg);
 
 		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
-				   && (nvq->upend_idx + 1) % UIO_MAXIOV !=
-				      nvq->done_idx
+				   && !vhost_exceeds_maxpend(net)
 				   && vhost_net_tx_select_zcopy(net);
 
 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
-- 
2.14.2.822.g60be5d43e6-goog

             reply	other threads:[~2017-09-28  0:26 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-28  0:25 Willem de Bruijn [this message]
2017-09-28  0:33 ` [PATCH net-next] vhost_net: do not stall on zerocopy depletion Willem de Bruijn
2017-09-28  0:33 ` Willem de Bruijn
2017-09-28  7:41 ` Jason Wang
2017-09-28 16:05   ` Willem de Bruijn
2017-09-28 16:05   ` Willem de Bruijn
2017-09-28  7:41 ` Jason Wang
2017-09-29 19:38 ` Michael S. Tsirkin
2017-09-30  1:25   ` Willem de Bruijn
2017-09-30  1:25   ` Willem de Bruijn
2017-10-02  4:08     ` Michael S. Tsirkin
2017-10-02  4:08     ` Michael S. Tsirkin
2017-10-02 21:34     ` Willem de Bruijn
2017-10-02 21:34     ` Willem de Bruijn
2017-09-29 19:38 ` Michael S. Tsirkin
2017-09-30 22:12 ` kbuild test robot
2017-09-30 22:20 ` kbuild test robot
2017-09-30 22:20 ` kbuild test robot
2017-10-01  0:09 ` kbuild test robot
2017-10-01  3:20   ` Michael S. Tsirkin
2017-10-01  3:26     ` [kbuild-all] " Fengguang Wu
2017-10-01  3:26     ` Fengguang Wu
2017-10-01  3:20   ` Michael S. Tsirkin
2017-10-01  0:09 ` kbuild test robot
2017-09-28  0:25 Willem de Bruijn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170928002556.41240-1-willemdebruijn.kernel@gmail.com \
    --to=willemdebruijn.kernel@gmail.com \
    --cc=davem@davemloft.net \
    --cc=den@klaipeden.com \
    --cc=jasowang@redhat.com \
    --cc=mst@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=virtualization@lists.linux-foundation.org \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.