netdev.vger.kernel.org archive mirror
* [PATCH V2 0/6] vhost code cleanup and minor enhancement
@ 2013-08-30  4:29 Jason Wang
  2013-08-30  4:29 ` [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void Jason Wang
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

Hi all:

This series tries to unify and simplify the vhost code, especially for
zerocopy. With this series, a 5%-10% improvement in per-CPU throughput was
seen during netperf guest transmit tests.

Please review.

Changes from V1:
- Fix the zerocopy enabling check by changing the check of upend_idx != done_idx
  to (upend_idx + 1) % UIO_MAXIOV == done_idx.
- Switch to use put_user() in __vhost_add_used_n() if there's only one used
  element.
- Keep the max pending check based on Michael's suggestion.

Jason Wang (6):
  vhost_net: make vhost_zerocopy_signal_used() returns void
  vhost_net: use vhost_add_used_and_signal_n() in
    vhost_zerocopy_signal_used()
  vhost: switch to use vhost_add_used_n()
  vhost_net: determine whether or not to use zerocopy at one time
  vhost_net: poll vhost queue after marking DMA is done
  vhost_net: correctly limit the max pending buffers

 drivers/vhost/net.c   |   88 +++++++++++++++++++++----------------------------
 drivers/vhost/vhost.c |   54 +++++++-----------------------
 2 files changed, 50 insertions(+), 92 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void
  2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
@ 2013-08-30  4:29 ` Jason Wang
  2013-09-02  5:51   ` Michael S. Tsirkin
  2013-08-30  4:29 ` [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() Jason Wang
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

None of its callers use its return value, so let it return void.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 969a859..280ee66 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -276,8 +276,8 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
  * of used idx. Once lower device DMA done contiguously, we will signal KVM
  * guest used idx.
  */
-static int vhost_zerocopy_signal_used(struct vhost_net *net,
-				      struct vhost_virtqueue *vq)
+static void vhost_zerocopy_signal_used(struct vhost_net *net,
+				       struct vhost_virtqueue *vq)
 {
 	struct vhost_net_virtqueue *nvq =
 		container_of(vq, struct vhost_net_virtqueue, vq);
@@ -297,7 +297,6 @@ static int vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 	if (j)
 		nvq->done_idx = i;
-	return j;
 }
 
 static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
-- 
1.7.1


* [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
  2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
  2013-08-30  4:29 ` [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void Jason Wang
@ 2013-08-30  4:29 ` Jason Wang
  2013-09-02  5:50   ` Michael S. Tsirkin
  2013-08-30  4:29 ` [PATCH V2 3/6] vhost: switch to use vhost_add_used_n() Jason Wang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

We tend to batch the used adding and signaling in vhost_zerocopy_callback(),
which may result in more than 100 used buffers being updated in
vhost_zerocopy_signal_used() in some cases. So switch to using
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(), which means far fewer used-index updates and
memory barriers.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 280ee66..8a6dd0d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -281,7 +281,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 {
 	struct vhost_net_virtqueue *nvq =
 		container_of(vq, struct vhost_net_virtqueue, vq);
-	int i;
+	int i, add;
 	int j = 0;
 
 	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
@@ -289,14 +289,17 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 			vhost_net_tx_err(net);
 		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
 			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
-			vhost_add_used_and_signal(vq->dev, vq,
-						  vq->heads[i].id, 0);
 			++j;
 		} else
 			break;
 	}
-	if (j)
-		nvq->done_idx = i;
+	while (j) {
+		add = min(UIO_MAXIOV - nvq->done_idx, j);
+		vhost_add_used_and_signal_n(vq->dev, vq,
+					    &vq->heads[nvq->done_idx], add);
+		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
+		j -= add;
+	}
 }
 
 static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
-- 
1.7.1


* [PATCH V2 3/6] vhost: switch to use vhost_add_used_n()
  2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
  2013-08-30  4:29 ` [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void Jason Wang
  2013-08-30  4:29 ` [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() Jason Wang
@ 2013-08-30  4:29 ` Jason Wang
  2013-08-30  4:29 ` [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

Let vhost_add_used() use vhost_add_used_n() to reduce code duplication.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c |   54 ++++++++++--------------------------------------
 1 files changed, 12 insertions(+), 42 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e58cf00..124c433 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1332,48 +1332,9 @@ EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
  * want to notify the guest, using eventfd. */
 int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
 {
-	struct vring_used_elem __user *used;
+	struct vring_used_elem heads = { head, len };
 
-	/* The virtqueue contains a ring of used buffers.  Get a pointer to the
-	 * next entry in that used ring. */
-	used = &vq->used->ring[vq->last_used_idx % vq->num];
-	if (__put_user(head, &used->id)) {
-		vq_err(vq, "Failed to write used id");
-		return -EFAULT;
-	}
-	if (__put_user(len, &used->len)) {
-		vq_err(vq, "Failed to write used len");
-		return -EFAULT;
-	}
-	/* Make sure buffer is written before we update index. */
-	smp_wmb();
-	if (__put_user(vq->last_used_idx + 1, &vq->used->idx)) {
-		vq_err(vq, "Failed to increment used idx");
-		return -EFAULT;
-	}
-	if (unlikely(vq->log_used)) {
-		/* Make sure data is seen before log. */
-		smp_wmb();
-		/* Log used ring entry write. */
-		log_write(vq->log_base,
-			  vq->log_addr +
-			   ((void __user *)used - (void __user *)vq->used),
-			  sizeof *used);
-		/* Log used index update. */
-		log_write(vq->log_base,
-			  vq->log_addr + offsetof(struct vring_used, idx),
-			  sizeof vq->used->idx);
-		if (vq->log_ctx)
-			eventfd_signal(vq->log_ctx, 1);
-	}
-	vq->last_used_idx++;
-	/* If the driver never bothers to signal in a very long while,
-	 * used index might wrap around. If that happens, invalidate
-	 * signalled_used index we stored. TODO: make sure driver
-	 * signals at least once in 2^16 and remove this. */
-	if (unlikely(vq->last_used_idx == vq->signalled_used))
-		vq->signalled_used_valid = false;
-	return 0;
+	return vhost_add_used_n(vq, &heads, 1);
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
@@ -1387,7 +1348,16 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 
 	start = vq->last_used_idx % vq->num;
 	used = vq->used->ring + start;
-	if (__copy_to_user(used, heads, count * sizeof *used)) {
+	if (count == 1) {
+		if (__put_user(heads[0].id, &used->id)) {
+			vq_err(vq, "Failed to write used id");
+			return -EFAULT;
+		}
+		if (__put_user(heads[0].len, &used->len)) {
+			vq_err(vq, "Failed to write used len");
+			return -EFAULT;
+		}
+	} else if (__copy_to_user(used, heads, count * sizeof *used)) {
 		vq_err(vq, "Failed to write used");
 		return -EFAULT;
 	}
-- 
1.7.1


* [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time
  2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
                   ` (2 preceding siblings ...)
  2013-08-30  4:29 ` [PATCH V2 3/6] vhost: switch to use vhost_add_used_n() Jason Wang
@ 2013-08-30  4:29 ` Jason Wang
  2013-08-30 18:35   ` Sergei Shtylyov
  2013-08-30  4:29 ` [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
  2013-08-30  4:29 ` [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers Jason Wang
  5 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and roll back this choice
later. This can be avoided by determining whether to use zerocopy just once,
by checking all of the conditions at the same time beforehand.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |   46 +++++++++++++++++++---------------------------
 1 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 8a6dd0d..ff60c2a 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -404,43 +404,35 @@ static void handle_tx(struct vhost_net *net)
 			       iov_length(nvq->hdr, s), hdr_size);
 			break;
 		}
-		zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
-				       nvq->upend_idx != nvq->done_idx);
+
+		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
+			&& (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
+			&& vhost_net_tx_select_zcopy(net);
 
 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
 		if (zcopy_used) {
+			struct ubuf_info *ubuf;
+			ubuf = nvq->ubuf_info + nvq->upend_idx;
+
 			vq->heads[nvq->upend_idx].id = head;
-			if (!vhost_net_tx_select_zcopy(net) ||
-			    len < VHOST_GOODCOPY_LEN) {
-				/* copy don't need to wait for DMA done */
-				vq->heads[nvq->upend_idx].len =
-							VHOST_DMA_DONE_LEN;
-				msg.msg_control = NULL;
-				msg.msg_controllen = 0;
-				ubufs = NULL;
-			} else {
-				struct ubuf_info *ubuf;
-				ubuf = nvq->ubuf_info + nvq->upend_idx;
-
-				vq->heads[nvq->upend_idx].len =
-					VHOST_DMA_IN_PROGRESS;
-				ubuf->callback = vhost_zerocopy_callback;
-				ubuf->ctx = nvq->ubufs;
-				ubuf->desc = nvq->upend_idx;
-				msg.msg_control = ubuf;
-				msg.msg_controllen = sizeof(ubuf);
-				ubufs = nvq->ubufs;
-				kref_get(&ubufs->kref);
-			}
+			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
+			ubuf->callback = vhost_zerocopy_callback;
+			ubuf->ctx = nvq->ubufs;
+			ubuf->desc = nvq->upend_idx;
+			msg.msg_control = ubuf;
+			msg.msg_controllen = sizeof(ubuf);
+			ubufs = nvq->ubufs;
+			kref_get(&ubufs->kref);
 			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
-		} else
+		} else {
 			msg.msg_control = NULL;
+			ubufs = NULL;
+		}
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(NULL, sock, &msg, len);
 		if (unlikely(err < 0)) {
 			if (zcopy_used) {
-				if (ubufs)
-					vhost_net_ubuf_put(ubufs);
+				vhost_net_ubuf_put(ubufs);
 				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
 					% UIO_MAXIOV;
 			}
-- 
1.7.1


* [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
  2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
                   ` (3 preceding siblings ...)
  2013-08-30  4:29 ` [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
@ 2013-08-30  4:29 ` Jason Wang
  2013-08-30 16:44   ` Ben Hutchings
  2014-02-12  7:38   ` Qin Chuanyu
  2013-08-30  4:29 ` [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers Jason Wang
  5 siblings, 2 replies; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

We used to poll the vhost queue before marking the DMA as done. This is racy:
if the vhost thread is woken up before the DMA is marked as done, the signal
can be missed. Fix this by always polling the vhost queue after the DMA is
marked as done.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ff60c2a..d09c17c 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	struct vhost_virtqueue *vq = ubufs->vq;
 	int cnt = atomic_read(&ubufs->kref.refcount);
 
+	/* set len to mark this desc buffers done DMA */
+	vq->heads[ubuf->desc].len = success ?
+		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
+	vhost_net_ubuf_put(ubufs);
+
 	/*
 	 * Trigger polling thread if guest stopped submitting new buffers:
 	 * in this case, the refcount after decrement will eventually reach 1
@@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	 */
 	if (cnt <= 2 || !(cnt % 16))
 		vhost_poll_queue(&vq->poll);
-	/* set len to mark this desc buffers done DMA */
-	vq->heads[ubuf->desc].len = success ?
-		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
-	vhost_net_ubuf_put(ubufs);
 }
 
 /* Expects to be always run from workqueue - which acts as
-- 
1.7.1


* [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers
  2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
                   ` (4 preceding siblings ...)
  2013-08-30  4:29 ` [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
@ 2013-08-30  4:29 ` Jason Wang
  2013-09-02  5:56   ` Michael S. Tsirkin
  5 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2013-08-30  4:29 UTC (permalink / raw)
  To: mst, kvm, virtualization, netdev, linux-kernel

As Michael pointed out, we used to limit the max number of pending DMAs to get
better cache utilization. But it was not done correctly, since the check was
only made when there were no new buffers submitted from the guest. The guest
can easily exceed the limit by continuously sending packets.

So this patch moves the check into the main loop. Tests show about a 5%-10%
improvement in per-CPU throughput for guest tx, but a 5% drop in per-CPU
transaction rate for a single-session TCP_RR.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |   15 ++++-----------
 1 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index d09c17c..592e1f2 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -363,6 +363,10 @@ static void handle_tx(struct vhost_net *net)
 		if (zcopy)
 			vhost_zerocopy_signal_used(net, vq);
 
+		if ((nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV ==
+		    nvq->done_idx)
+			break;
+
 		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
 					 ARRAY_SIZE(vq->iov),
 					 &out, &in,
@@ -372,17 +376,6 @@ static void handle_tx(struct vhost_net *net)
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			int num_pends;
-
-			/* If more outstanding DMAs, queue the work.
-			 * Handle upend_idx wrap around
-			 */
-			num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
-				    (nvq->upend_idx - nvq->done_idx) :
-				    (nvq->upend_idx + UIO_MAXIOV -
-				     nvq->done_idx);
-			if (unlikely(num_pends > VHOST_MAX_PEND))
-				break;
 			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
 				vhost_disable_notify(&net->dev, vq);
 				continue;
-- 
1.7.1


* Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
  2013-08-30  4:29 ` [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
@ 2013-08-30 16:44   ` Ben Hutchings
  2013-09-02  3:06     ` Jason Wang
  2014-02-12  7:38   ` Qin Chuanyu
  1 sibling, 1 reply; 21+ messages in thread
From: Ben Hutchings @ 2013-08-30 16:44 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst

On Fri, 2013-08-30 at 12:29 +0800, Jason Wang wrote:
> We used to poll vhost queue before making DMA is done, this is racy if vhost
> thread were waked up before marking DMA is done which can result the signal to
> be missed. Fix this by always poll the vhost thread before DMA is done.

Does this bug only exist in net-next or is it older?  Should the fix go
to net and stable branches?

Ben.

> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c |    9 +++++----
>  1 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ff60c2a..d09c17c 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	struct vhost_virtqueue *vq = ubufs->vq;
>  	int cnt = atomic_read(&ubufs->kref.refcount);
>  
> +	/* set len to mark this desc buffers done DMA */
> +	vq->heads[ubuf->desc].len = success ?
> +		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
> +	vhost_net_ubuf_put(ubufs);
> +
>  	/*
>  	 * Trigger polling thread if guest stopped submitting new buffers:
>  	 * in this case, the refcount after decrement will eventually reach 1
> @@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	 */
>  	if (cnt <= 2 || !(cnt % 16))
>  		vhost_poll_queue(&vq->poll);
> -	/* set len to mark this desc buffers done DMA */
> -	vq->heads[ubuf->desc].len = success ?
> -		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
> -	vhost_net_ubuf_put(ubufs);
>  }
>  
>  /* Expects to be always run from workqueue - which acts as

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.


* Re: [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time
  2013-08-30  4:29 ` [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
@ 2013-08-30 18:35   ` Sergei Shtylyov
  2013-09-02  3:15     ` Jason Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Sergei Shtylyov @ 2013-08-30 18:35 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, virtualization, linux-kernel, kvm, mst

Hello.

On 08/30/2013 08:29 AM, Jason Wang wrote:

> Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
> upend_idx != done_idx we still set zcopy_used to true and rollback this choice
> later. This could be avoided by determine zerocopy once by checking all
> conditions at one time before.

> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   drivers/vhost/net.c |   46 +++++++++++++++++++---------------------------
>   1 files changed, 19 insertions(+), 27 deletions(-)

> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 8a6dd0d..ff60c2a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -404,43 +404,35 @@ static void handle_tx(struct vhost_net *net)
>   			       iov_length(nvq->hdr, s), hdr_size);
>   			break;
>   		}
> -		zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
> -				       nvq->upend_idx != nvq->done_idx);
> +
> +		zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
> +			&& (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
> +			&& vhost_net_tx_select_zcopy(net);

    Could you leave && on a first of two lines, matching the previous style?

>
>   		/* use msg_control to pass vhost zerocopy ubuf info to skb */
>   		if (zcopy_used) {
> +			struct ubuf_info *ubuf;
> +			ubuf = nvq->ubuf_info + nvq->upend_idx;
> +
>   			vq->heads[nvq->upend_idx].id = head;
[...]
> +			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
> +			ubuf->callback = vhost_zerocopy_callback;
> +			ubuf->ctx = nvq->ubufs;
> +			ubuf->desc = nvq->upend_idx;
> +			msg.msg_control = ubuf;
> +			msg.msg_controllen = sizeof(ubuf);

    'sizeof(ubuf)' where 'ubuf' is a pointer? Are you sure it shouldn't be 
'sizeof(*ubuf)'?

WBR, Sergei


* Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
  2013-08-30 16:44   ` Ben Hutchings
@ 2013-09-02  3:06     ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2013-09-02  3:06 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev, virtualization, linux-kernel, kvm, mst

On 08/31/2013 12:44 AM, Ben Hutchings wrote:
> On Fri, 2013-08-30 at 12:29 +0800, Jason Wang wrote:
>> We used to poll vhost queue before making DMA is done, this is racy if vhost
>> thread were waked up before marking DMA is done which can result the signal to
>> be missed. Fix this by always poll the vhost thread before DMA is done.
> Does this bug only exist in net-next or is it older?  Should the fix go
> to net and stable branches?

This should go to the stable branches too (3.4 and above).

Thanks for the checking.
>
> Ben.
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/vhost/net.c |    9 +++++----
>>  1 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index ff60c2a..d09c17c 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	struct vhost_virtqueue *vq = ubufs->vq;
>>  	int cnt = atomic_read(&ubufs->kref.refcount);
>>  
>> +	/* set len to mark this desc buffers done DMA */
>> +	vq->heads[ubuf->desc].len = success ?
>> +		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
>> +	vhost_net_ubuf_put(ubufs);
>> +
>>  	/*
>>  	 * Trigger polling thread if guest stopped submitting new buffers:
>>  	 * in this case, the refcount after decrement will eventually reach 1
>> @@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	 */
>>  	if (cnt <= 2 || !(cnt % 16))
>>  		vhost_poll_queue(&vq->poll);
>> -	/* set len to mark this desc buffers done DMA */
>> -	vq->heads[ubuf->desc].len = success ?
>> -		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
>> -	vhost_net_ubuf_put(ubufs);
>>  }
>>  
>>  /* Expects to be always run from workqueue - which acts as


* Re: [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time
  2013-08-30 18:35   ` Sergei Shtylyov
@ 2013-09-02  3:15     ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2013-09-02  3:15 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: netdev, virtualization, linux-kernel, kvm, mst

On 08/31/2013 02:35 AM, Sergei Shtylyov wrote:
> Hello.
>
> On 08/30/2013 08:29 AM, Jason Wang wrote:
>
>> Currently, even if the packet length is smaller than
>> VHOST_GOODCOPY_LEN, if
>> upend_idx != done_idx we still set zcopy_used to true and rollback
>> this choice
>> later. This could be avoided by determine zerocopy once by checking all
>> conditions at one time before.
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/vhost/net.c |   46
>> +++++++++++++++++++---------------------------
>>   1 files changed, 19 insertions(+), 27 deletions(-)
>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 8a6dd0d..ff60c2a 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -404,43 +404,35 @@ static void handle_tx(struct vhost_net *net)
>>                      iov_length(nvq->hdr, s), hdr_size);
>>               break;
>>           }
>> -        zcopy_used = zcopy && (len >= VHOST_GOODCOPY_LEN ||
>> -                       nvq->upend_idx != nvq->done_idx);
>> +
>> +        zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN
>> +            && (nvq->upend_idx + 1) % UIO_MAXIOV != nvq->done_idx
>> +            && vhost_net_tx_select_zcopy(net);
>
>    Could you leave && on a first of two lines, matching the previous
> style?
>

ok.
>>
>>           /* use msg_control to pass vhost zerocopy ubuf info to skb */
>>           if (zcopy_used) {
>> +            struct ubuf_info *ubuf;
>> +            ubuf = nvq->ubuf_info + nvq->upend_idx;
>> +
>>               vq->heads[nvq->upend_idx].id = head;
> [...]
>> +            vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
>> +            ubuf->callback = vhost_zerocopy_callback;
>> +            ubuf->ctx = nvq->ubufs;
>> +            ubuf->desc = nvq->upend_idx;
>> +            msg.msg_control = ubuf;
>> +            msg.msg_controllen = sizeof(ubuf);
>
>    'sizeof(ubuf)' where 'ubuf' is a pointer? Are you sure it shouldn't
> be 'sizeof(*ubuf)'?

Yes, a pointer is sufficient. Vhost allocates an array of ubuf_info and
tun/macvtap just needs a reference to it.
>
> WBR, Sergei
>


* Re: [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
  2013-08-30  4:29 ` [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() Jason Wang
@ 2013-09-02  5:50   ` Michael S. Tsirkin
  2013-09-02  6:28     ` Jason Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2013-09-02  5:50 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization

On Fri, Aug 30, 2013 at 12:29:18PM +0800, Jason Wang wrote:
> We tend to batch the used adding and signaling in vhost_zerocopy_callback()
> which may result more than 100 used buffers to be updated in
> vhost_zerocopy_signal_used() in some cases. So wwitch to use

switch

> vhost_add_used_and_signal_n() to avoid multiple calls to
> vhost_add_used_and_signal(). Which means much more less times of used index
> updating and memory barriers.

pls put info on perf gain in commit log too

> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 280ee66..8a6dd0d 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -281,7 +281,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
>  {
>  	struct vhost_net_virtqueue *nvq =
>  		container_of(vq, struct vhost_net_virtqueue, vq);
> -	int i;
> +	int i, add;
>  	int j = 0;
>  
>  	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
> @@ -289,14 +289,17 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
>  			vhost_net_tx_err(net);
>  		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
>  			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
> -			vhost_add_used_and_signal(vq->dev, vq,
> -						  vq->heads[i].id, 0);
>  			++j;
>  		} else
>  			break;
>  	}
> -	if (j)
> -		nvq->done_idx = i;
> +	while (j) {
> +		add = min(UIO_MAXIOV - nvq->done_idx, j);
> +		vhost_add_used_and_signal_n(vq->dev, vq,
> +					    &vq->heads[nvq->done_idx], add);
> +		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
> +		j -= add;
> +	}
>  }
>  
>  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> -- 
> 1.7.1


* Re: [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void
  2013-08-30  4:29 ` [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void Jason Wang
@ 2013-09-02  5:51   ` Michael S. Tsirkin
  2013-09-02  6:29     ` Jason Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2013-09-02  5:51 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization

tweak subj s/returns/return/

On Fri, Aug 30, 2013 at 12:29:17PM +0800, Jason Wang wrote:
> None of its caller use its return value, so let it return void.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c |    5 ++---
>  1 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 969a859..280ee66 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -276,8 +276,8 @@ static void copy_iovec_hdr(const struct iovec *from, struct iovec *to,
>   * of used idx. Once lower device DMA done contiguously, we will signal KVM
>   * guest used idx.
>   */
> -static int vhost_zerocopy_signal_used(struct vhost_net *net,
> -				      struct vhost_virtqueue *vq)
> +static void vhost_zerocopy_signal_used(struct vhost_net *net,
> +				       struct vhost_virtqueue *vq)
>  {
>  	struct vhost_net_virtqueue *nvq =
>  		container_of(vq, struct vhost_net_virtqueue, vq);
> @@ -297,7 +297,6 @@ static int vhost_zerocopy_signal_used(struct vhost_net *net,
>  	}
>  	if (j)
>  		nvq->done_idx = i;
> -	return j;
>  }
>  
>  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> -- 
> 1.7.1


* Re: [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers
  2013-08-30  4:29 ` [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers Jason Wang
@ 2013-09-02  5:56   ` Michael S. Tsirkin
  2013-09-02  6:30     ` Jason Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2013-09-02  5:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-kernel, kvm, virtualization

On Fri, Aug 30, 2013 at 12:29:22PM +0800, Jason Wang wrote:
> As Michael point out, We used to limit the max pending DMAs to get better cache
> utilization. But it was not done correctly since it was one done when there's no
> new buffers submitted from guest. Guest can easily exceeds the limitation by
> keeping sending packets.
> 
> So this patch moves the check into main loop. Tests shows about 5%-10%
> improvement on per cpu throughput for guest tx. But a 5% drop on per cpu
> transaction rate for a single session TCP_RR.

Any explanation for the drop? single session TCP_RR is unlikely to
exceed VHOST_MAX_PEND, correct?

> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/net.c |   15 ++++-----------
>  1 files changed, 4 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index d09c17c..592e1f2 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -363,6 +363,10 @@ static void handle_tx(struct vhost_net *net)
>  		if (zcopy)
>  			vhost_zerocopy_signal_used(net, vq);
>  
> +		if ((nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV ==
> +		    nvq->done_idx)
> +			break;
> +
>  		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>  					 ARRAY_SIZE(vq->iov),
>  					 &out, &in,
> @@ -372,17 +376,6 @@ static void handle_tx(struct vhost_net *net)
>  			break;
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> -			int num_pends;
> -
> -			/* If more outstanding DMAs, queue the work.
> -			 * Handle upend_idx wrap around
> -			 */
> -			num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
> -				    (nvq->upend_idx - nvq->done_idx) :
> -				    (nvq->upend_idx + UIO_MAXIOV -
> -				     nvq->done_idx);
> -			if (unlikely(num_pends > VHOST_MAX_PEND))
> -				break;
>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>  				vhost_disable_notify(&net->dev, vq);
>  				continue;
> -- 
> 1.7.1

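The wrap-around distance that the removed num_pends code computes can be written either with an explicit conditional or as a single modular expression. The following stand-alone sketch (hypothetical helper names, not kernel code) shows the two forms are equivalent:

```c
#include <assert.h>

/* Stand-alone sketch (not kernel code): two equivalent ways of counting
 * pending zerocopy DMAs in a UIO_MAXIOV-sized ring.  num_pends_old mirrors
 * the conditional removed by the patch; num_pends_mod is the same distance
 * written as a single modular expression. */
#define UIO_MAXIOV 1024

static int num_pends_old(int upend_idx, int done_idx)
{
	/* removed handle_tx() logic: handle upend_idx wrap-around explicitly */
	return upend_idx >= done_idx ? upend_idx - done_idx
				     : upend_idx + UIO_MAXIOV - done_idx;
}

static int num_pends_mod(int upend_idx, int done_idx)
{
	/* same distance, wrap-around folded into one modulo */
	return (upend_idx - done_idx + UIO_MAXIOV) % UIO_MAXIOV;
}
```

Note that the per-iteration break added by the patch avoids recomputing this distance on every pass: it compares done_idx directly against a fixed offset from upend_idx.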

* Re: [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
  2013-09-02  5:50   ` Michael S. Tsirkin
@ 2013-09-02  6:28     ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2013-09-02  6:28 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization

On 09/02/2013 01:50 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 30, 2013 at 12:29:18PM +0800, Jason Wang wrote:
>> > We tend to batch the used adding and signaling in vhost_zerocopy_callback()
>> > which may result more than 100 used buffers to be updated in
>> > vhost_zerocopy_signal_used() in some cases. So wwitch to use
> switch

Ok.
>> > vhost_add_used_and_signal_n() to avoid multiple calls to
>> > vhost_add_used_and_signal(). Which means much more less times of used index
>> > updating and memory barriers.
> pls put info on perf gain in commit log too
>

Sure.

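The benefit under discussion, fewer used-index updates and memory barriers when many zerocopy completions are flushed at once, can be sketched with a toy cost model (hypothetical, not kernel code):

```c
#include <assert.h>

/* Toy cost model (not kernel code) for the batching argument: every flush
 * of the used index implies one index store plus its memory barrier and
 * signal check, so publishing n completions one at a time costs n flushes
 * while a vhost_add_used_and_signal_n()-style batch costs exactly one. */
struct used_ring {
	unsigned int used_idx;      /* guest-visible used index (modeled) */
	unsigned int index_flushes; /* index writes == barriers == signal checks */
};

static void add_used_n(struct used_ring *r, unsigned int n)
{
	r->used_idx += n;   /* publish n heads at once */
	r->index_flushes++; /* one store, one barrier */
}

static void add_used_one_by_one(struct used_ring *r, unsigned int n)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		add_used_n(r, 1); /* per-buffer vhost_add_used_and_signal() cost */
}
```

With 100 batched completions, both paths publish the same used index, but the one-by-one path pays 100 flushes against the batch's single flush.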

* Re: [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void
  2013-09-02  5:51   ` Michael S. Tsirkin
@ 2013-09-02  6:29     ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2013-09-02  6:29 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization

On 09/02/2013 01:51 PM, Michael S. Tsirkin wrote:
> tweak subj s/returns/return/
>
> On Fri, Aug 30, 2013 at 12:29:17PM +0800, Jason Wang wrote:
>> > None of its callers use its return value, so let it return void.
>> > 
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
>> > ---

Will correct it in v3.


* Re: [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers
  2013-09-02  5:56   ` Michael S. Tsirkin
@ 2013-09-02  6:30     ` Jason Wang
  2013-09-02  8:37       ` Jason Wang
  0 siblings, 1 reply; 21+ messages in thread
From: Jason Wang @ 2013-09-02  6:30 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization

On 09/02/2013 01:56 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 30, 2013 at 12:29:22PM +0800, Jason Wang wrote:
>> As Michael pointed out, we used to limit the max pending DMAs to get better
>> cache utilization, but it was not done correctly, since the check only ran
>> when there were no new buffers submitted by the guest. The guest can easily
>> exceed the limit by continuously sending packets.
>>
>> So this patch moves the check into the main loop. Tests show about a 5%-10%
>> improvement in per-cpu throughput for guest tx, but a 5% drop in per-cpu
>> transaction rate for a single-session TCP_RR.
> Any explanation for the drop? single session TCP_RR is unlikely to
> exceed VHOST_MAX_PEND, correct?

Unlikely to exceed it, yes. I rechecked the result; it looks like it was not
stable enough. Will re-test and report.
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>  drivers/vhost/net.c |   15 ++++-----------
>>  1 files changed, 4 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index d09c17c..592e1f2 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -363,6 +363,10 @@ static void handle_tx(struct vhost_net *net)
>>  		if (zcopy)
>>  			vhost_zerocopy_signal_used(net, vq);
>>  
>> +		if ((nvq->upend_idx + vq->num - VHOST_MAX_PEND) % UIO_MAXIOV ==
>> +		    nvq->done_idx)
>> +			break;
>> +
>>  		head = vhost_get_vq_desc(&net->dev, vq, vq->iov,
>>  					 ARRAY_SIZE(vq->iov),
>>  					 &out, &in,
>> @@ -372,17 +376,6 @@ static void handle_tx(struct vhost_net *net)
>>  			break;
>>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>>  		if (head == vq->num) {
>> -			int num_pends;
>> -
>> -			/* If more outstanding DMAs, queue the work.
>> -			 * Handle upend_idx wrap around
>> -			 */
>> -			num_pends = likely(nvq->upend_idx >= nvq->done_idx) ?
>> -				    (nvq->upend_idx - nvq->done_idx) :
>> -				    (nvq->upend_idx + UIO_MAXIOV -
>> -				     nvq->done_idx);
>> -			if (unlikely(num_pends > VHOST_MAX_PEND))
>> -				break;
>>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>  				vhost_disable_notify(&net->dev, vq);
>>  				continue;
>> -- 
>> 1.7.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


* Re: [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers
  2013-09-02  6:30     ` Jason Wang
@ 2013-09-02  8:37       ` Jason Wang
  0 siblings, 0 replies; 21+ messages in thread
From: Jason Wang @ 2013-09-02  8:37 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-kernel, kvm, virtualization

On 09/02/2013 02:30 PM, Jason Wang wrote:
> On 09/02/2013 01:56 PM, Michael S. Tsirkin wrote:
>> > On Fri, Aug 30, 2013 at 12:29:22PM +0800, Jason Wang wrote:
>>> >> As Michael pointed out, we used to limit the max pending DMAs to get
>>> >> better cache utilization, but it was not done correctly, since the check
>>> >> only ran when there were no new buffers submitted by the guest. The
>>> >> guest can easily exceed the limit by continuously sending packets.
>>> >>
>>> >> So this patch moves the check into the main loop. Tests show about a
>>> >> 5%-10% improvement in per-cpu throughput for guest tx, but a 5% drop in
>>> >> per-cpu transaction rate for a single-session TCP_RR.
>> > Any explanation for the drop? single session TCP_RR is unlikely to
>> > exceed VHOST_MAX_PEND, correct?
> Unlikely to exceed it, yes. I rechecked the result; it looks like it was
> not stable enough. Will re-test and report.

Re-tested with more iterations; the results show no difference on the TCP_RR
test and a 5%-10% improvement in per-cpu throughput for guest tx.

Will post V3 soon.


* Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
  2013-08-30  4:29 ` [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
  2013-08-30 16:44   ` Ben Hutchings
@ 2014-02-12  7:38   ` Qin Chuanyu
  2014-02-12 10:06     ` Jason Wang
  2014-02-12 16:01     ` Michael S. Tsirkin
  1 sibling, 2 replies; 21+ messages in thread
From: Qin Chuanyu @ 2014-02-12  7:38 UTC (permalink / raw)
  To: Jason Wang, virtualization, netdev, linux-kernel; +Cc: mst, kvm

On 2013/8/30 12:29, Jason Wang wrote:
> We used to poll the vhost queue before marking DMA as done. This is racy: if
> the vhost thread was woken up before DMA was marked as done, the signal could
> be missed. Fix this by always polling the vhost queue after marking DMA as
> done.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>   drivers/vhost/net.c |    9 +++++----
>   1 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index ff60c2a..d09c17c 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>   	struct vhost_virtqueue *vq = ubufs->vq;
>   	int cnt = atomic_read(&ubufs->kref.refcount);
>
> +	/* set len to mark this desc buffers done DMA */
> +	vq->heads[ubuf->desc].len = success ?
> +		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
> +	vhost_net_ubuf_put(ubufs);
> +
>   	/*
>   	 * Trigger polling thread if guest stopped submitting new buffers:
>   	 * in this case, the refcount after decrement will eventually reach 1
> @@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>   	 */
>   	if (cnt <= 2 || !(cnt % 16))
>   		vhost_poll_queue(&vq->poll);
> -	/* set len to mark this desc buffers done DMA */
> -	vq->heads[ubuf->desc].len = success ?
> -		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
> -	vhost_net_ubuf_put(ubufs);
>   }
>
>   /* Expects to be always run from workqueue - which acts as
>
With this change, the vq loses the protection provided by ubufs->kref.
If another thread is waiting in vhost_net_ubuf_put_and_wait(), called from
vhost_net_release(), then after vhost_net_ubuf_put() the vq may be freed
by vhost_net_release() at any moment, and vhost_poll_queue(&vq->poll) may
then cause a NULL pointer dereference.

Another question is that vhost_zerocopy_callback() is called from
kfree_skb(), so it may run in different thread contexts.
Whether vhost_poll_queue() is called is decided by ubufs->kref.refcount,
so it can happen that no thread calls vhost_poll_queue() even though at
least one call is needed, and this breaks networking.
We can reproduce this by using 8 netperf threads in the guest
transmitting TCP to the host.

I think that if atomic_read() is used to decide whether or not to call
vhost_poll_queue(), at least a spinlock is needed.

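The lifetime hazard described in this report can be modeled deterministically in a few lines of C. This is a single-threaded sketch with a freed flag standing in for the real kfree(); all names are hypothetical, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model (hypothetical names, not kernel code) of the
 * hazard: once the callback drops its ubufs reference, a waiter in
 * vhost_net_ubuf_put_and_wait() may proceed and free the virtqueue, so
 * the later vq->poll access lands on freed memory.  A 'freed' flag
 * stands in for the real kfree(). */
struct vq_model {
	bool freed;
};

struct ubufs_model {
	struct vq_model *vq;
	int refcount;
};

/* Returns true if the final poll touched a still-live vq, and false if
 * it would have dereferenced freed memory (the reported bug). */
static bool zerocopy_callback_model(struct ubufs_model *u,
				    bool release_runs_between)
{
	struct vq_model *vq = u->vq;

	u->refcount--;            /* vhost_net_ubuf_put() now precedes the poll */
	if (release_runs_between) /* put_and_wait() waiter wakes up and ... */
		vq->freed = true; /* ... frees the virtqueue right here */
	return !vq->freed;        /* vhost_poll_queue(&vq->poll) happens last */
}
```

When the release path never runs in between, the poll is safe; when it does, the model's final vq access is exactly the use-after-free being reported.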

* Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
  2014-02-12  7:38   ` Qin Chuanyu
@ 2014-02-12 10:06     ` Jason Wang
  2014-02-12 16:01     ` Michael S. Tsirkin
  1 sibling, 0 replies; 21+ messages in thread
From: Jason Wang @ 2014-02-12 10:06 UTC (permalink / raw)
  To: Qin Chuanyu, virtualization, netdev, linux-kernel; +Cc: kvm, mst

On 02/12/2014 03:38 PM, Qin Chuanyu wrote:
> On 2013/8/30 12:29, Jason Wang wrote:
>> We used to poll the vhost queue before marking DMA as done. This is
>> racy: if the vhost thread was woken up before DMA was marked as done,
>> the signal could be missed. Fix this by always polling the vhost queue
>> after marking DMA as done.
>>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/vhost/net.c |    9 +++++----
>>   1 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index ff60c2a..d09c17c 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -308,6 +308,11 @@ static void vhost_zerocopy_callback(struct
>> ubuf_info *ubuf, bool success)
>>       struct vhost_virtqueue *vq = ubufs->vq;
>>       int cnt = atomic_read(&ubufs->kref.refcount);
>>
>> +    /* set len to mark this desc buffers done DMA */
>> +    vq->heads[ubuf->desc].len = success ?
>> +        VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
>> +    vhost_net_ubuf_put(ubufs);
>> +
>>       /*
>>        * Trigger polling thread if guest stopped submitting new buffers:
>>        * in this case, the refcount after decrement will eventually
>> reach 1
>> @@ -318,10 +323,6 @@ static void vhost_zerocopy_callback(struct
>> ubuf_info *ubuf, bool success)
>>        */
>>       if (cnt <= 2 || !(cnt % 16))
>>           vhost_poll_queue(&vq->poll);
>> -    /* set len to mark this desc buffers done DMA */
>> -    vq->heads[ubuf->desc].len = success ?
>> -        VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
>> -    vhost_net_ubuf_put(ubufs);
>>   }
>>
>>   /* Expects to be always run from workqueue - which acts as
>>
> With this change, the vq loses the protection provided by ubufs->kref.
> If another thread is waiting in vhost_net_ubuf_put_and_wait(), called
> from vhost_net_release(), then after vhost_net_ubuf_put() the vq may be
> freed by vhost_net_release() at any moment, and
> vhost_poll_queue(&vq->poll) may then cause a NULL pointer dereference.
>

Good catch.
> Another question is that vhost_zerocopy_callback() is called from
> kfree_skb(), so it may run in different thread contexts.
> Whether vhost_poll_queue() is called is decided by ubufs->kref.refcount,
> so it can happen that no thread calls vhost_poll_queue() even though at
> least one call is needed, and this breaks networking.
> We can reproduce this by using 8 netperf threads in the guest
> transmitting TCP to the host.
>
> I think that if atomic_read() is used to decide whether or not to call
> vhost_poll_queue(), at least a spinlock is needed.

Then you need another ref count to protect that spinlock? Care to send
patches?

Thanks


* Re: [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done
  2014-02-12  7:38   ` Qin Chuanyu
  2014-02-12 10:06     ` Jason Wang
@ 2014-02-12 16:01     ` Michael S. Tsirkin
  1 sibling, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2014-02-12 16:01 UTC (permalink / raw)
  To: Qin Chuanyu; +Cc: netdev, linux-kernel, kvm, virtualization

On Wed, Feb 12, 2014 at 03:38:18PM +0800, Qin Chuanyu wrote:
> Another question is that vhost_zerocopy_callback() is called from
> kfree_skb(), so it may run in different thread contexts.
> Whether vhost_poll_queue() is called is decided by ubufs->kref.refcount,
> so it can happen that no thread calls vhost_poll_queue() even though at
> least one call is needed, and this breaks networking.
> We can reproduce this by using 8 netperf threads in the guest
> transmitting TCP to the host.

Thanks a lot for the report, will send the patch soon.

> 
> I think that if atomic_read() is used to decide whether or not to call
> vhost_poll_queue(), at least a spinlock is needed.

No, nothing so drastic.


end of thread, other threads:[~2014-02-12 16:01 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2013-08-30  4:29 [PATCH V2 0/6] vhost code cleanup and minor enhancement Jason Wang
2013-08-30  4:29 ` [PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() returns void Jason Wang
2013-09-02  5:51   ` Michael S. Tsirkin
2013-09-02  6:29     ` Jason Wang
2013-08-30  4:29 ` [PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used() Jason Wang
2013-09-02  5:50   ` Michael S. Tsirkin
2013-09-02  6:28     ` Jason Wang
2013-08-30  4:29 ` [PATCH V2 3/6] vhost: switch to use vhost_add_used_n() Jason Wang
2013-08-30  4:29 ` [PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time Jason Wang
2013-08-30 18:35   ` Sergei Shtylyov
2013-09-02  3:15     ` Jason Wang
2013-08-30  4:29 ` [PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done Jason Wang
2013-08-30 16:44   ` Ben Hutchings
2013-09-02  3:06     ` Jason Wang
2014-02-12  7:38   ` Qin Chuanyu
2014-02-12 10:06     ` Jason Wang
2014-02-12 16:01     ` Michael S. Tsirkin
2013-08-30  4:29 ` [PATCH V2 6/6] vhost_net: correctly limit the max pending buffers Jason Wang
2013-09-02  5:56   ` Michael S. Tsirkin
2013-09-02  6:30     ` Jason Wang
2013-09-02  8:37       ` Jason Wang
