KVM Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH V3 00/15] Packed virtqueue support for vhost
@ 2019-07-17 10:52 Jason Wang
  2019-07-17 10:52 ` [PATCH V3 01/15] vhost: simplify meta data pointer accessing Jason Wang
                   ` (15 more replies)
  0 siblings, 16 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

Hi all:

This series implements packed virtqueues which were described
at [1]. In this version we try to address the performance regression
saw by V2. The root cause is packed virtqueue need more times of
userspace memory accesssing which turns out to be very
expensive. Thanks to the help of 7f466032dc9e ("vhost: access vq
metadata through kernel virtual address"), such overhead cold be
eliminated. So in this version, we can see about 2% improvement for
packed virtqueue on PPS.

More optimizations (e.g IN_ORDER) is on the road.

Please review.

[1] https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007

This version were tested with:
- zercopy/datacopy
- mergeable buffer on/off
- TCP stream & virtio-user

Changes from V2:
- rebase on top of vhost metadata accelreation series
- introduce shadow used ring API
- new SET_VRING_BASE/GET_VRING_BASE that takes care about warp counter
  and index for both avail and used
- various twaeaks

Changes from V1:
- drop uapi patch and use Tiwei's
- split the enablement of packed virtqueue into a separate patch

Changes from RFC V5:
- save unnecessary barriers during vhost_add_used_packed_n()
- more compact math for event idx
- fix failure of SET_VRING_BASE when avail_wrap_counter is true
- fix not copy avail_wrap_counter during GET_VRING_BASE
- introduce SET_VRING_USED_BASE/GET_VRING_USED_BASE for syncing
- last_used_idx
- rename used_wrap_counter to last_used_wrap_counter
- rebase to net-next

Changes from RFC V4:
- fix signalled_used index recording
- track avail index correctly
- various minor fixes

Changes from RFC V3:
- Fix math on event idx checking
- Sync last avail wrap counter through GET/SET_VRING_BASE
- remove desc_event prefix in the driver/device structure

Changes from RFC V2:
- do not use & in checking desc_event_flags
- off should be most significant bit
- remove the workaround of mergeable buffer for dpdk prototype
- id should be in the last descriptor in the chain
- keep _F_WRITE for write descriptor when adding used
- device flags updating should use ADDR_USED type
- return error on unexpected unavail descriptor in a chain
- return false in vhost_ve_avail_empty is descriptor is available
- track last seen avail_wrap_counter
- correctly examine available descriptor in get_indirect_packed()
- vhost_idx_diff should return u16 instead of bool

Changes from RFC V1:
- Refactor vhost used elem code to avoid open coding on used elem
- Event suppression support (compile test only).
- Indirect descriptor support (compile test only).
- Zerocopy support.
- vIOMMU support.
- SCSI/VSOCK support (compile test only).
- Fix several bugs

Jason Wang (15):
  vhost: simplify meta data pointer accessing
  vhost: remove the unnecessary parameter of vhost_vq_avail_empty()
  vhost: remove unnecessary parameter of
    vhost_enable_notify()/vhost_disable_notify
  vhost-net: don't use vhost_add_used_n() for zerocopy
  vhost: introduce helpers to manipulate shadow used ring
  vhost_net: switch TX to use shadow used ring API
  vhost_net: calculate last used length once for mergeable buffer
  vhost_net: switch to use shadow used ring API for RX
  vhost: do not export vhost_add_used_n() and
    vhost_add_used_and_signal_n()
  vhost: hide used ring layout from device
  vhost: do not use vring_used_elem
  vhost: vhost_put_user() can accept metadata type
  vhost: packed ring support
  vhost: event suppression for packed ring
  vhost: enable packed virtqueues

 drivers/vhost/net.c   |  200 +++---
 drivers/vhost/scsi.c  |   72 +-
 drivers/vhost/test.c  |    6 +-
 drivers/vhost/vhost.c | 1508 +++++++++++++++++++++++++++++++++++------
 drivers/vhost/vhost.h |   78 ++-
 drivers/vhost/vsock.c |   57 +-
 6 files changed, 1513 insertions(+), 408 deletions(-)

-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 01/15] vhost: simplify meta data pointer accessing
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 02/15] vhost: remove the unnecessary parameter of vhost_vq_avail_empty() Jason Wang
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

Instead of open coding meta pointer accessing through caches. This
patch introduces vhost_get_meta_ptr()/vhost_put_meta_ptr() pair to
reduce the code duplication and simplify its callers
implementation. This is helpful for reducing LOC for packed virtqueue
as well.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 207 ++++++++++++++----------------------------
 1 file changed, 70 insertions(+), 137 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dc9301d31f12..7f51c74d9aee 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1194,25 +1194,37 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	ret; \
 })
 
-static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
-{
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
+static void *vhost_get_meta_ptr(struct vhost_virtqueue *vq, int type)
+{
 	struct vhost_map *map;
-	struct vring_used *used;
 
 	if (!vq->iotlb) {
 		rcu_read_lock();
+		map = rcu_dereference(vq->maps[type]);
+		if (likely(map))
+			return map->addr;
+		rcu_read_unlock();
+	}
+	return NULL;
+}
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
-		if (likely(map)) {
-			used = map->addr;
-			*((__virtio16 *)&used->ring[vq->num]) =
-				cpu_to_vhost16(vq, vq->avail_idx);
-			rcu_read_unlock();
-			return 0;
-		}
+static void vhost_put_meta_ptr(void)
+{
+	rcu_read_unlock();
+}
+#endif
 
-		rcu_read_unlock();
+static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
+
+	if (likely(used)) {
+		*((__virtio16 *)&used->ring[vq->num]) =
+			cpu_to_vhost16(vq, vq->avail_idx);
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1225,23 +1237,14 @@ static inline int vhost_put_used(struct vhost_virtqueue *vq,
 				 int count)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_used *used;
+	struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
 	size_t size;
 
-	if (!vq->iotlb) {
-		rcu_read_lock();
-
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
-		if (likely(map)) {
-			used = map->addr;
-			size = count * sizeof(*head);
-			memcpy(used->ring + idx, head, size);
-			rcu_read_unlock();
-			return 0;
-		}
-
-		rcu_read_unlock();
+	if (likely(used)) {
+		size = count * sizeof(*head);
+		memcpy(used->ring + idx, head, size);
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1253,21 +1256,12 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_used *used;
-
-	if (!vq->iotlb) {
-		rcu_read_lock();
+	struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
-		if (likely(map)) {
-			used = map->addr;
-			used->flags = cpu_to_vhost16(vq, vq->used_flags);
-			rcu_read_unlock();
-			return 0;
-		}
-
-		rcu_read_unlock();
+	if (likely(used)) {
+		used->flags = cpu_to_vhost16(vq, vq->used_flags);
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1279,21 +1273,12 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_used *used;
+	struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
 
-	if (!vq->iotlb) {
-		rcu_read_lock();
-
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
-		if (likely(map)) {
-			used = map->addr;
-			used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
-			rcu_read_unlock();
-			return 0;
-		}
-
-		rcu_read_unlock();
+	if (likely(used)) {
+		used->idx = cpu_to_vhost16(vq, vq->last_used_idx);
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1343,21 +1328,12 @@ static inline int vhost_get_avail_idx(struct vhost_virtqueue *vq,
 				      __virtio16 *idx)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_avail *avail;
-
-	if (!vq->iotlb) {
-		rcu_read_lock();
+	struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
-		if (likely(map)) {
-			avail = map->addr;
-			*idx = avail->idx;
-			rcu_read_unlock();
-			return 0;
-		}
-
-		rcu_read_unlock();
+	if (likely(avail)) {
+		*idx = avail->idx;
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1368,21 +1344,12 @@ static inline int vhost_get_avail_head(struct vhost_virtqueue *vq,
 				       __virtio16 *head, int idx)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_avail *avail;
-
-	if (!vq->iotlb) {
-		rcu_read_lock();
-
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
-		if (likely(map)) {
-			avail = map->addr;
-			*head = avail->ring[idx & (vq->num - 1)];
-			rcu_read_unlock();
-			return 0;
-		}
+	struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
 
-		rcu_read_unlock();
+	if (likely(avail)) {
+		*head = avail->ring[idx & (vq->num - 1)];
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1394,21 +1361,12 @@ static inline int vhost_get_avail_flags(struct vhost_virtqueue *vq,
 					__virtio16 *flags)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_avail *avail;
-
-	if (!vq->iotlb) {
-		rcu_read_lock();
+	struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
 
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
-		if (likely(map)) {
-			avail = map->addr;
-			*flags = avail->flags;
-			rcu_read_unlock();
-			return 0;
-		}
-
-		rcu_read_unlock();
+	if (likely(avail)) {
+		*flags = avail->flags;
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1419,19 +1377,12 @@ static inline int vhost_get_used_event(struct vhost_virtqueue *vq,
 				       __virtio16 *event)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_avail *avail;
+	struct vring_avail *avail = vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
 
-	if (!vq->iotlb) {
-		rcu_read_lock();
-		map = rcu_dereference(vq->maps[VHOST_ADDR_AVAIL]);
-		if (likely(map)) {
-			avail = map->addr;
-			*event = (__virtio16)avail->ring[vq->num];
-			rcu_read_unlock();
-			return 0;
-		}
-		rcu_read_unlock();
+	if (likely(avail)) {
+		*event = (__virtio16)avail->ring[vq->num];
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1442,21 +1393,12 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq,
 				     __virtio16 *idx)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_used *used;
-
-	if (!vq->iotlb) {
-		rcu_read_lock();
-
-		map = rcu_dereference(vq->maps[VHOST_ADDR_USED]);
-		if (likely(map)) {
-			used = map->addr;
-			*idx = used->idx;
-			rcu_read_unlock();
-			return 0;
-		}
+	struct vring_used *used = vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
 
-		rcu_read_unlock();
+	if (likely(used)) {
+		*idx = used->idx;
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
@@ -1467,21 +1409,12 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq,
 				 struct vring_desc *desc, int idx)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
-	struct vhost_map *map;
-	struct vring_desc *d;
-
-	if (!vq->iotlb) {
-		rcu_read_lock();
-
-		map = rcu_dereference(vq->maps[VHOST_ADDR_DESC]);
-		if (likely(map)) {
-			d = map->addr;
-			*desc = *(d + idx);
-			rcu_read_unlock();
-			return 0;
-		}
+	struct vring_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
 
-		rcu_read_unlock();
+	if (likely(d)) {
+		*desc = *(d + idx);
+		vhost_put_meta_ptr();
+		return 0;
 	}
 #endif
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 02/15] vhost: remove the unnecessary parameter of vhost_vq_avail_empty()
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
  2019-07-17 10:52 ` [PATCH V3 01/15] vhost: simplify meta data pointer accessing Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 03/15] vhost: remove unnecessary parameter of vhost_enable_notify()/vhost_disable_notify Jason Wang
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

Its dev parameter is not even used, so remove it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   | 8 ++++----
 drivers/vhost/vhost.c | 2 +-
 drivers/vhost/vhost.h | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 3beb401235c0..7d34e8cbc89b 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -498,7 +498,7 @@ static int sock_has_rx_data(struct socket *sock)
 static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
 					  struct vhost_virtqueue *vq)
 {
-	if (!vhost_vq_avail_empty(&net->dev, vq)) {
+	if (!vhost_vq_avail_empty(vq)) {
 		vhost_poll_queue(&vq->poll);
 	} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
 		vhost_disable_notify(&net->dev, vq);
@@ -540,8 +540,8 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 		}
 
 		if ((sock_has_rx_data(sock) &&
-		     !vhost_vq_avail_empty(&net->dev, rvq)) ||
-		    !vhost_vq_avail_empty(&net->dev, tvq))
+		     !vhost_vq_avail_empty(rvq)) ||
+		    !vhost_vq_avail_empty(tvq))
 			break;
 
 		cpu_relax();
@@ -638,7 +638,7 @@ static int get_tx_bufs(struct vhost_net *net,
 static bool tx_can_batch(struct vhost_virtqueue *vq, size_t total_len)
 {
 	return total_len < VHOST_NET_WEIGHT &&
-	       !vhost_vq_avail_empty(vq->dev, vq);
+	       !vhost_vq_avail_empty(vq);
 }
 
 #define SKB_FRAG_PAGE_ORDER     get_order(32768)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 7f51c74d9aee..ec3534bcd51b 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2946,7 +2946,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
 /* return true if we're sure that avaiable ring is empty */
-bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 819296332913..e0451c900177 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -247,7 +247,7 @@ void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
-bool vhost_vq_avail_empty(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_vq_avail_empty(struct vhost_virtqueue *vq);
 bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 03/15] vhost: remove unnecessary parameter of vhost_enable_notify()/vhost_disable_notify
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
  2019-07-17 10:52 ` [PATCH V3 01/15] vhost: simplify meta data pointer accessing Jason Wang
  2019-07-17 10:52 ` [PATCH V3 02/15] vhost: remove the unnecessary parameter of vhost_vq_avail_empty() Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 04/15] vhost-net: don't use vhost_add_used_n() for zerocopy Jason Wang
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

Its dev parameter is not even used, so remove it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   | 25 ++++++++++++-------------
 drivers/vhost/scsi.c  | 12 ++++++------
 drivers/vhost/test.c  |  6 +++---
 drivers/vhost/vhost.c |  4 ++--
 drivers/vhost/vhost.h |  4 ++--
 drivers/vhost/vsock.c | 14 +++++++-------
 6 files changed, 32 insertions(+), 33 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7d34e8cbc89b..78d248574f8e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -500,8 +500,8 @@ static void vhost_net_busy_poll_try_queue(struct vhost_net *net,
 {
 	if (!vhost_vq_avail_empty(vq)) {
 		vhost_poll_queue(&vq->poll);
-	} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-		vhost_disable_notify(&net->dev, vq);
+	} else if (unlikely(vhost_enable_notify(vq))) {
+		vhost_disable_notify(vq);
 		vhost_poll_queue(&vq->poll);
 	}
 }
@@ -524,7 +524,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	if (!mutex_trylock(&vq->mutex))
 		return;
 
-	vhost_disable_notify(&net->dev, vq);
+	vhost_disable_notify(vq);
 	sock = rvq->private_data;
 
 	busyloop_timeout = poll_rx ? rvq->busyloop_timeout:
@@ -552,7 +552,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 	if (poll_rx || sock_has_rx_data(sock))
 		vhost_net_busy_poll_try_queue(net, vq);
 	else if (!poll_rx) /* On tx here, sock has no rx data. */
-		vhost_enable_notify(&net->dev, rvq);
+		vhost_enable_notify(rvq);
 
 	mutex_unlock(&vq->mutex);
 }
@@ -788,9 +788,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		if (head == vq->num) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
-			} else if (unlikely(vhost_enable_notify(&net->dev,
-								vq))) {
-				vhost_disable_notify(&net->dev, vq);
+			} else if (unlikely(vhost_enable_notify(vq))) {
+				vhost_disable_notify(vq);
 				continue;
 			}
 			break;
@@ -880,8 +879,8 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		if (head == vq->num) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
-			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
-				vhost_disable_notify(&net->dev, vq);
+			} else if (unlikely(vhost_enable_notify(vq))) {
+				vhost_disable_notify(vq);
 				continue;
 			}
 			break;
@@ -960,7 +959,7 @@ static void handle_tx(struct vhost_net *net)
 	if (!vq_meta_prefetch(vq))
 		goto out;
 
-	vhost_disable_notify(&net->dev, vq);
+	vhost_disable_notify(vq);
 	vhost_net_disable_vq(net, vq);
 
 	if (vhost_sock_zcopy(sock))
@@ -1129,7 +1128,7 @@ static void handle_rx(struct vhost_net *net)
 	if (!vq_meta_prefetch(vq))
 		goto out;
 
-	vhost_disable_notify(&net->dev, vq);
+	vhost_disable_notify(vq);
 	vhost_net_disable_vq(net, vq);
 
 	vhost_hlen = nvq->vhost_hlen;
@@ -1156,10 +1155,10 @@ static void handle_rx(struct vhost_net *net)
 		if (!headcount) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
-			} else if (unlikely(vhost_enable_notify(&net->dev, vq))) {
+			} else if (unlikely(vhost_enable_notify(vq))) {
 				/* They have slipped one in as we were
 				 * doing that: check again. */
-				vhost_disable_notify(&net->dev, vq);
+				vhost_disable_notify(vq);
 				continue;
 			}
 			/* Nothing new?  Wait for eventfd to tell us
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index a9caf1bc3c3e..8d4e87007a8d 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -458,7 +458,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	}
 
 again:
-	vhost_disable_notify(&vs->dev, vq);
+	vhost_disable_notify(vq);
 	head = vhost_get_vq_desc(vq, vq->iov,
 			ARRAY_SIZE(vq->iov), &out, &in,
 			NULL, NULL);
@@ -467,7 +467,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 		return;
 	}
 	if (head == vq->num) {
-		if (vhost_enable_notify(&vs->dev, vq))
+		if (vhost_enable_notify(vq))
 			goto again;
 		vs->vs_events_missed = true;
 		return;
@@ -828,8 +828,8 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 
 	/* Nothing new?  Wait for eventfd to tell us they refilled. */
 	if (vc->head == vq->num) {
-		if (unlikely(vhost_enable_notify(&vs->dev, vq))) {
-			vhost_disable_notify(&vs->dev, vq);
+		if (unlikely(vhost_enable_notify(vq))) {
+			vhost_disable_notify(vq);
 			ret = -EAGAIN;
 		}
 		goto done;
@@ -936,7 +936,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 	memset(&vc, 0, sizeof(vc));
 	vc.rsp_size = sizeof(struct virtio_scsi_cmd_resp);
 
-	vhost_disable_notify(&vs->dev, vq);
+	vhost_disable_notify(vq);
 
 	do {
 		ret = vhost_scsi_get_desc(vs, vq, &vc);
@@ -1189,7 +1189,7 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 
 	memset(&vc, 0, sizeof(vc));
 
-	vhost_disable_notify(&vs->dev, vq);
+	vhost_disable_notify(vq);
 
 	do {
 		ret = vhost_scsi_get_desc(vs, vq, &vc);
diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c
index 40589850eb33..746f5d439153 100644
--- a/drivers/vhost/test.c
+++ b/drivers/vhost/test.c
@@ -50,7 +50,7 @@ static void handle_vq(struct vhost_test *n)
 		return;
 	}
 
-	vhost_disable_notify(&n->dev, vq);
+	vhost_disable_notify(vq);
 
 	for (;;) {
 		head = vhost_get_vq_desc(vq, vq->iov,
@@ -62,8 +62,8 @@ static void handle_vq(struct vhost_test *n)
 			break;
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
 		if (head == vq->num) {
-			if (unlikely(vhost_enable_notify(&n->dev, vq))) {
-				vhost_disable_notify(&n->dev, vq);
+			if (unlikely(vhost_enable_notify(vq))) {
+				vhost_disable_notify(vq);
 				continue;
 			}
 			break;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index ec3534bcd51b..e781db88dfca 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2964,7 +2964,7 @@ bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
 /* OK, now we need to know about added descriptors. */
-bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+bool vhost_enable_notify(struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -3002,7 +3002,7 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 EXPORT_SYMBOL_GPL(vhost_enable_notify);
 
 /* We don't need to be notified again. */
-void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+void vhost_disable_notify(struct vhost_virtqueue *vq)
 {
 	int r;
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index e0451c900177..e054f178d8b0 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -246,9 +246,9 @@ void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
-void vhost_disable_notify(struct vhost_dev *, struct vhost_virtqueue *);
+void vhost_disable_notify(struct vhost_virtqueue *vq);
 bool vhost_vq_avail_empty(struct vhost_virtqueue *vq);
-bool vhost_enable_notify(struct vhost_dev *, struct vhost_virtqueue *);
+bool vhost_enable_notify(struct vhost_virtqueue *vq);
 
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len,
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 814bed72d793..f94021b450f0 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -96,7 +96,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		goto out;
 
 	/* Avoid further vmexits, we're already processing the virtqueue */
-	vhost_disable_notify(&vsock->dev, vq);
+	vhost_disable_notify(vq);
 
 	do {
 		struct virtio_vsock_pkt *pkt;
@@ -109,7 +109,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		spin_lock_bh(&vsock->send_pkt_list_lock);
 		if (list_empty(&vsock->send_pkt_list)) {
 			spin_unlock_bh(&vsock->send_pkt_list_lock);
-			vhost_enable_notify(&vsock->dev, vq);
+			vhost_enable_notify(vq);
 			break;
 		}
 
@@ -135,8 +135,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			/* We cannot finish yet if more buffers snuck in while
 			 * re-enabling notify.
 			 */
-			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
-				vhost_disable_notify(&vsock->dev, vq);
+			if (unlikely(vhost_enable_notify(vq))) {
+				vhost_disable_notify(vq);
 				continue;
 			}
 			break;
@@ -369,7 +369,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	if (!vq->private_data)
 		goto out;
 
-	vhost_disable_notify(&vsock->dev, vq);
+	vhost_disable_notify(vq);
 	do {
 		u32 len;
 
@@ -387,8 +387,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			break;
 
 		if (head == vq->num) {
-			if (unlikely(vhost_enable_notify(&vsock->dev, vq))) {
-				vhost_disable_notify(&vsock->dev, vq);
+			if (unlikely(vhost_enable_notify(vq))) {
+				vhost_disable_notify(vq);
 				continue;
 			}
 			break;
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 04/15] vhost-net: don't use vhost_add_used_n() for zerocopy
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (2 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 03/15] vhost: remove unnecessary parameter of vhost_enable_notify()/vhost_disable_notify Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 05/15] vhost: introduce helpers to manipulate shadow used ring Jason Wang
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

We tried to use vhost_add_used_n() for the packets that is not
zero-copied. This can help to mitigate HOL issue but not a total
solution. What's more, it may lead out of order completion and cause
extra complexity for packed virtqueue implementation that needs to
maintain wrap counters.

So this patch switch to constantly use vq->heads[] to maintain
heads. This will ease the introduction of zerocopy shadow used ring
API and reduce the complexity for packed virtqueues.

After this, vhost_net became a in order device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 21 +++++++++------------
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 78d248574f8e..ac31983d2d77 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -894,9 +894,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		if (zcopy_used) {
 			struct ubuf_info *ubuf;
 			ubuf = nvq->ubuf_info + nvq->upend_idx;
-
-			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
-			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
@@ -907,11 +904,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			msg.msg_controllen = sizeof(ctl);
 			ubufs = nvq->ubufs;
 			atomic_inc(&ubufs->refcount);
-			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
 		} else {
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
+		vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
+		vq->heads[nvq->upend_idx].len = zcopy_used ?
+			 VHOST_DMA_IN_PROGRESS : VHOST_DMA_DONE_LEN;
+		nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
 		total_len += len;
 		if (tx_can_batch(vq, total_len) &&
 		    likely(!vhost_exceeds_maxpend(net))) {
@@ -923,11 +923,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
-			if (zcopy_used) {
+			if (zcopy_used)
 				vhost_net_ubuf_put(ubufs);
-				nvq->upend_idx = ((unsigned)nvq->upend_idx - 1)
-					% UIO_MAXIOV;
-			}
+			nvq->upend_idx = ((unsigned int)nvq->upend_idx - 1)
+					 % UIO_MAXIOV;
 			vhost_discard_vq_desc(vq, 1);
 			vhost_net_enable_vq(net, vq);
 			break;
@@ -935,10 +934,8 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-		if (!zcopy_used)
-			vhost_add_used_and_signal(&net->dev, vq, head, 0);
-		else
-			vhost_zerocopy_signal_used(net, vq);
+
+		vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 }
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 05/15] vhost: introduce helpers to manipulate shadow used ring
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (3 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 04/15] vhost-net: don't use vhost_add_used_n() for zerocopy Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 06/15] vhost_net: switch TX to use shadow used ring API Jason Wang
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

We open coding vq->heads[] in net.c for:

1) implementing batching which in fact a shadow used ring
   implementation.
2) maintain pending heads in order which is in fact another kind of
   shadow used ring

But this expose used ring layout for device which makes it hard to
introduce new kind of ring like packed virtqueue. So this patch
introduces two types of shadow used ring API:

1) shadow used ring API for batch updating of used heads
2) zerocopy shadow used API for maintaining pending heads and batch
   updating used heads

This can help to hide the used ring layout from device. Device should
not mix using those two kinds of APIs.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 95 +++++++++++++++++++++++++++++++++++++------
 drivers/vhost/vhost.h | 18 ++++++++
 2 files changed, 100 insertions(+), 13 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e781db88dfca..5bfca5b76b05 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -486,6 +486,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
 	vhost_reset_vq_maps(vq);
 #endif
+	vq->nheads = 0;
 }
 
 static int vhost_worker(void *data)
@@ -2790,25 +2791,28 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
-/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count)
 {
-	vq->last_avail_idx -= n;
+	BUG_ON(count > vq->nheads);
+	vq->nheads -= count;
 }
-EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+/* Reverse the effect of vhost_get_vq_desc and
+ * vhost_add_shadow_used. Useful for error handleing
+ */
+void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n)
 {
-	struct vring_used_elem heads = {
-		cpu_to_vhost32(vq, head),
-		cpu_to_vhost32(vq, len)
-	};
+	vhost_withdraw_shadow_used(vq, n);
+	vhost_discard_vq_desc(vq, n);
+}
+EXPORT_SYMBOL_GPL(vhost_discard_shadow_used);
 
-	return vhost_add_used_n(vq, &heads, 1);
+/* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+{
+	vq->last_avail_idx -= n;
 }
-EXPORT_SYMBOL_GPL(vhost_add_used);
+EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 			    struct vring_used_elem *heads,
@@ -2842,6 +2846,41 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	return 0;
 }
 
+void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
+			       int idx, int len)
+{
+	vq->heads[idx].len = len;
+}
+EXPORT_SYMBOL_GPL(vhost_set_zc_used_len);
+
+int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
+{
+	return vq->heads[idx].len;
+}
+EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);
+
+void vhost_set_zc_used(struct vhost_virtqueue *vq,
+			   int idx, unsigned int head, int len)
+{
+	vq->heads[idx].id = head;
+	vq->heads[idx].len = len;
+}
+EXPORT_SYMBOL_GPL(vhost_set_zc_used);
+
+void vhost_add_shadow_used(struct vhost_virtqueue *vq,
+			   unsigned int head, int len)
+{
+	vhost_set_zc_used(vq, vq->nheads, head, len);
+	++vq->nheads;
+}
+EXPORT_SYMBOL_GPL(vhost_add_shadow_used);
+
+int vhost_get_shadow_used_count(struct vhost_virtqueue *vq)
+{
+	return vq->nheads;
+}
+EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count);
+
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
 int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
@@ -2879,6 +2918,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd. */
+int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+{
+	struct vring_used_elem heads = {
+		cpu_to_vhost32(vq, head),
+		cpu_to_vhost32(vq, len)
+	};
+
+	return vhost_add_used_n(vq, &heads, 1);
+}
+EXPORT_SYMBOL_GPL(vhost_add_used);
+
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
 	__u16 old, new;
@@ -2945,6 +2997,23 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
+void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq)
+
+{
+	if (!vq->nheads)
+		return;
+
+	vhost_add_used_and_signal_n(vq->dev, vq, vq->heads, vq->nheads);
+	vq->nheads = 0;
+}
+EXPORT_SYMBOL_GPL(vhost_flush_shadow_used_and_signal);
+
+void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n)
+{
+	vhost_add_used_and_signal_n(vq->dev, vq, &vq->heads[idx], n);
+}
+EXPORT_SYMBOL_GPL(vhost_flush_zc_used_and_signal);
+
 /* return true if we're sure that avaiable ring is empty */
 bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
 {
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index e054f178d8b0..175eb5ebf954 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -163,6 +163,7 @@ struct vhost_virtqueue {
 	struct iovec iotlb_iov[64];
 	struct iovec *indirect;
 	struct vring_used_elem *heads;
+	int nheads;
 	/* Protected by virtqueue mutex. */
 	struct vhost_umem *umem;
 	struct vhost_umem *iotlb;
@@ -241,10 +242,27 @@ int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
 int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
 		     unsigned count);
+
 void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 			       unsigned int id, int len);
 void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
 			       struct vring_used_elem *heads, unsigned count);
+
+/* Zerocopy shadow used ring API */
+void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
+			   int idx, int len);
+int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx);
+void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n);
+void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
+		       unsigned int head, int len);
+
+/* Non zerocopy shadow used ring API */
+void vhost_add_shadow_used(struct vhost_virtqueue *vq, unsigned int head,
+			   int len);
+void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq);
+void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n);
+int vhost_get_shadow_used_count(struct vhost_virtqueue *vq);
+
 void vhost_signal(struct vhost_dev *, struct vhost_virtqueue *);
 void vhost_disable_notify(struct vhost_virtqueue *vq);
 bool vhost_vq_avail_empty(struct vhost_virtqueue *vq);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 06/15] vhost_net: switch TX to use shadow used ring API
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (4 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 05/15] vhost: introduce helpers to manipulate shadow used ring Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 07/15] vhost_net: calculate last used length once for mergeable buffer Jason Wang
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

This patch switch to use shadow used ring API for transmission. This
will help to hide used ring layout from device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 31 +++++++++++++++----------------
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ac31983d2d77..cf47e6e348f4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -361,22 +361,22 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 {
 	struct vhost_net_virtqueue *nvq =
 		container_of(vq, struct vhost_net_virtqueue, vq);
-	int i, add;
+	int i, add, len;
 	int j = 0;
 
 	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
-		if (vq->heads[i].len == VHOST_DMA_FAILED_LEN)
+		len = vhost_get_zc_used_len(vq, i);
+		if (len == VHOST_DMA_FAILED_LEN)
 			vhost_net_tx_err(net);
-		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
-			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
+		if (VHOST_DMA_IS_DONE(len)) {
+			vhost_set_zc_used_len(vq, i, VHOST_DMA_CLEAR_LEN);
 			++j;
 		} else
 			break;
 	}
 	while (j) {
 		add = min(UIO_MAXIOV - nvq->done_idx, j);
-		vhost_add_used_and_signal_n(vq->dev, vq,
-					    &vq->heads[nvq->done_idx], add);
+		vhost_flush_zc_used_and_signal(vq, nvq->done_idx, add);
 		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
 		j -= add;
 	}
@@ -391,8 +391,8 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
 	rcu_read_lock_bh();
 
 	/* set len to mark this desc buffers done DMA */
-	vq->heads[ubuf->desc].len = success ?
-		VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN;
+	vhost_set_zc_used_len(vq, ubuf->desc, success ?
+			      VHOST_DMA_DONE_LEN : VHOST_DMA_FAILED_LEN);
 	cnt = vhost_net_ubuf_put(ubufs);
 
 	/*
@@ -480,7 +480,7 @@ static void vhost_tx_batch(struct vhost_net *net,
 	}
 
 signal_used:
-	vhost_net_signal_used(nvq);
+	vhost_flush_shadow_used_and_signal(&nvq->vq);
 	nvq->batched_xdp = 0;
 }
 
@@ -776,7 +776,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 	do {
 		bool busyloop_intr = false;
 
-		if (nvq->done_idx == VHOST_NET_BATCH)
+		if (vhost_get_shadow_used_count(vq) == VHOST_NET_BATCH)
 			vhost_tx_batch(net, nvq, sock, &msg);
 
 		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
@@ -835,9 +835,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			pr_debug("Truncated TX packet: len %d != %zd\n",
 				 err, len);
 done:
-		vq->heads[nvq->done_idx].id = cpu_to_vhost32(vq, head);
-		vq->heads[nvq->done_idx].len = 0;
-		++nvq->done_idx;
+		vhost_add_shadow_used(vq, cpu_to_vhost32(vq, head), 0);
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 
 	vhost_tx_batch(net, nvq, sock, &msg);
@@ -908,9 +906,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
-		vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
-		vq->heads[nvq->upend_idx].len = zcopy_used ?
-			 VHOST_DMA_IN_PROGRESS : VHOST_DMA_DONE_LEN;
+		vhost_set_zc_used(vq, nvq->upend_idx,
+				  cpu_to_vhost32(vq, head),
+				  zcopy_used ? VHOST_DMA_IN_PROGRESS :
+				  VHOST_DMA_DONE_LEN);
 		nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
 		total_len += len;
 		if (tx_can_batch(vq, total_len) &&
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 07/15] vhost_net: calculate last used length once for mergeable buffer
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (5 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 06/15] vhost_net: switch TX to use shadow used ring API Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 08/15] vhost_net: switch to use shadow used ring API for RX Jason Wang
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

This patch tries to calculate last used length once instead of
twice. This can help to convert to use shadow used ring API for
RX.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index cf47e6e348f4..1a67f889cbc1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -1065,12 +1065,12 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 		}
 		heads[headcount].id = cpu_to_vhost32(vq, d);
 		len = iov_length(vq->iov + seg, in);
-		heads[headcount].len = cpu_to_vhost32(vq, len);
 		datalen -= len;
+		heads[headcount].len = cpu_to_vhost32(vq,
+				       datalen >= 0 ? len : len + datalen);
 		++headcount;
 		seg += in;
 	}
-	heads[headcount - 1].len = cpu_to_vhost32(vq, len + datalen);
 	*iovcount = seg;
 	if (unlikely(log))
 		*log_num = nlogs;
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 08/15] vhost_net: switch to use shadow used ring API for RX
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (6 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 07/15] vhost_net: calculate last used length once for mergeable buffer Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 09/15] vhost: do not export vhost_add_used_n() and vhost_add_used_and_signal_n() Jason Wang
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

This patch switches to use shadow used ring API for RX. This will help
to hid used ring layout from device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 37 +++++++++++--------------------------
 1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1a67f889cbc1..9e087d08b199 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -445,18 +445,6 @@ static int vhost_net_enable_vq(struct vhost_net *n,
 	return vhost_poll_start(poll, sock->file);
 }
 
-static void vhost_net_signal_used(struct vhost_net_virtqueue *nvq)
-{
-	struct vhost_virtqueue *vq = &nvq->vq;
-	struct vhost_dev *dev = vq->dev;
-
-	if (!nvq->done_idx)
-		return;
-
-	vhost_add_used_and_signal_n(dev, vq, vq->heads, nvq->done_idx);
-	nvq->done_idx = 0;
-}
-
 static void vhost_tx_batch(struct vhost_net *net,
 			   struct vhost_net_virtqueue *nvq,
 			   struct socket *sock,
@@ -999,7 +987,7 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
 
 	if (!len && rvq->busyloop_timeout) {
 		/* Flush batched heads first */
-		vhost_net_signal_used(rnvq);
+		vhost_flush_shadow_used_and_signal(rvq);
 		/* Both tx vq and rx socket were polled here */
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, true);
 
@@ -1020,7 +1008,6 @@ static int vhost_net_rx_peek_head_len(struct vhost_net *net, struct sock *sk,
  *	returns number of buffer heads allocated, negative on error
  */
 static int get_rx_bufs(struct vhost_virtqueue *vq,
-		       struct vring_used_elem *heads,
 		       int datalen,
 		       unsigned *iovcount,
 		       struct vhost_log *log,
@@ -1063,11 +1050,11 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			nlogs += *log_num;
 			log += *log_num;
 		}
-		heads[headcount].id = cpu_to_vhost32(vq, d);
 		len = iov_length(vq->iov + seg, in);
 		datalen -= len;
-		heads[headcount].len = cpu_to_vhost32(vq,
-				       datalen >= 0 ? len : len + datalen);
+		vhost_add_shadow_used(vq, cpu_to_vhost32(vq, d),
+				      cpu_to_vhost32(vq, datalen >= 0 ? len
+						     : len + datalen));
 		++headcount;
 		seg += in;
 	}
@@ -1082,7 +1069,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 	}
 	return headcount;
 err:
-	vhost_discard_vq_desc(vq, headcount);
+	vhost_discard_shadow_used(vq, headcount);
 	return r;
 }
 
@@ -1141,8 +1128,7 @@ static void handle_rx(struct vhost_net *net)
 			break;
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
-		headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
-					vhost_len, &in, vq_log, &log,
+		headcount = get_rx_bufs(vq, vhost_len, &in, vq_log, &log,
 					likely(mergeable) ? UIO_MAXIOV : 1);
 		/* On error, stop handling until the next kick. */
 		if (unlikely(headcount < 0))
@@ -1189,7 +1175,7 @@ static void handle_rx(struct vhost_net *net)
 		if (unlikely(err != sock_len)) {
 			pr_debug("Discarded rx packet: "
 				 " len %d, expected %zd\n", err, sock_len);
-			vhost_discard_vq_desc(vq, headcount);
+			vhost_discard_shadow_used(vq, headcount);
 			continue;
 		}
 		/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
@@ -1213,12 +1199,11 @@ static void handle_rx(struct vhost_net *net)
 		    copy_to_iter(&num_buffers, sizeof num_buffers,
 				 &fixup) != sizeof num_buffers) {
 			vq_err(vq, "Failed num_buffers write");
-			vhost_discard_vq_desc(vq, headcount);
+			vhost_discard_shadow_used(vq, headcount);
 			goto out;
 		}
-		nvq->done_idx += headcount;
-		if (nvq->done_idx > VHOST_NET_BATCH)
-			vhost_net_signal_used(nvq);
+		if (vhost_get_shadow_used_count(vq) > VHOST_NET_BATCH)
+			vhost_flush_shadow_used_and_signal(vq);
 		if (unlikely(vq_log))
 			vhost_log_write(vq, vq_log, log, vhost_len,
 					vq->iov, in);
@@ -1230,7 +1215,7 @@ static void handle_rx(struct vhost_net *net)
 	else if (!sock_len)
 		vhost_net_enable_vq(net, vq);
 out:
-	vhost_net_signal_used(nvq);
+	vhost_flush_shadow_used_and_signal(vq);
 	mutex_unlock(&vq->mutex);
 }
 
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 09/15] vhost: do not export vhost_add_used_n() and vhost_add_used_and_signal_n()
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (7 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 08/15] vhost_net: switch to use shadow used ring API for RX Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 10/15] vhost: hide used ring layout from device Jason Wang
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

We would request device to use shadow used ring API. Then there's no
need for exposing those to device.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 13 +++++++------
 drivers/vhost/vhost.h |  4 ----
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 5bfca5b76b05..50ba382f0981 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2883,8 +2883,9 @@ EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count);
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
-		     unsigned count)
+static int vhost_add_used_n(struct vhost_virtqueue *vq,
+			    struct vring_used_elem *heads,
+			    unsigned count)
 {
 	int start, n, r;
 
@@ -2988,14 +2989,14 @@ void vhost_add_used_and_signal(struct vhost_dev *dev,
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
 
 /* multi-buffer version of vhost_add_used_and_signal */
-void vhost_add_used_and_signal_n(struct vhost_dev *dev,
-				 struct vhost_virtqueue *vq,
-				 struct vring_used_elem *heads, unsigned count)
+static void vhost_add_used_and_signal_n(struct vhost_dev *dev,
+					struct vhost_virtqueue *vq,
+					struct vring_used_elem *heads,
+					unsigned count)
 {
 	vhost_add_used_n(vq, heads, count);
 	vhost_signal(dev, vq);
 }
-EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
 void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq)
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 175eb5ebf954..481baba20c3d 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -240,13 +240,9 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
 int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
-int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
-		     unsigned count);
 
 void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
 			       unsigned int id, int len);
-void vhost_add_used_and_signal_n(struct vhost_dev *, struct vhost_virtqueue *,
-			       struct vring_used_elem *heads, unsigned count);
 
 /* Zerocopy shadow used ring API */
 void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 10/15] vhost: hide used ring layout from device
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (8 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 09/15] vhost: do not export vhost_add_used_n() and vhost_add_used_and_signal_n() Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 11/15] vhost: do not use vring_used_elem Jason Wang
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

We used to return descriptor head by vhost_get_vq_desc() to device and
pass it back to vhost_add_used() and its friends. This exposes the
internal used ring layout to device which makes it hard to be extended for
e.g packed ring layout.

So this patch tries to hide the used ring layout by

- letting vhost_get_vq_desc() return pointer to struct vring_used_elem
- accepting pointer to struct vring_used_elem in vhost_add_used() and
  vhost_add_used_and_signal()

This could help to hide used ring layout and make it easier to
implement packed ring on top.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   | 88 ++++++++++++++++++++++---------------------
 drivers/vhost/scsi.c  | 62 ++++++++++++++++--------------
 drivers/vhost/vhost.c | 38 +++++++++++--------
 drivers/vhost/vhost.h | 11 +++---
 drivers/vhost/vsock.c | 43 +++++++++++----------
 5 files changed, 129 insertions(+), 113 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 9e087d08b199..572d80c8c36e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -546,25 +546,28 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 }
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
-				    struct vhost_net_virtqueue *tnvq,
+				    struct vring_used_elem *used_elem,
 				    unsigned int *out_num, unsigned int *in_num,
 				    struct msghdr *msghdr, bool *busyloop_intr)
 {
+	struct vhost_net_virtqueue *tnvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_net_virtqueue *rnvq = &net->vqs[VHOST_NET_VQ_RX];
 	struct vhost_virtqueue *rvq = &rnvq->vq;
 	struct vhost_virtqueue *tvq = &tnvq->vq;
 
-	int r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
+	int r = vhost_get_vq_desc(tvq, used_elem, tvq->iov,
+				  ARRAY_SIZE(tvq->iov),
 				  out_num, in_num, NULL, NULL);
 
-	if (r == tvq->num && tvq->busyloop_timeout) {
+	if (r == -ENOSPC && tvq->busyloop_timeout) {
 		/* Flush batched packets first */
 		if (!vhost_sock_zcopy(tvq->private_data))
 			vhost_tx_batch(net, tnvq, tvq->private_data, msghdr);
 
 		vhost_net_busy_poll(net, rvq, tvq, busyloop_intr, false);
 
-		r = vhost_get_vq_desc(tvq, tvq->iov, ARRAY_SIZE(tvq->iov),
+		r = vhost_get_vq_desc(tvq, used_elem, tvq->iov,
+				      ARRAY_SIZE(tvq->iov),
 				      out_num, in_num, NULL, NULL);
 	}
 
@@ -593,6 +596,7 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
 }
 
 static int get_tx_bufs(struct vhost_net *net,
+		       struct vring_used_elem *used_elem,
 		       struct vhost_net_virtqueue *nvq,
 		       struct msghdr *msg,
 		       unsigned int *out, unsigned int *in,
@@ -601,9 +605,10 @@ static int get_tx_bufs(struct vhost_net *net,
 	struct vhost_virtqueue *vq = &nvq->vq;
 	int ret;
 
-	ret = vhost_net_tx_get_vq_desc(net, nvq, out, in, msg, busyloop_intr);
+	ret = vhost_net_tx_get_vq_desc(net, used_elem, out, in,
+				       msg, busyloop_intr);
 
-	if (ret < 0 || ret == vq->num)
+	if (ret < 0 || ret == -ENOSPC)
 		return ret;
 
 	if (*in) {
@@ -747,8 +752,8 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
+	struct vring_used_elem used;
 	unsigned out, in;
-	int head;
 	struct msghdr msg = {
 		.msg_name = NULL,
 		.msg_namelen = 0,
@@ -767,13 +772,11 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		if (vhost_get_shadow_used_count(vq) == VHOST_NET_BATCH)
 			vhost_tx_batch(net, nvq, sock, &msg);
 
-		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
-				   &busyloop_intr);
-		/* On error, stop handling until the next kick. */
-		if (unlikely(head < 0))
-			break;
+		err = get_tx_bufs(net, &used,
+				  nvq, &msg, &out, &in, &len,
+				  &busyloop_intr);
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (err == -ENOSPC) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
 			} else if (unlikely(vhost_enable_notify(vq))) {
@@ -782,7 +785,9 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			}
 			break;
 		}
-
+		/* On error, stop handling until the next kick. */
+		if (unlikely(err < 0))
+			break;
 		total_len += len;
 
 		/* For simplicity, TX batching is only enabled if
@@ -823,7 +828,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 			pr_debug("Truncated TX packet: len %d != %zd\n",
 				 err, len);
 done:
-		vhost_add_shadow_used(vq, cpu_to_vhost32(vq, head), 0);
+		vhost_add_shadow_used(vq, &used, 0);
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
 
 	vhost_tx_batch(net, nvq, sock, &msg);
@@ -834,7 +839,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
 	unsigned out, in;
-	int head;
 	struct msghdr msg = {
 		.msg_name = NULL,
 		.msg_namelen = 0,
@@ -843,6 +847,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		.msg_flags = MSG_DONTWAIT,
 	};
 	struct tun_msg_ctl ctl;
+	struct vring_used_elem used;
 	size_t len, total_len = 0;
 	int err;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
@@ -856,13 +861,10 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		vhost_zerocopy_signal_used(net, vq);
 
 		busyloop_intr = false;
-		head = get_tx_bufs(net, nvq, &msg, &out, &in, &len,
-				   &busyloop_intr);
-		/* On error, stop handling until the next kick. */
-		if (unlikely(head < 0))
-			break;
+		err = get_tx_bufs(net, &used, nvq, &msg, &out, &in, &len,
+				  &busyloop_intr);
 		/* Nothing new?  Wait for eventfd to tell us they refilled. */
-		if (head == vq->num) {
+		if (err == -ENOSPC) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
 			} else if (unlikely(vhost_enable_notify(vq))) {
@@ -871,6 +873,9 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			}
 			break;
 		}
+		/* On error, stop handling until the next kick. */
+		if (unlikely(err < 0))
+			break;
 
 		zcopy_used = len >= VHOST_GOODCOPY_LEN
 			     && !vhost_exceeds_maxpend(net)
@@ -895,7 +900,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			ubufs = NULL;
 		}
 		vhost_set_zc_used(vq, nvq->upend_idx,
-				  cpu_to_vhost32(vq, head),
+				  &used,
 				  zcopy_used ? VHOST_DMA_IN_PROGRESS :
 				  VHOST_DMA_DONE_LEN);
 		nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
@@ -921,7 +926,6 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		if (err != len)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
-
 		vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
 	} while (likely(!vhost_exceeds_weight(vq, ++sent_pkts, total_len)));
@@ -1012,34 +1016,30 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 		       unsigned *iovcount,
 		       struct vhost_log *log,
 		       unsigned *log_num,
-		       unsigned int quota)
+		       unsigned int quota,
+		       s16 *count)
 {
 	unsigned int out, in;
 	int seg = 0;
 	int headcount = 0;
-	unsigned d;
-	int r, nlogs = 0;
+	int r = 0, nlogs = 0;
 	/* len is always initialized before use since we are always called with
 	 * datalen > 0.
 	 */
 	u32 uninitialized_var(len);
+	struct vring_used_elem uninitialized_var(used);
 
 	while (datalen > 0 && headcount < quota) {
 		if (unlikely(seg >= UIO_MAXIOV)) {
 			r = -ENOBUFS;
 			goto err;
 		}
-		r = vhost_get_vq_desc(vq, vq->iov + seg,
+		r = vhost_get_vq_desc(vq, &used, vq->iov + seg,
 				      ARRAY_SIZE(vq->iov) - seg, &out,
 				      &in, log, log_num);
 		if (unlikely(r < 0))
 			goto err;
 
-		d = r;
-		if (d == vq->num) {
-			r = 0;
-			goto err;
-		}
 		if (unlikely(out || in <= 0)) {
 			vq_err(vq, "unexpected descriptor format for RX: "
 				"out %d, in %d\n", out, in);
@@ -1052,7 +1052,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 		}
 		len = iov_length(vq->iov + seg, in);
 		datalen -= len;
-		vhost_add_shadow_used(vq, cpu_to_vhost32(vq, d),
+		vhost_add_shadow_used(vq, &used,
 				      cpu_to_vhost32(vq, datalen >= 0 ? len
 						     : len + datalen));
 		++headcount;
@@ -1064,12 +1064,15 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 
 	/* Detect overrun */
 	if (unlikely(datalen > 0)) {
-		r = UIO_MAXIOV + 1;
+		headcount = UIO_MAXIOV + 1;
 		goto err;
 	}
-	return headcount;
+
+	*count = headcount;
+	return 0;
 err:
 	vhost_discard_shadow_used(vq, headcount);
+	*count = 0;
 	return r;
 }
 
@@ -1128,13 +1131,11 @@ static void handle_rx(struct vhost_net *net)
 			break;
 		sock_len += sock_hlen;
 		vhost_len = sock_len + vhost_hlen;
-		headcount = get_rx_bufs(vq, vhost_len, &in, vq_log, &log,
-					likely(mergeable) ? UIO_MAXIOV : 1);
-		/* On error, stop handling until the next kick. */
-		if (unlikely(headcount < 0))
-			goto out;
+		err = get_rx_bufs(vq, vhost_len, &in, vq_log, &log,
+				  likely(mergeable) ? UIO_MAXIOV : 1,
+				  &headcount);
 		/* OK, now we need to know about added descriptors. */
-		if (!headcount) {
+		if (err == -ENOSPC) {
 			if (unlikely(busyloop_intr)) {
 				vhost_poll_queue(&vq->poll);
 			} else if (unlikely(vhost_enable_notify(vq))) {
@@ -1148,6 +1149,9 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 		busyloop_intr = false;
+		/* On error, stop handling until the next kick. */
+		if (unlikely(err < 0))
+			goto out;
 		if (nvq->rx_ring)
 			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
 		/* On overrun, truncate and discard */
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 8d4e87007a8d..4a5a75ab25ad 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -72,7 +72,7 @@ struct vhost_scsi_inflight {
 
 struct vhost_scsi_cmd {
 	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-	int tvc_vq_desc;
+	struct vring_used_elem tvc_vq_used;
 	/* virtio-scsi initiator task attribute */
 	int tvc_task_attr;
 	/* virtio-scsi response incoming iovecs */
@@ -213,7 +213,7 @@ struct vhost_scsi {
  * Context for processing request and control queue operations.
  */
 struct vhost_scsi_ctx {
-	int head;
+	struct vring_used_elem head;
 	unsigned int out, in;
 	size_t req_size, rsp_size;
 	size_t out_size, in_size;
@@ -449,8 +449,9 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	struct virtio_scsi_event *event = &evt->event;
 	struct virtio_scsi_event __user *eventp;
+	struct vring_used_elem used;
 	unsigned out, in;
-	int head, ret;
+	int ret;
 
 	if (!vq->private_data) {
 		vs->vs_events_missed = true;
@@ -459,16 +460,16 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 
 again:
 	vhost_disable_notify(vq);
-	head = vhost_get_vq_desc(vq, vq->iov,
+	ret = vhost_get_vq_desc(vq, &used, vq->iov,
 			ARRAY_SIZE(vq->iov), &out, &in,
 			NULL, NULL);
-	if (head < 0) {
+	if (ret == -ENOSPC) {
+		if (vhost_enable_notify(&vs->dev, vq))
+			goto again;
 		vs->vs_events_missed = true;
 		return;
 	}
-	if (head == vq->num) {
-		if (vhost_enable_notify(vq))
-			goto again;
+	if (ret < 0) {
 		vs->vs_events_missed = true;
 		return;
 	}
@@ -488,7 +489,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	eventp = vq->iov[out].iov_base;
 	ret = __copy_to_user(eventp, event, sizeof(*event));
 	if (!ret)
-		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+		vhost_add_used_and_signal(&vs->dev, vq, &used, 0);
 	else
 		vq_err(vq, "Faulted on vhost_scsi_send_event\n");
 }
@@ -549,7 +550,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work)
 		ret = copy_to_iter(&v_rsp, sizeof(v_rsp), &iov_iter);
 		if (likely(ret == sizeof(v_rsp))) {
 			struct vhost_scsi_virtqueue *q;
-			vhost_add_used(cmd->tvc_vq, cmd->tvc_vq_desc, 0);
+			vhost_add_used(cmd->tvc_vq, &cmd->tvc_vq_used, 0);
 			q = container_of(cmd->tvc_vq, struct vhost_scsi_virtqueue, vq);
 			vq = q - vs->vqs;
 			__set_bit(vq, signal);
@@ -793,7 +794,7 @@ static void vhost_scsi_submission_work(struct work_struct *work)
 static void
 vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 			   struct vhost_virtqueue *vq,
-			   int head, unsigned out)
+			   struct vhost_scsi_ctx *vc)
 {
 	struct virtio_scsi_cmd_resp __user *resp;
 	struct virtio_scsi_cmd_resp rsp;
@@ -801,10 +802,10 @@ vhost_scsi_send_bad_target(struct vhost_scsi *vs,
 
 	memset(&rsp, 0, sizeof(rsp));
 	rsp.response = VIRTIO_SCSI_S_BAD_TARGET;
-	resp = vq->iov[out].iov_base;
+	resp = vq->iov[vc->out].iov_base;
 	ret = __copy_to_user(resp, &rsp, sizeof(rsp));
 	if (!ret)
-		vhost_add_used_and_signal(&vs->dev, vq, head, 0);
+		vhost_add_used_and_signal(&vs->dev, vq, &vc->head, 0);
 	else
 		pr_err("Faulted on virtio_scsi_cmd_resp\n");
 }
@@ -813,21 +814,17 @@ static int
 vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 		    struct vhost_scsi_ctx *vc)
 {
-	int ret = -ENXIO;
+	int ret;
 
-	vc->head = vhost_get_vq_desc(vq, vq->iov,
-				     ARRAY_SIZE(vq->iov), &vc->out, &vc->in,
-				     NULL, NULL);
+	ret = vhost_get_vq_desc(vq, &vc->head, vq->iov,
+				ARRAY_SIZE(vq->iov), &vc->out, &vc->in,
+				NULL, NULL);
 
 	pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n",
-		 vc->head, vc->out, vc->in);
-
-	/* On error, stop handling until the next kick. */
-	if (unlikely(vc->head < 0))
-		goto done;
+		 vc->head.id, vc->out, vc->in);
 
 	/* Nothing new?  Wait for eventfd to tell us they refilled. */
-	if (vc->head == vq->num) {
+	if (ret == -ENOSPC) {
 		if (unlikely(vhost_enable_notify(vq))) {
 			vhost_disable_notify(vq);
 			ret = -EAGAIN;
@@ -835,6 +832,10 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 		goto done;
 	}
 
+	/* On error, stop handling until the next kick. */
+	if (unlikely(ret < 0))
+		goto done;
+
 	/*
 	 * Get the size of request and response buffers.
 	 * FIXME: Not correct for BIDI operation
@@ -1025,6 +1026,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 					vq_err(vq, "Received non zero pi_bytesin,"
 						" but wrong data_direction\n");
 					goto err;
+
 				}
 				prot_bytes = vhost32_to_cpu(vq, v_req_pi.pi_bytesin);
 			}
@@ -1097,7 +1099,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		 * complete the virtio-scsi request in TCM callback context via
 		 * vhost_scsi_queue_data_in() and vhost_scsi_queue_status()
 		 */
-		cmd->tvc_vq_desc = vc.head;
+		cmd->tvc_vq_used = vc.head;
 		/*
 		 * Dispatch cmd descriptor for cmwq execution in process
 		 * context provided by vhost_scsi_workqueue.  This also ensures
@@ -1117,8 +1119,10 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		if (ret == -ENXIO)
 			break;
 		else if (ret == -EIO)
-			vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+			vhost_scsi_send_bad_target(vs, vq, &vc);
+
 	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
+
 out:
 	mutex_unlock(&vq->mutex);
 }
@@ -1140,7 +1144,7 @@ vhost_scsi_send_tmf_reject(struct vhost_scsi *vs,
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret == sizeof(rsp)))
-		vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
+		vhost_add_used_and_signal(&vs->dev, vq, &vc->head, 0);
 	else
 		pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n");
 }
@@ -1162,7 +1166,7 @@ vhost_scsi_send_an_resp(struct vhost_scsi *vs,
 
 	ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter);
 	if (likely(ret == sizeof(rsp)))
-		vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0);
+		vhost_add_used_and_signal(&vs->dev, vq, &vc->head, 0);
 	else
 		pr_err("Faulted on virtio_scsi_ctrl_an_resp\n");
 }
@@ -1269,8 +1273,10 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq)
 		if (ret == -ENXIO)
 			break;
 		else if (ret == -EIO)
-			vhost_scsi_send_bad_target(vs, vq, vc.head, vc.out);
+			vhost_scsi_send_bad_target(vs, vq, &vc);
+
 	} while (likely(!vhost_exceeds_weight(vq, ++c, 0)));
+
 out:
 	mutex_unlock(&vq->mutex);
 }
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 50ba382f0981..dbe4db0179a5 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2651,6 +2651,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
  * never a valid descriptor number) if none was found.  A negative code is
  * returned on error. */
 int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct vring_used_elem *used,
 		      struct iovec iov[], unsigned int iov_size,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num)
@@ -2683,7 +2684,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 		 * invalid.
 		 */
 		if (vq->avail_idx == last_avail_idx)
-			return vq->num;
+			return -ENOSPC;
 
 		/* Only get avail ring entries after they have been
 		 * exposed by guest.
@@ -2700,6 +2701,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 		return -EFAULT;
 	}
 
+	used->id = ring_head;
 	head = vhost16_to_cpu(vq, ring_head);
 
 	/* If their number is silly, that's an error. */
@@ -2787,10 +2789,17 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	/* Assume notifications from guest are disabled at this point,
 	 * if they aren't we would need to update avail_event index. */
 	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
-	return head;
+	return 0;
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+static void vhost_set_used_len(struct vhost_virtqueue *vq,
+			       struct vring_used_elem *used, int len)
+{
+	used->len = cpu_to_vhost32(vq, len);
+}
+EXPORT_SYMBOL_GPL(vhost_set_used_len);
+
 static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count)
 {
 	BUG_ON(count > vq->nheads);
@@ -2860,17 +2869,17 @@ int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
 EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);
 
 void vhost_set_zc_used(struct vhost_virtqueue *vq,
-			   int idx, unsigned int head, int len)
+		       int idx, struct vring_used_elem *elem, int len)
 {
-	vq->heads[idx].id = head;
-	vq->heads[idx].len = len;
+	vq->heads[idx] = *elem;
+	vhost_set_zc_used_len(vq, idx, len);
 }
 EXPORT_SYMBOL_GPL(vhost_set_zc_used);
 
 void vhost_add_shadow_used(struct vhost_virtqueue *vq,
-			   unsigned int head, int len)
+			   struct vring_used_elem *elem, int len)
 {
-	vhost_set_zc_used(vq, vq->nheads, head, len);
+	vhost_set_zc_used(vq, vq->nheads, elem, len);
 	++vq->nheads;
 }
 EXPORT_SYMBOL_GPL(vhost_add_shadow_used);
@@ -2921,14 +2930,11 @@ EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, unsigned int head, int len)
+int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used,
+		   int len)
 {
-	struct vring_used_elem heads = {
-		cpu_to_vhost32(vq, head),
-		cpu_to_vhost32(vq, len)
-	};
-
-	return vhost_add_used_n(vq, &heads, 1);
+	vhost_set_used_len(vq, used, len);
+	return vhost_add_used_n(vq, used, 1);
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
@@ -2981,9 +2987,9 @@ EXPORT_SYMBOL_GPL(vhost_signal);
 /* And here's the combo meal deal.  Supersize me! */
 void vhost_add_used_and_signal(struct vhost_dev *dev,
 			       struct vhost_virtqueue *vq,
-			       unsigned int head, int len)
+			       struct vring_used_elem *used, int len)
 {
-	vhost_add_used(vq, head, len);
+	vhost_add_used(vq, used, len);
 	vhost_signal(dev, vq);
 }
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 481baba20c3d..f835eefa240c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -233,16 +233,17 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
 int vhost_get_vq_desc(struct vhost_virtqueue *,
+		      struct vring_used_elem *used_elem,
 		      struct iovec iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
 int vhost_vq_init_access(struct vhost_virtqueue *);
-int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len);
+int vhost_add_used(struct vhost_virtqueue *, struct vring_used_elem *, int);
 
 void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
-			       unsigned int id, int len);
+			       struct vring_used_elem *, int);
 
 /* Zerocopy shadow used ring API */
 void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
@@ -250,11 +251,11 @@ void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
 int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx);
 void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n);
 void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
-		       unsigned int head, int len);
+		       struct vring_used_elem *elem, int len);
 
 /* Non zerocopy shadow used ring API */
-void vhost_add_shadow_used(struct vhost_virtqueue *vq, unsigned int head,
-			   int len);
+void vhost_add_shadow_used(struct vhost_virtqueue *vq,
+			   struct vring_used_elem *elem, int len);
 void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq);
 void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n);
 int vhost_get_shadow_used_count(struct vhost_virtqueue *vq);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index f94021b450f0..1c962bfdc3a1 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -100,11 +100,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 
 	do {
 		struct virtio_vsock_pkt *pkt;
+		struct vring_used_elem used;
 		struct iov_iter iov_iter;
 		unsigned out, in;
 		size_t nbytes;
 		size_t len;
-		int head;
+		int ret;
 
 		spin_lock_bh(&vsock->send_pkt_list_lock);
 		if (list_empty(&vsock->send_pkt_list)) {
@@ -118,16 +119,9 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 		list_del_init(&pkt->list);
 		spin_unlock_bh(&vsock->send_pkt_list_lock);
 
-		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-					 &out, &in, NULL, NULL);
-		if (head < 0) {
-			spin_lock_bh(&vsock->send_pkt_list_lock);
-			list_add(&pkt->list, &vsock->send_pkt_list);
-			spin_unlock_bh(&vsock->send_pkt_list_lock);
-			break;
-		}
-
-		if (head == vq->num) {
+		ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov),
+					&out, &in, NULL, NULL);
+		if (ret == -ENOSPC) {
 			spin_lock_bh(&vsock->send_pkt_list_lock);
 			list_add(&pkt->list, &vsock->send_pkt_list);
 			spin_unlock_bh(&vsock->send_pkt_list_lock);
@@ -141,6 +135,12 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			}
 			break;
 		}
+		if (ret < 0) {
+			spin_lock_bh(&vsock->send_pkt_list_lock);
+			list_add(&pkt->list, &vsock->send_pkt_list);
+			spin_unlock_bh(&vsock->send_pkt_list_lock);
+			break;
+		}
 
 		if (out) {
 			virtio_transport_free_pkt(pkt);
@@ -148,7 +148,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		len = iov_length(&vq->iov[out], in);
+		len = vhost32_to_cpu(vq, used.len);
 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
@@ -165,7 +165,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		vhost_add_used(vq, head, sizeof(pkt->hdr) + pkt->len);
+		vhost_add_used(vq, &used, sizeof(pkt->hdr) + pkt->len);
 		added = true;
 
 		if (pkt->reply) {
@@ -360,7 +360,8 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 	struct vhost_vsock *vsock = container_of(vq->dev, struct vhost_vsock,
 						 dev);
 	struct virtio_vsock_pkt *pkt;
-	int head, pkts = 0, total_len = 0;
+	struct vring_used_elem used;
+	int ret, pkts = 0, total_len = 0;
 	unsigned int out, in;
 	bool added = false;
 
@@ -381,18 +382,17 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 			goto no_more_replies;
 		}
 
-		head = vhost_get_vq_desc(vq, vq->iov, ARRAY_SIZE(vq->iov),
-					 &out, &in, NULL, NULL);
-		if (head < 0)
-			break;
-
-		if (head == vq->num) {
+		ret = vhost_get_vq_desc(vq, &used, vq->iov, ARRAY_SIZE(vq->iov),
+					&out, &in, NULL, NULL);
+		if (ret == -ENOSPC) {
 			if (unlikely(vhost_enable_notify(vq))) {
 				vhost_disable_notify(vq);
 				continue;
 			}
 			break;
 		}
+		if (ret < 0)
+			break;
 
 		pkt = vhost_vsock_alloc_pkt(vq, out, in);
 		if (!pkt) {
@@ -411,8 +411,7 @@ static void vhost_vsock_handle_tx_kick(struct vhost_work *work)
 		else
 			virtio_transport_free_pkt(pkt);
 
-		len += sizeof(pkt->hdr);
-		vhost_add_used(vq, head, len);
+		vhost_add_used(vq, &used, sizeof(pkt->hdr) + len);
 		total_len += len;
 		added = true;
 	} while(likely(!vhost_exceeds_weight(vq, ++pkts, total_len)));
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 11/15] vhost: do not use vring_used_elem
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (9 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 10/15] vhost: hide used ring layout from device Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 12/15] vhost: vhost_put_user() can accept metadata type Jason Wang
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

Instead of depending on the exported vring_used_elem, this patch
switches to use a new internal structure vhost_used_elem which embed
vring_used_elem in itself. This could be used to let vhost to record
extra metadata for the incoming packed ring layout.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   | 10 +++++-----
 drivers/vhost/scsi.c  |  8 ++++----
 drivers/vhost/vhost.c | 38 +++++++++++++++++++++++---------------
 drivers/vhost/vhost.h | 21 +++++++++++++++------
 drivers/vhost/vsock.c |  4 ++--
 5 files changed, 49 insertions(+), 32 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 572d80c8c36e..7c2f320930c7 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -546,7 +546,7 @@ static void vhost_net_busy_poll(struct vhost_net *net,
 }
 
 static int vhost_net_tx_get_vq_desc(struct vhost_net *net,
-				    struct vring_used_elem *used_elem,
+				    struct vhost_used_elem *used_elem,
 				    unsigned int *out_num, unsigned int *in_num,
 				    struct msghdr *msghdr, bool *busyloop_intr)
 {
@@ -596,7 +596,7 @@ static size_t init_iov_iter(struct vhost_virtqueue *vq, struct iov_iter *iter,
 }
 
 static int get_tx_bufs(struct vhost_net *net,
-		       struct vring_used_elem *used_elem,
+		       struct vhost_used_elem *used_elem,
 		       struct vhost_net_virtqueue *nvq,
 		       struct msghdr *msg,
 		       unsigned int *out, unsigned int *in,
@@ -752,7 +752,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
-	struct vring_used_elem used;
+	struct vhost_used_elem used;
 	unsigned out, in;
 	struct msghdr msg = {
 		.msg_name = NULL,
@@ -847,7 +847,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 		.msg_flags = MSG_DONTWAIT,
 	};
 	struct tun_msg_ctl ctl;
-	struct vring_used_elem used;
+	struct vhost_used_elem used;
 	size_t len, total_len = 0;
 	int err;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
@@ -1027,7 +1027,7 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 	 * datalen > 0.
 	 */
 	u32 uninitialized_var(len);
-	struct vring_used_elem uninitialized_var(used);
+	struct vhost_used_elem uninitialized_var(used);
 
 	while (datalen > 0 && headcount < quota) {
 		if (unlikely(seg >= UIO_MAXIOV)) {
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
index 4a5a75ab25ad..42c32612dc32 100644
--- a/drivers/vhost/scsi.c
+++ b/drivers/vhost/scsi.c
@@ -72,7 +72,7 @@ struct vhost_scsi_inflight {
 
 struct vhost_scsi_cmd {
 	/* Descriptor from vhost_get_vq_desc() for virt_queue segment */
-	struct vring_used_elem tvc_vq_used;
+	struct vhost_used_elem tvc_vq_used;
 	/* virtio-scsi initiator task attribute */
 	int tvc_task_attr;
 	/* virtio-scsi response incoming iovecs */
@@ -213,7 +213,7 @@ struct vhost_scsi {
  * Context for processing request and control queue operations.
  */
 struct vhost_scsi_ctx {
-	struct vring_used_elem head;
+	struct vhost_used_elem head;
 	unsigned int out, in;
 	size_t req_size, rsp_size;
 	size_t out_size, in_size;
@@ -449,7 +449,7 @@ vhost_scsi_do_evt_work(struct vhost_scsi *vs, struct vhost_scsi_evt *evt)
 	struct vhost_virtqueue *vq = &vs->vqs[VHOST_SCSI_VQ_EVT].vq;
 	struct virtio_scsi_event *event = &evt->event;
 	struct virtio_scsi_event __user *eventp;
-	struct vring_used_elem used;
+	struct vhost_used_elem used;
 	unsigned out, in;
 	int ret;
 
@@ -821,7 +821,7 @@ vhost_scsi_get_desc(struct vhost_scsi *vs, struct vhost_virtqueue *vq,
 				NULL, NULL);
 
 	pr_debug("vhost_get_vq_desc: head: %d, out: %u in: %u\n",
-		 vc->head.id, vc->out, vc->in);
+		 vc->head.elem.id, vc->out, vc->in);
 
 	/* Nothing new?  Wait for eventfd to tell us they refilled. */
 	if (ret == -ENOSPC) {
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index dbe4db0179a5..6044cdea124f 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2651,7 +2651,7 @@ static int get_indirect(struct vhost_virtqueue *vq,
  * never a valid descriptor number) if none was found.  A negative code is
  * returned on error. */
 int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct vring_used_elem *used,
+		      struct vhost_used_elem *used,
 		      struct iovec iov[], unsigned int iov_size,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num)
@@ -2701,7 +2701,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 		return -EFAULT;
 	}
 
-	used->id = ring_head;
+	used->elem.id = ring_head;
 	head = vhost16_to_cpu(vq, ring_head);
 
 	/* If their number is silly, that's an error. */
@@ -2793,13 +2793,20 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
-static void vhost_set_used_len(struct vhost_virtqueue *vq,
-			       struct vring_used_elem *used, int len)
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+			struct vhost_used_elem *used, int len)
 {
-	used->len = cpu_to_vhost32(vq, len);
+	used->elem.len = cpu_to_vhost32(vq, len);
 }
 EXPORT_SYMBOL_GPL(vhost_set_used_len);
 
+__virtio32 vhost_get_used_len(struct vhost_virtqueue *vq,
+			      struct vhost_used_elem *used)
+{
+	return vhost32_to_cpu(vq, used->elem.len);
+}
+EXPORT_SYMBOL_GPL(vhost_get_used_len);
+
 static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count)
 {
 	BUG_ON(count > vq->nheads);
@@ -2824,9 +2831,10 @@ void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
 EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
 static int __vhost_add_used_n(struct vhost_virtqueue *vq,
-			    struct vring_used_elem *heads,
+			    struct vhost_used_elem *shadow,
 			    unsigned count)
 {
+	struct vring_used_elem *heads = (struct vring_used_elem *)shadow;
 	struct vring_used_elem __user *used;
 	u16 old, new;
 	int start;
@@ -2858,18 +2866,18 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
 			       int idx, int len)
 {
-	vq->heads[idx].len = len;
+	vq->heads[idx].elem.len = len;
 }
 EXPORT_SYMBOL_GPL(vhost_set_zc_used_len);
 
 int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
 {
-	return vq->heads[idx].len;
+	return vq->heads[idx].elem.len;
 }
 EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);
 
-void vhost_set_zc_used(struct vhost_virtqueue *vq,
-		       int idx, struct vring_used_elem *elem, int len)
+void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
+		       struct vhost_used_elem *elem, int len)
 {
 	vq->heads[idx] = *elem;
 	vhost_set_zc_used_len(vq, idx, len);
@@ -2877,7 +2885,7 @@ void vhost_set_zc_used(struct vhost_virtqueue *vq,
 EXPORT_SYMBOL_GPL(vhost_set_zc_used);
 
 void vhost_add_shadow_used(struct vhost_virtqueue *vq,
-			   struct vring_used_elem *elem, int len)
+			   struct vhost_used_elem *elem, int len)
 {
 	vhost_set_zc_used(vq, vq->nheads, elem, len);
 	++vq->nheads;
@@ -2893,7 +2901,7 @@ EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count);
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
 static int vhost_add_used_n(struct vhost_virtqueue *vq,
-			    struct vring_used_elem *heads,
+			    struct vhost_used_elem *heads,
 			    unsigned count)
 {
 	int start, n, r;
@@ -2930,7 +2938,7 @@ EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-int vhost_add_used(struct vhost_virtqueue *vq, struct vring_used_elem *used,
+int vhost_add_used(struct vhost_virtqueue *vq, struct vhost_used_elem *used,
 		   int len)
 {
 	vhost_set_used_len(vq, used, len);
@@ -2987,7 +2995,7 @@ EXPORT_SYMBOL_GPL(vhost_signal);
 /* And here's the combo meal deal.  Supersize me! */
 void vhost_add_used_and_signal(struct vhost_dev *dev,
 			       struct vhost_virtqueue *vq,
-			       struct vring_used_elem *used, int len)
+			       struct vhost_used_elem *used, int len)
 {
 	vhost_add_used(vq, used, len);
 	vhost_signal(dev, vq);
@@ -2997,7 +3005,7 @@ EXPORT_SYMBOL_GPL(vhost_add_used_and_signal);
 /* multi-buffer version of vhost_add_used_and_signal */
 static void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 					struct vhost_virtqueue *vq,
-					struct vring_used_elem *heads,
+					struct vhost_used_elem *heads,
 					unsigned count)
 {
 	vhost_add_used_n(vq, heads, count);
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f835eefa240c..b8a5d1a2bed9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -37,6 +37,10 @@ struct vhost_poll {
 	struct vhost_dev	 *dev;
 };
 
+struct vhost_used_elem {
+	struct vring_used_elem elem;
+};
+
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
 void vhost_work_queue(struct vhost_dev *dev, struct vhost_work *work);
 bool vhost_has_work(struct vhost_dev *dev);
@@ -162,7 +166,7 @@ struct vhost_virtqueue {
 	struct iovec iov[UIO_MAXIOV];
 	struct iovec iotlb_iov[64];
 	struct iovec *indirect;
-	struct vring_used_elem *heads;
+	struct vhost_used_elem *heads;
 	int nheads;
 	/* Protected by virtqueue mutex. */
 	struct vhost_umem *umem;
@@ -233,17 +237,22 @@ bool vhost_vq_access_ok(struct vhost_virtqueue *vq);
 bool vhost_log_access_ok(struct vhost_dev *);
 
 int vhost_get_vq_desc(struct vhost_virtqueue *,
-		      struct vring_used_elem *used_elem,
+		      struct vhost_used_elem *used_elem,
 		      struct iovec iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num);
 void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
 
 int vhost_vq_init_access(struct vhost_virtqueue *);
-int vhost_add_used(struct vhost_virtqueue *, struct vring_used_elem *, int);
+int vhost_add_used(struct vhost_virtqueue *, struct vhost_used_elem *, int);
 
 void vhost_add_used_and_signal(struct vhost_dev *, struct vhost_virtqueue *,
-			       struct vring_used_elem *, int);
+			       struct vhost_used_elem *, int);
+
+__virtio32 vhost_get_used_len(struct vhost_virtqueue *vq,
+			      struct vhost_used_elem *used);
+void vhost_set_used_len(struct vhost_virtqueue *vq,
+			struct vhost_used_elem *used, int len);
 
 /* Zerocopy shadow used ring API */
 void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
@@ -251,11 +260,11 @@ void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
 int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx);
 void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n);
 void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
-		       struct vring_used_elem *elem, int len);
+		       struct vhost_used_elem *elem, int len);
 
 /* Non zerocopy shadow used ring API */
 void vhost_add_shadow_used(struct vhost_virtqueue *vq,
-			   struct vring_used_elem *elem, int len);
+			   struct vhost_used_elem *elem, int len);
 void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq);
 void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n);
 int vhost_get_shadow_used_count(struct vhost_virtqueue *vq);
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 1c962bfdc3a1..a33e194499cf 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -100,7 +100,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 
 	do {
 		struct virtio_vsock_pkt *pkt;
-		struct vring_used_elem used;
+		struct vhost_used_elem used;
 		struct iov_iter iov_iter;
 		unsigned out, in;
 		size_t nbytes;
@@ -148,7 +148,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			break;
 		}
 
-		len = vhost32_to_cpu(vq, used.len);
+		len = vhost_get_used_len(&used);
 		iov_iter_init(&iov_iter, READ, &vq->iov[out], in, len);
 
 		nbytes = copy_to_iter(&pkt->hdr, sizeof(pkt->hdr), &iov_iter);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 12/15] vhost: vhost_put_user() can accept metadata type
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (10 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 11/15] vhost: do not use vring_used_elem Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 13/15] vhost: packed ring support Jason Wang
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

We assumes used ring update is the only user for vhost_put_user() in
the past. This may not be the case for the incoming packed ring which
may update the descriptor ring for used. So introduce a new type
parameter.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 6044cdea124f..3fa1adf2cb90 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1178,7 +1178,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	return __vhost_get_user_slow(vq, addr, size, type);
 }
 
-#define vhost_put_user(vq, x, ptr)		\
+#define vhost_put_user(vq, x, ptr, type)		\
 ({ \
 	int ret = -EFAULT; \
 	if (!vq->iotlb) { \
@@ -1186,7 +1186,7 @@ static inline void __user *__vhost_get_user(struct vhost_virtqueue *vq,
 	} else { \
 		__typeof__(ptr) to = \
 			(__typeof__(ptr)) __vhost_get_user(vq, ptr,	\
-					  sizeof(*ptr), VHOST_ADDR_USED); \
+					  sizeof(*ptr), type); \
 		if (to != NULL) \
 			ret = __put_user(x, to); \
 		else \
@@ -1230,7 +1230,7 @@ static inline int vhost_put_avail_event(struct vhost_virtqueue *vq)
 #endif
 
 	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
-			      vhost_avail_event(vq));
+			      vhost_avail_event(vq), VHOST_ADDR_USED);
 }
 
 static inline int vhost_put_used(struct vhost_virtqueue *vq,
@@ -1267,7 +1267,7 @@ static inline int vhost_put_used_flags(struct vhost_virtqueue *vq)
 #endif
 
 	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->used_flags),
-			      &vq->used->flags);
+			      &vq->used->flags, VHOST_ADDR_USED);
 }
 
 static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
@@ -1284,7 +1284,7 @@ static inline int vhost_put_used_idx(struct vhost_virtqueue *vq)
 #endif
 
 	return vhost_put_user(vq, cpu_to_vhost16(vq, vq->last_used_idx),
-			      &vq->used->idx);
+			      &vq->used->idx, VHOST_ADDR_USED);
 }
 
 #define vhost_get_user(vq, x, ptr, type)		\
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 13/15] vhost: packed ring support
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (11 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 12/15] vhost: vhost_put_user() can accept metadata type Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 14/15] vhost: event suppression for packed ring Jason Wang
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

This patch introduces basic support for packed ring. The idea behinds
packed ring is to use a single descriptor ring instead of three
different rings (avail, used and descriptor). This could help to
reduce the cache contention and PCI transactions. So it was designed
to help for the performance for both software implementation and
hardware implementation.

The implementation was straightforward, packed version of vhost core
(whose name has a packed suffix) helpers were introduced and previous
helpers were renamed with a split suffix. Then the exported helpers
can just do a switch to go to the correct internal helpers.

The event suppression (device area and driver area) were not
implemented. It will be done on top with another patch.

For more information of packed ring, please refer Virtio spec.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   |   6 +-
 drivers/vhost/vhost.c | 980 ++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.h |  24 +-
 3 files changed, 925 insertions(+), 85 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 7c2f320930c7..ef79446b42f1 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -799,7 +799,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 				goto done;
 			} else if (unlikely(err != -ENOSPC)) {
 				vhost_tx_batch(net, nvq, sock, &msg);
-				vhost_discard_vq_desc(vq, 1);
+				vhost_discard_vq_desc(vq, &used);
 				vhost_net_enable_vq(net, vq);
 				break;
 			}
@@ -820,7 +820,7 @@ static void handle_tx_copy(struct vhost_net *net, struct socket *sock)
 		/* TODO: Check specific error and bomb out unless ENOBUFS? */
 		err = sock->ops->sendmsg(sock, &msg, len);
 		if (unlikely(err < 0)) {
-			vhost_discard_vq_desc(vq, 1);
+			vhost_discard_vq_desc(vq, &used);
 			vhost_net_enable_vq(net, vq);
 			break;
 		}
@@ -919,7 +919,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 				vhost_net_ubuf_put(ubufs);
 			nvq->upend_idx = ((unsigned int)nvq->upend_idx - 1)
 					 % UIO_MAXIOV;
-			vhost_discard_vq_desc(vq, 1);
+			vhost_discard_vq_desc(vq, &used);
 			vhost_net_enable_vq(net, vq);
 			break;
 		}
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 3fa1adf2cb90..a7d24b9d5204 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -479,6 +479,9 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
+	vq->last_used_wrap_counter = true;
+	vq->last_avail_wrap_counter = true;
+	vq->avail_wrap_counter = true;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
 	vq->invalidate_count = 0;
@@ -551,7 +554,8 @@ static long vhost_dev_alloc_iovecs(struct vhost_dev *dev)
 					     GFP_KERNEL);
 		vq->log = kmalloc_array(dev->iov_limit, sizeof(*vq->log),
 					GFP_KERNEL);
-		vq->heads = kmalloc_array(dev->iov_limit, sizeof(*vq->heads),
+		vq->heads = kmalloc_array(dev->iov_limit,
+					  sizeof(struct vhost_used_elem),
 					  GFP_KERNEL);
 		if (!vq->indirect || !vq->log || !vq->heads)
 			goto err_nomem;
@@ -1406,8 +1410,8 @@ static inline int vhost_get_used_idx(struct vhost_virtqueue *vq,
 	return vhost_get_used(vq, *idx, &vq->used->idx);
 }
 
-static inline int vhost_get_desc(struct vhost_virtqueue *vq,
-				 struct vring_desc *desc, int idx)
+static inline int vhost_get_desc_split(struct vhost_virtqueue *vq,
+				       struct vring_desc *desc, int idx)
 {
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
 	struct vring_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
@@ -1422,6 +1426,104 @@ static inline int vhost_get_desc(struct vhost_virtqueue *vq,
 	return vhost_copy_from_user(vq, desc, vq->desc + idx, sizeof(*desc));
 }
 
+static inline int vhost_get_desc_packed(struct vhost_virtqueue *vq,
+					struct vring_packed_desc *desc, int idx)
+{
+	int ret;
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
+
+	if (likely(d)) {
+		d += idx;
+
+		desc->flags = d->flags;
+
+		/* Make sure flags is seen before the rest fields of
+		 * descriptor.
+		 */
+		smp_rmb();
+
+		desc->addr = d->addr;
+		desc->len = d->len;
+		desc->id = d->id;
+
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+
+	ret = vhost_copy_from_user(vq, &desc->flags,
+				   &(vq->desc_packed + idx)->flags,
+				   sizeof(desc->flags));
+	if (unlikely(ret))
+		return ret;
+
+	/* Make sure flags is seen before the rest fields of
+	 * descriptor.
+	 */
+	smp_rmb();
+
+	return vhost_copy_from_user(vq, desc, vq->desc_packed + idx,
+				    offsetof(struct vring_packed_desc, flags));
+}
+
+static inline int vhost_put_desc_packed(struct vhost_virtqueue *vq,
+					struct vring_packed_desc *desc, int idx)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
+
+	if (likely(d)) {
+		d += idx;
+		d->id = desc->id;
+		d->flags = desc->flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+
+	return vhost_copy_to_user(vq, vq->desc_packed + idx, desc,
+				  sizeof(*desc));
+}
+
+static inline int vhost_get_desc_flags(struct vhost_virtqueue *vq,
+				       __virtio16 *flags, int idx)
+{
+	struct vring_packed_desc __user *desc;
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
+
+	if (likely(d)) {
+		d += idx;
+		*flags = d->flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+
+	desc = vq->desc_packed + idx;
+	return vhost_get_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC);
+}
+
+static inline int vhost_put_desc_flags(struct vhost_virtqueue *vq,
+				       __virtio16 *flags, int idx)
+{
+	struct vring_packed_desc __user *desc;
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
+
+	if (likely(d)) {
+		d += idx;
+		d->flags = *flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+
+	desc = vq->desc_packed + idx;
+	return vhost_put_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC);
+}
+
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -1701,17 +1803,44 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
 	return 0;
 }
 
-static bool vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-			 struct vring_desc __user *desc,
-			 struct vring_avail __user *avail,
-			 struct vring_used __user *used)
+static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
+			       struct vring_desc __user *desc,
+			       struct vring_avail __user *avail,
+			       struct vring_used __user *used)
+{
+	struct vring_packed_desc *packed = (struct vring_packed_desc *)desc;
+
+	/* TODO: check device area and driver area */
+	return access_ok(packed, num * sizeof(*packed)) &&
+	       access_ok(packed, num * sizeof(*packed));
+}
 
+static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
+			      struct vring_desc __user *desc,
+			      struct vring_avail __user *avail,
+			      struct vring_used __user *used)
 {
 	return access_ok(desc, vhost_get_desc_size(vq, num)) &&
 	       access_ok(avail, vhost_get_avail_size(vq, num)) &&
 	       access_ok(used, vhost_get_used_size(vq, num));
 }
 
+#define VHOST_ALTERNATIVE(func, vq, ...)		   \
+do {							   \
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))   \
+		return func##_##packed(vq, ##__VA_ARGS__); \
+	else						   \
+		return func##_##split(vq, ##__VA_ARGS__);  \
+} while (0)
+
+static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
+			struct vring_desc __user *desc,
+			struct vring_avail __user *avail,
+			struct vring_used __user *used)
+{
+	VHOST_ALTERNATIVE(vq_access_ok, vq, num, desc, avail, used);
+}
+
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
 				 const struct vhost_umem_node *node,
 				 int type)
@@ -1786,13 +1915,18 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq)
 		return 1;
 	}
 
-	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
-			       vhost_get_desc_size(vq, num), VHOST_ADDR_DESC) &&
-	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
+	return iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->desc,
+			       vhost_get_desc_size(vq, num),
+			       VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->avail,
 			       vhost_get_avail_size(vq, num),
 			       VHOST_ADDR_AVAIL) &&
-	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->used,
-			       vhost_get_used_size(vq, num), VHOST_ADDR_USED);
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
+			       (u64)(uintptr_t)vq->used,
+			       vhost_get_used_size(vq, num),
+			       VHOST_ADDR_USED);
 }
 EXPORT_SYMBOL_GPL(vq_meta_prefetch);
 
@@ -2067,18 +2201,33 @@ long vhost_vring_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *arg
 			r = -EFAULT;
 			break;
 		}
-		if (s.num > 0xffff) {
-			r = -EINVAL;
-			break;
+		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+			vq->last_used_wrap_counter = s.num & (1 << 31);
+			vq->last_used_idx = s.num >> 16 & ~(1 << 15);
+			s.num >>= 16;
+			vq->last_avail_wrap_counter = s.num & (1 << 15);
+			vq->last_avail_idx = s.num & ~(1 << 15);
+		} else {
+			if (s.num > 0xffff) {
+				r = -EINVAL;
+				break;
+			}
+			vq->last_avail_idx = s.num;
 		}
-		vq->last_avail_idx = s.num;
 		/* Forget the cached index value. */
 		vq->avail_idx = vq->last_avail_idx;
+		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+			vq->avail_wrap_counter = vq->last_avail_wrap_counter;
 		break;
 	case VHOST_GET_VRING_BASE:
 		s.index = idx;
 		s.num = vq->last_avail_idx;
-		if (copy_to_user(argp, &s, sizeof s))
+		if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
+			s.num |= vq->last_avail_wrap_counter << 15;
+			s.num |= vq->last_used_idx << 16;
+			s.num |= vq->last_used_wrap_counter << 31;
+		}
+		if (copy_to_user(argp, &s, sizeof(s)))
 			r = -EFAULT;
 		break;
 	case VHOST_SET_VRING_KICK:
@@ -2375,6 +2524,48 @@ static int log_used(struct vhost_virtqueue *vq, u64 used_offset, u64 len)
 	return 0;
 }
 
+static int __log_write(struct vhost_virtqueue *vq, void *addr, int size)
+{
+	struct iovec iov[64];
+	int ret, i;
+
+	if (!size)
+		return 0;
+
+	if (!vq->iotlb)
+		return log_write_hva(vq, (uintptr_t)addr, size);
+
+	ret = translate_desc(vq, (uintptr_t)addr, size, iov, 64,
+			     VHOST_ACCESS_WO);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < ret; i++) {
+		ret = log_write_hva(vq,	(uintptr_t)iov[i].iov_base,
+				    iov[i].iov_len);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int log_desc(struct vhost_virtqueue *vq, unsigned int idx,
+		    unsigned int count)
+{
+	int ret, c = min(count, vq->num - idx);
+
+	ret = __log_write(vq, vq->desc_packed + idx,
+			  c * sizeof(*vq->desc_packed));
+	if (ret < 0)
+		return ret;
+
+	ret = __log_write(vq, vq->desc_packed,
+			  (count - c) * sizeof(*vq->desc_packed));
+
+	return ret;
+}
+
 int vhost_log_write(struct vhost_virtqueue *vq, struct vhost_log *log,
 		    unsigned int log_num, u64 len, struct iovec *iov, int count)
 {
@@ -2458,6 +2649,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 
 	vhost_init_is_le(vq);
 
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return 0;
+
 	r = vhost_update_used_flags(vq);
 	if (r)
 		goto err;
@@ -2531,7 +2725,8 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len,
 /* Each buffer in the virtqueues is actually a chain of descriptors.  This
  * function returns the next descriptor in the chain,
  * or -1U if we're at the end. */
-static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
+static unsigned next_desc_split(struct vhost_virtqueue *vq,
+				struct vring_desc *desc)
 {
 	unsigned int next;
 
@@ -2544,11 +2739,17 @@ static unsigned next_desc(struct vhost_virtqueue *vq, struct vring_desc *desc)
 	return next;
 }
 
-static int get_indirect(struct vhost_virtqueue *vq,
-			struct iovec iov[], unsigned int iov_size,
-			unsigned int *out_num, unsigned int *in_num,
-			struct vhost_log *log, unsigned int *log_num,
-			struct vring_desc *indirect)
+static unsigned int next_desc_packed(struct vhost_virtqueue *vq,
+				     struct vring_packed_desc *desc)
+{
+	return desc->flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT);
+}
+
+static int get_indirect_split(struct vhost_virtqueue *vq,
+			      struct iovec iov[], unsigned int iov_size,
+			      unsigned int *out_num, unsigned int *in_num,
+			      struct vhost_log *log, unsigned int *log_num,
+			      struct vring_desc *indirect)
 {
 	struct vring_desc desc;
 	unsigned int i = 0, count, found = 0;
@@ -2638,23 +2839,291 @@ static int get_indirect(struct vhost_virtqueue *vq,
 			}
 			*out_num += ret;
 		}
-	} while ((i = next_desc(vq, &desc)) != -1);
+	} while ((i = next_desc_split(vq, &desc)) != -1);
 	return 0;
 }
 
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access.  Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct vhost_used_elem *used,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
+static int get_indirect_packed(struct vhost_virtqueue *vq,
+			       struct iovec iov[], unsigned int iov_size,
+			       unsigned int *out_num, unsigned int *in_num,
+			       struct vhost_log *log, unsigned int *log_num,
+			       struct vring_packed_desc *indirect)
+{
+	struct vring_packed_desc desc;
+	unsigned int i, count, found = 0;
+	u32 len = vhost32_to_cpu(vq, indirect->len);
+	struct iov_iter from;
+	int ret, access;
+
+	/* Sanity check */
+	if (unlikely(len % sizeof(desc))) {
+		vq_err(vq, "Invalid length in indirect descriptor: len 0x%llx not multiple of 0x%zx\n",
+		       (unsigned long long)len,
+		       sizeof(desc));
+		return -EINVAL;
+	}
+
+	ret = translate_desc(vq, vhost64_to_cpu(vq, indirect->addr),
+			     len, vq->indirect,
+			     UIO_MAXIOV, VHOST_ACCESS_RO);
+	if (unlikely(ret < 0)) {
+		if (ret != -EAGAIN)
+			vq_err(vq, "Translation failure %d in indirect.\n",
+			       ret);
+		return ret;
+	}
+	iov_iter_init(&from, READ, vq->indirect, ret, len);
+
+	/* We will use the result as an address to read from, so most
+	 * architectures only need a compiler barrier here.
+	 */
+	read_barrier_depends();
+
+	count = len / sizeof(desc);
+	/* Buffers are chained via a 16 bit next field, so
+	 * we can have at most 2^16 of these.
+	 */
+	if (unlikely(count > USHRT_MAX + 1)) {
+		vq_err(vq, "Indirect buffer length too big: %d\n",
+		       indirect->len);
+		return -E2BIG;
+	}
+
+	for (i = 0; i < count; i++) {
+		unsigned int iov_count = *in_num + *out_num;
+
+		if (unlikely(++found > count)) {
+			vq_err(vq, "Loop detected: last one at %u indirect size %u\n",
+			       i, count);
+			return -EINVAL;
+		}
+		if (unlikely(!copy_from_iter_full(&desc, sizeof(desc),
+						  &from))) {
+			vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n",
+			       i, (size_t)vhost64_to_cpu(vq, indirect->addr)
+				  + i * sizeof(desc));
+			return -EINVAL;
+		}
+		if (unlikely(desc.flags &
+			     cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT))) {
+			vq_err(vq, "Nested indirect descriptor: idx %d, %zx\n",
+			       i, (size_t)vhost64_to_cpu(vq, indirect->addr)
+				  + i * sizeof(desc));
+			return -EINVAL;
+		}
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+
+		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
+				     vhost32_to_cpu(vq, desc.len),
+				     iov + iov_count,
+				     iov_size - iov_count, access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d indirect idx %d\n",
+				       ret, i);
+			return ret;
+		}
+		/* If this is an input descriptor, increment that count. */
+		if (access == VHOST_ACCESS_WO) {
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr =
+					vhost64_to_cpu(vq, desc.addr);
+				log[*log_num].len =
+					vhost32_to_cpu(vq, desc.len);
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Indirect descriptor has out after in: idx %d\n",
+				       i);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+	}
+
+	return 0;
+}
+
+#define VRING_DESC_F_AVAIL (1 << VRING_PACKED_DESC_F_AVAIL)
+#define VRING_DESC_F_USED  (1 << VRING_PACKED_DESC_F_USED)
+
+static bool desc_is_avail(struct vhost_virtqueue *vq, bool wrap_counter,
+			  __virtio16 flags)
+{
+	bool avail = flags & cpu_to_vhost16(vq, VRING_DESC_F_AVAIL);
+
+	return avail == wrap_counter;
+}
+
+static __virtio16 get_desc_flags(struct vhost_virtqueue *vq,
+				 bool wrap_counter, bool write)
+{
+	__virtio16 flags = 0;
+
+	if (wrap_counter) {
+		flags |= cpu_to_vhost16(vq, VRING_DESC_F_AVAIL);
+		flags |= cpu_to_vhost16(vq, VRING_DESC_F_USED);
+	} else {
+		flags &= ~cpu_to_vhost16(vq, VRING_DESC_F_AVAIL);
+		flags &= ~cpu_to_vhost16(vq, VRING_DESC_F_USED);
+	}
+
+	if (write)
+		flags |= cpu_to_vhost16(vq, VRING_DESC_F_WRITE);
+
+	return flags;
+}
+
+static bool vhost_vring_packed_need_event(struct vhost_virtqueue *vq,
+					  bool wrap, __u16 off_wrap, __u16 new,
+					  __u16 old)
+{
+	int off = off_wrap & ~(1 << 15);
+
+	if (wrap != off_wrap >> 15)
+		off -= vq->num;
+
+	return vring_need_event(off, new, old);
+}
+
+static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
+				    struct vhost_used_elem *used,
+				    struct iovec iov[], unsigned int iov_size,
+				    unsigned int *out_num, unsigned int *in_num,
+				    struct vhost_log *log,
+				    unsigned int *log_num)
+{
+	struct vring_packed_desc desc;
+	struct vring_packed_used_elem *elem = &used->packed_elem;
+	int ret, access;
+	u16 last_avail_idx = vq->last_avail_idx;
+	u16 off_wrap = vq->avail_idx | (vq->avail_wrap_counter << 15);
+
+	/* When we start there are none of either input nor output. */
+	*out_num = *in_num = 0;
+	if (unlikely(log))
+		*log_num = 0;
+
+	elem->count = 0;
+
+	do {
+		unsigned int iov_count = *in_num + *out_num;
+
+		ret = vhost_get_desc_packed(vq, &desc, vq->last_avail_idx);
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to get flags: idx %d\n",
+			       vq->last_avail_idx);
+			return -EFAULT;
+		}
+
+		if (!desc_is_avail(vq, vq->last_avail_wrap_counter,
+				   desc.flags)) {
+			/* If there's nothing new since last we looked, return
+			 * invalid.
+			 */
+			if (!elem->count)
+				return -ENOSPC;
+			vq_err(vq, "Unexpected unavail descriptor: idx %d\n",
+			       vq->last_avail_idx);
+			return -EFAULT;
+		}
+
+		elem->id = desc.id;
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
+			ret = get_indirect_packed(vq, iov, iov_size,
+						  out_num, in_num, log,
+						  log_num, &desc);
+			if (unlikely(ret < 0)) {
+				if (ret != -EAGAIN)
+					vq_err(vq, "Failure detected in indirect descriptor at idx %d\n",
+					       desc.id);
+				return ret;
+			}
+			goto next;
+		}
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
+				     vhost32_to_cpu(vq, desc.len),
+				     iov + iov_count, iov_size - iov_count,
+				     access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d idx %d\n",
+				       ret, desc.id);
+			return ret;
+		}
+
+		if (access == VHOST_ACCESS_WO) {
+			/* If this is an input descriptor,
+			 * increment that count.
+			 */
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr =
+					vhost64_to_cpu(vq, desc.addr);
+				log[*log_num].len =
+					vhost32_to_cpu(vq, desc.len);
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Desc out after in: idx %d\n",
+				       desc.id);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+
+next:
+		if (unlikely(++elem->count > vq->num)) {
+			vq_err(vq, "Loop detected: last one at %u vq size %u head %u\n",
+			       desc.id, vq->num, elem->id);
+			return -EINVAL;
+		}
+		if (unlikely(++vq->last_avail_idx >= vq->num)) {
+			vq->last_avail_idx = 0;
+			vq->last_avail_wrap_counter ^= 1;
+		}
+	/* If this descriptor says it doesn't chain, we're done. */
+	} while (next_desc_packed(vq, &desc));
+
+	/* Packed ring does not have avail idx which means we need to
+	 * track it by our own. The check here is to make sure it
+	 * grows monotonically.
+	 */
+	if (likely(vhost_vring_packed_need_event(vq,
+						 vq->last_avail_wrap_counter,
+						 off_wrap, vq->last_avail_idx,
+						 last_avail_idx))) {
+		vq->avail_idx = vq->last_avail_idx;
+		vq->avail_wrap_counter = vq->last_avail_wrap_counter;
+	}
+
+	return 0;
+}
+
+static int vhost_get_vq_desc_split(struct vhost_virtqueue *vq,
+				   struct vhost_used_elem *used,
+				   struct iovec iov[], unsigned int iov_size,
+				   unsigned int *out_num, unsigned int *in_num,
+				   struct vhost_log *log, unsigned int *log_num)
 {
 	struct vring_desc desc;
 	unsigned int i, head, found = 0;
@@ -2730,16 +3199,16 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 			       i, vq->num, head);
 			return -EINVAL;
 		}
-		ret = vhost_get_desc(vq, &desc, i);
+		ret = vhost_get_desc_split(vq, &desc, i);
 		if (unlikely(ret)) {
 			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
 			       i, vq->desc + i);
 			return -EFAULT;
 		}
 		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
-			ret = get_indirect(vq, iov, iov_size,
-					   out_num, in_num,
-					   log, log_num, &desc);
+			ret = get_indirect_split(vq, iov, iov_size,
+						 out_num, in_num,
+						 log, log_num, &desc);
 			if (unlikely(ret < 0)) {
 				if (ret != -EAGAIN)
 					vq_err(vq, "Failure detected "
@@ -2781,7 +3250,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 			}
 			*out_num += ret;
 		}
-	} while ((i = next_desc(vq, &desc)) != -1);
+	} while ((i = next_desc_split(vq, &desc)) != -1);
 
 	/* On success, increment avail index. */
 	vq->last_avail_idx++;
@@ -2791,50 +3260,134 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
 	return 0;
 }
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error.
+ */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct vhost_used_elem *used,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	VHOST_ALTERNATIVE(vhost_get_vq_desc, vq, used, iov,
+			  iov_size, out_num, in_num, log, log_num);
+}
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
+static void vhost_set_used_len_split(struct vhost_virtqueue *vq,
+				     struct vhost_used_elem *used, int len)
+{
+	struct vring_used_elem *elem = &used->elem;
+
+	elem->len = cpu_to_vhost32(vq, len);
+}
+
+static void vhost_set_used_len_packed(struct vhost_virtqueue *vq,
+				      struct vhost_used_elem *used, int len)
+{
+	struct vring_packed_used_elem *elem = &used->packed_elem;
+
+	elem->len = cpu_to_vhost32(vq, len);
+}
+
 void vhost_set_used_len(struct vhost_virtqueue *vq,
 			struct vhost_used_elem *used, int len)
 {
-	used->elem.len = cpu_to_vhost32(vq, len);
+	VHOST_ALTERNATIVE(vhost_set_used_len, vq, used, len);
 }
 EXPORT_SYMBOL_GPL(vhost_set_used_len);
 
+__virtio32 vhost_get_used_len_split(struct vhost_virtqueue *vq,
+				    struct vhost_used_elem *used)
+{
+	return vhost32_to_cpu(vq, used->elem.len);
+}
+
+__virtio32 vhost_get_used_len_packed(struct vhost_virtqueue *vq,
+				    struct vhost_used_elem *used)
+{
+	return vhost32_to_cpu(vq, used->packed_elem.len);
+}
+
 __virtio32 vhost_get_used_len(struct vhost_virtqueue *vq,
 			      struct vhost_used_elem *used)
 {
-	return vhost32_to_cpu(vq, used->elem.len);
+	VHOST_ALTERNATIVE(vhost_get_used_len, vq, used);
 }
 EXPORT_SYMBOL_GPL(vhost_get_used_len);
 
-static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq, int count)
+static void vhost_withdraw_shadow_used(struct vhost_virtqueue *vq,
+				       int count)
 {
 	BUG_ON(count > vq->nheads);
 	vq->nheads -= count;
 }
 
+static void vhost_discard_vq_desc_packed(struct vhost_virtqueue *vq,
+					 struct vring_packed_used_elem *elem,
+					 int headcount)
+{
+	int i;
+
+	for (i = 0; i < headcount; i++) {
+		vq->last_avail_idx -= elem[i].count;
+		if (vq->last_avail_idx >= vq->num) {
+			vq->last_avail_wrap_counter ^= 1;
+			vq->last_avail_idx += vq->num;
+		}
+	}
+}
+
+static void vhost_discard_vq_desc_split(struct vhost_virtqueue *vq, int n)
+{
+	vq->last_avail_idx -= n;
+}
+
+static void vhost_discard_shadow_used_packed(struct vhost_virtqueue *vq, int n)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	vhost_discard_vq_desc_packed(vq, elem + vq->nheads - n, n);
+	vhost_withdraw_shadow_used(vq, n);
+}
+
+static void vhost_discard_shadow_used_split(struct vhost_virtqueue *vq, int n)
+{
+	vhost_withdraw_shadow_used(vq, n);
+	vhost_discard_vq_desc_split(vq, n);
+}
+
 /* Reverse the effect of vhost_get_vq_desc and
  * vhost_add_shadow_used. Useful for error handleing
  */
 void vhost_discard_shadow_used(struct vhost_virtqueue *vq, int n)
 {
-	vhost_withdraw_shadow_used(vq, n);
-	vhost_discard_vq_desc(vq, n);
+	VHOST_ALTERNATIVE(vhost_discard_shadow_used, vq, n);
 }
 EXPORT_SYMBOL_GPL(vhost_discard_shadow_used);
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
-void vhost_discard_vq_desc(struct vhost_virtqueue *vq, int n)
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq,
+			   struct vhost_used_elem *used)
 {
-	vq->last_avail_idx -= n;
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		vhost_discard_vq_desc_packed(vq, &used->packed_elem, 1);
+	else
+		vhost_discard_vq_desc_split(vq, 1);
 }
 EXPORT_SYMBOL_GPL(vhost_discard_vq_desc);
 
-static int __vhost_add_used_n(struct vhost_virtqueue *vq,
-			    struct vhost_used_elem *shadow,
-			    unsigned count)
+static int __vhost_add_used_n_split(struct vhost_virtqueue *vq,
+				    struct vring_used_elem *heads,
+				    unsigned count)
 {
-	struct vring_used_elem *heads = (struct vring_used_elem *)shadow;
 	struct vring_used_elem __user *used;
 	u16 old, new;
 	int start;
@@ -2863,59 +3416,241 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	return 0;
 }
 
+static void vhost_set_zc_used_len_split(struct vhost_virtqueue *vq,
+					int idx, int len)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	heads[idx].len = len;
+}
+
+static void vhost_set_zc_used_len_packed(struct vhost_virtqueue *vq,
+					 int idx, int len)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	elem[idx].len = len;
+}
+
 void vhost_set_zc_used_len(struct vhost_virtqueue *vq,
-			       int idx, int len)
+			   int idx, int len)
 {
-	vq->heads[idx].elem.len = len;
+	VHOST_ALTERNATIVE(vhost_set_zc_used_len, vq, idx, len);
 }
 EXPORT_SYMBOL_GPL(vhost_set_zc_used_len);
 
-int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
+void vhost_add_shadow_used_split(struct vhost_virtqueue *vq,
+				 struct vhost_used_elem *used, int len)
 {
-	return vq->heads[idx].elem.len;
+	struct vring_used_elem *heads = vq->heads;
+
+	heads[vq->nheads].id = used->elem.id;
+	heads[vq->nheads].len = len;
+	++vq->nheads;
 }
-EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);
 
-void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
-		       struct vhost_used_elem *elem, int len)
+void vhost_add_shadow_used_packed(struct vhost_virtqueue *vq,
+				  struct vhost_used_elem *used, int len)
 {
-	vq->heads[idx] = *elem;
-	vhost_set_zc_used_len(vq, idx, len);
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	elem[vq->nheads].id = used->packed_elem.id;
+	elem[vq->nheads].count = used->packed_elem.count;
+	elem[vq->nheads].len = len;
+	++vq->nheads;
 }
-EXPORT_SYMBOL_GPL(vhost_set_zc_used);
 
 void vhost_add_shadow_used(struct vhost_virtqueue *vq,
-			   struct vhost_used_elem *elem, int len)
+			   struct vhost_used_elem *used, int len)
 {
-	vhost_set_zc_used(vq, vq->nheads, elem, len);
-	++vq->nheads;
+	VHOST_ALTERNATIVE(vhost_add_shadow_used, vq, used, len);
 }
 EXPORT_SYMBOL_GPL(vhost_add_shadow_used);
 
+int vhost_get_zc_used_len_split(struct vhost_virtqueue *vq, int idx)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	return heads[idx].len;
+}
+
+int vhost_get_zc_used_len_packed(struct vhost_virtqueue *vq, int idx)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	return elem[idx].len;
+}
+
+int vhost_get_zc_used_len(struct vhost_virtqueue *vq, int idx)
+{
+	VHOST_ALTERNATIVE(vhost_get_zc_used_len, vq, idx);
+}
+EXPORT_SYMBOL_GPL(vhost_get_zc_used_len);
+
+void vhost_set_zc_used_split(struct vhost_virtqueue *vq, int idx,
+			     struct vhost_used_elem *used, int len)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	heads[idx] = used->elem;
+	vhost_set_zc_used_len_split(vq, idx, len);
+}
+
+void vhost_set_zc_used_packed(struct vhost_virtqueue *vq, int idx,
+			      struct vhost_used_elem *used, int len)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	elem[idx].id = used->packed_elem.id;
+	elem[idx].count = used->packed_elem.count;
+	vhost_set_zc_used_len_packed(vq, idx, len);
+}
+
+void vhost_set_zc_used(struct vhost_virtqueue *vq, int idx,
+		       struct vhost_used_elem *used, int len)
+{
+	VHOST_ALTERNATIVE(vhost_set_zc_used, vq, idx, used, len);
+}
+EXPORT_SYMBOL_GPL(vhost_set_zc_used);
+
 int vhost_get_shadow_used_count(struct vhost_virtqueue *vq)
 {
 	return vq->nheads;
 }
 EXPORT_SYMBOL_GPL(vhost_get_shadow_used_count);
 
+static int __vhost_add_used_n_packed(struct vhost_virtqueue *vq,
+				     struct vring_packed_used_elem *elem,
+				     unsigned int count)
+{
+	int i, ret;
+	__virtio16 head_flags = 0;
+	struct vring_packed_desc desc;
+	unsigned int used_idx = vq->last_used_idx;
+	bool wrap_counter = vq->last_used_wrap_counter;
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc *d = vhost_get_meta_ptr(vq, VHOST_ADDR_DESC);
+
+	if (likely(d)) {
+		used_idx += elem[0].count;
+		if (unlikely(used_idx >= vq->num)) {
+			used_idx -= vq->num;
+			wrap_counter ^= 1;
+		}
+
+		for (i = 1; i < count; i++) {
+			d[used_idx].flags = get_desc_flags(vq, wrap_counter,
+							   elem[i].len);
+			d[used_idx].id = elem[i].id;
+			d[used_idx].len = elem[i].len;
+			used_idx += elem[i].count;
+			if (unlikely(used_idx >= vq->num)) {
+				used_idx -= vq->num;
+				wrap_counter ^= 1;
+			}
+		}
+
+		d[vq->last_used_idx].id = elem[0].id;
+		d[vq->last_used_idx].len = elem[0].len;
+		/* Make sure descriptor id and len is written before
+		 * flags for the first used buffer.
+		 */
+		smp_wmb();
+		head_flags = get_desc_flags(vq,	vq->last_used_wrap_counter,
+					    elem[0].len);
+		d[vq->last_used_idx].flags = head_flags;
+
+		vq->last_used_idx = used_idx;
+		vq->last_used_wrap_counter = wrap_counter;
+
+		vhost_put_meta_ptr();
+
+		return 0;
+	}
+#endif
+
+	/* Update used elems other than first to save unnecessary
+	 * memory barriers.
+	 */
+	for (i = 0; i < count; i++) {
+		__virtio16 flags = get_desc_flags(vq, wrap_counter,
+						  elem[i].len);
+		if (i == 0) {
+			head_flags = flags;
+			desc.flags = ~flags;
+		} else
+			desc.flags = flags;
+
+		desc.id = elem[i].id;
+		desc.len = elem[i].len;
+
+		ret = vhost_put_desc_packed(vq, &desc, used_idx);
+		if (unlikely(ret))
+			return ret;
+
+		used_idx += elem[i].count;
+		if (used_idx >= vq->num) {
+			used_idx -= vq->num;
+			wrap_counter ^= 1;
+		}
+	}
+
+	/* Make sure descriptor id and len is written before
+	 * flags for the first used buffer.
+	 */
+	smp_wmb();
+
+	if (vhost_put_desc_flags(vq, &head_flags, vq->last_used_idx)) {
+		vq_err(vq, "Failed to update flags: idx %d",
+		       vq->last_used_idx);
+		return -EFAULT;
+	}
+
+	vq->last_used_idx = used_idx;
+	vq->last_used_wrap_counter = wrap_counter;
+
+	return 0;
+}
+
+static int vhost_add_used_n_packed(struct vhost_virtqueue *vq,
+				   struct vring_packed_used_elem *elem,
+				   unsigned int count)
+{
+	u16 last_used_idx = vq->last_used_idx;
+	int ret;
+
+	ret = __vhost_add_used_n_packed(vq, elem, count);
+	if (unlikely(ret))
+		return ret;
+
+	if (unlikely(vq->log_used)) {
+		/* Make sure used descriptors are seen before log. */
+		smp_wmb();
+		ret = log_desc(vq, last_used_idx, count);
+	}
+
+	return ret;
+}
+
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
-static int vhost_add_used_n(struct vhost_virtqueue *vq,
-			    struct vhost_used_elem *heads,
-			    unsigned count)
+static int vhost_add_used_n_split(struct vhost_virtqueue *vq,
+				  struct vring_used_elem *heads,
+				  unsigned count)
+
 {
 	int start, n, r;
 
 	start = vq->last_used_idx & (vq->num - 1);
 	n = vq->num - start;
 	if (n < count) {
-		r = __vhost_add_used_n(vq, heads, n);
+		r = __vhost_add_used_n_split(vq, heads, n);
 		if (r < 0)
 			return r;
 		heads += n;
 		count -= n;
 	}
-	r = __vhost_add_used_n(vq, heads, count);
+	r = __vhost_add_used_n_split(vq, heads, count);
 
 	/* Make sure buffer is written before we update index. */
 	smp_wmb();
@@ -2934,7 +3669,15 @@ static int vhost_add_used_n(struct vhost_virtqueue *vq,
 	}
 	return r;
 }
-EXPORT_SYMBOL_GPL(vhost_add_used_n);
+
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd.
+ */
+static int vhost_add_used_n(struct vhost_virtqueue *vq,
+			    void *heads, unsigned int count)
+{
+	VHOST_ALTERNATIVE(vhost_add_used_n, vq, heads, count);
+}
 
 /* After we've used one of their buffers, we tell them about it.  We'll then
  * want to notify the guest, using eventfd. */
@@ -2951,6 +3694,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__u16 old, new;
 	__virtio16 event;
 	bool v;
+
+	/* TODO: check driver area */
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return true;
+
 	/* Flush out used index updates. This is paired
 	 * with the barrier that the Guest executes when enabling
 	 * interrupts. */
@@ -3023,14 +3771,32 @@ void vhost_flush_shadow_used_and_signal(struct vhost_virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(vhost_flush_shadow_used_and_signal);
 
+void vhost_flush_zc_used_and_signal_split(struct vhost_virtqueue *vq,
+					  int idx, int n)
+{
+	struct vring_used_elem *heads = vq->heads;
+
+	vhost_add_used_n_split(vq, &heads[idx], n);
+	vhost_signal(vq->dev, vq);
+}
+
+void vhost_flush_zc_used_and_signal_packed(struct vhost_virtqueue *vq,
+					   int idx, int n)
+{
+	struct vring_packed_used_elem *elem = vq->heads;
+
+	vhost_add_used_n_packed(vq, &elem[idx], n);
+	vhost_signal(vq->dev, vq);
+}
+
 void vhost_flush_zc_used_and_signal(struct vhost_virtqueue *vq, int idx, int n)
 {
-	vhost_add_used_and_signal_n(vq->dev, vq, &vq->heads[idx], n);
+	VHOST_ALTERNATIVE(vhost_flush_zc_used_and_signal, vq, idx, n);
 }
 EXPORT_SYMBOL_GPL(vhost_flush_zc_used_and_signal);
 
 /* return true if we're sure that avaiable ring is empty */
-bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
+static bool vhost_vq_avail_empty_split(struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -3045,10 +3811,52 @@ bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
 
 	return vq->avail_idx == vq->last_avail_idx;
 }
+
+static bool vhost_vq_avail_empty_packed(struct vhost_virtqueue *vq)
+{
+	__virtio16 flags;
+	int ret;
+
+	ret = vhost_get_desc_flags(vq, &flags, vq->avail_idx);
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get flags: idx %d\n",
+		       vq->avail_idx);
+		return -EFAULT;
+	}
+
+	return !desc_is_avail(vq, vq->avail_wrap_counter, flags);
+}
+
+bool vhost_vq_avail_empty(struct vhost_virtqueue *vq)
+{
+	VHOST_ALTERNATIVE(vhost_vq_avail_empty, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-/* OK, now we need to know about added descriptors. */
-bool vhost_enable_notify(struct vhost_virtqueue *vq)
+static bool vhost_enable_notify_packed(struct vhost_virtqueue *vq)
+{
+	struct vring_packed_desc *d = vq->desc_packed + vq->avail_idx;
+	__virtio16 flags;
+	int ret;
+
+	/* TODO: enable notification through device area */
+
+	/* They could have slipped one in as we were doing that: make
+	 * sure it's written, then check again.
+	 */
+	smp_mb();
+
+	ret = vhost_get_desc_flags(vq, &flags, vq->avail_idx);
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+			vq->avail_idx, &d->flags);
+		return -EFAULT;
+	}
+
+	return desc_is_avail(vq, vq->avail_wrap_counter, flags);
+}
+
+static bool vhost_enable_notify_split(struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -3083,10 +3891,20 @@ bool vhost_enable_notify(struct vhost_virtqueue *vq)
 
 	return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
 }
+
+/* OK, now we need to know about added descriptors. */
+bool vhost_enable_notify(struct vhost_virtqueue *vq)
+{
+	VHOST_ALTERNATIVE(vhost_enable_notify, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_enable_notify);
 
-/* We don't need to be notified again. */
-void vhost_disable_notify(struct vhost_virtqueue *vq)
+static void vhost_disable_notify_packed(struct vhost_virtqueue *vq)
+{
+	/* TODO: disable notification through device area */
+}
+
+static void vhost_disable_notify_split(struct vhost_virtqueue *vq)
 {
 	int r;
 
@@ -3100,6 +3918,12 @@ void vhost_disable_notify(struct vhost_virtqueue *vq)
 			       &vq->used->flags, r);
 	}
 }
+
+/* We don't need to be notified again. */
+void vhost_disable_notify(struct vhost_virtqueue *vq)
+{
+	VHOST_ALTERNATIVE(vhost_disable_notify, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_disable_notify);
 
 /* Create a new message. */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index b8a5d1a2bed9..7f3a2dd1b628 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -37,8 +37,17 @@ struct vhost_poll {
 	struct vhost_dev	 *dev;
 };
 
+struct vring_packed_used_elem {
+	__u16 id;
+	__u16 count;
+	__u32 len;
+};
+
 struct vhost_used_elem {
-	struct vring_used_elem elem;
+	union {
+		struct vring_used_elem elem;
+		struct vring_packed_used_elem packed_elem;
+	};
 };
 
 void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn);
@@ -112,7 +121,10 @@ struct vhost_virtqueue {
 	/* The actual ring of buffers. */
 	struct mutex mutex;
 	unsigned int num;
-	struct vring_desc __user *desc;
+	union {
+		struct vring_desc __user *desc;
+		struct vring_packed_desc __user *desc_packed;
+	};
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
 
@@ -166,7 +178,7 @@ struct vhost_virtqueue {
 	struct iovec iov[UIO_MAXIOV];
 	struct iovec iotlb_iov[64];
 	struct iovec *indirect;
-	struct vhost_used_elem *heads;
+	void *heads;
 	int nheads;
 	/* Protected by virtqueue mutex. */
 	struct vhost_umem *umem;
@@ -188,6 +200,9 @@ struct vhost_virtqueue {
 	u32 busyloop_timeout;
 	spinlock_t mmu_lock;
 	int invalidate_count;
+	bool last_used_wrap_counter;
+	bool avail_wrap_counter;
+	bool last_avail_wrap_counter;
 };
 
 struct vhost_msg_node {
@@ -241,7 +256,8 @@ int vhost_get_vq_desc(struct vhost_virtqueue *,
 		      struct iovec iov[], unsigned int iov_count,
 		      unsigned int *out_num, unsigned int *in_num,
 		      struct vhost_log *log, unsigned int *log_num);
-void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
+void vhost_discard_vq_desc(struct vhost_virtqueue *vq,
+			   struct vhost_used_elem *elem);
 
 int vhost_vq_init_access(struct vhost_virtqueue *);
 int vhost_add_used(struct vhost_virtqueue *, struct vhost_used_elem *, int);
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 14/15] vhost: event suppression for packed ring
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (12 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 13/15] vhost: packed ring support Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 10:52 ` [PATCH V3 15/15] vhost: enable packed virtqueues Jason Wang
  2019-07-17 11:02 ` [PATCH V3 00/15] Packed virtqueue support for vhost Michael S. Tsirkin
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

This patch introduces support for event suppression. This is done by
have a two areas: device area and driver area. One side could then try
to disable or enable (delayed) notification from other side by using a
boolean hint or event index interface in the areas.

For more information, please refer Virtio spec.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.c | 265 +++++++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.h |  11 +-
 2 files changed, 255 insertions(+), 21 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index a7d24b9d5204..a188e9af3b35 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1524,6 +1524,76 @@ static inline int vhost_put_desc_flags(struct vhost_virtqueue *vq,
 	return vhost_put_user(vq, *flags, &desc->flags, VHOST_ADDR_DESC);
 }
 
+static int vhost_get_driver_off_wrap(struct vhost_virtqueue *vq,
+				     __virtio16 *off_wrap)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+	       vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
+	if (likely(event)) {
+		*off_wrap = event->off_wrap;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_get_user(vq, *off_wrap,
+			      &vq->driver_event->off_wrap,
+			      VHOST_ADDR_AVAIL);
+}
+
+static int vhost_get_driver_flags(struct vhost_virtqueue *vq,
+				  __virtio16 *driver_flags)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+	       vhost_get_meta_ptr(vq, VHOST_ADDR_AVAIL);
+
+	if (likely(event)) {
+		*driver_flags = event->flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_get_user(vq, *driver_flags, &vq->driver_event->flags,
+			      VHOST_ADDR_AVAIL);
+}
+
+static int vhost_put_device_off_wrap(struct vhost_virtqueue *vq,
+				     __virtio16 *off_wrap)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+	       vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
+
+	if (likely(event)) {
+		event->off_wrap = *off_wrap;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_put_user(vq, *off_wrap,
+			      &vq->device_event->off_wrap,
+			      VHOST_ADDR_USED);
+}
+
+static int vhost_put_device_flags(struct vhost_virtqueue *vq,
+				  __virtio16 *device_flags)
+{
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+	struct vring_packed_desc_event *event =
+	       vhost_get_meta_ptr(vq, VHOST_ADDR_USED);
+
+	if (likely(event)) {
+		event->flags = *device_flags;
+		vhost_put_meta_ptr();
+		return 0;
+	}
+#endif
+	return vhost_put_user(vq, *device_flags,
+			      &vq->device_event->flags,
+			      VHOST_ADDR_USED);
+}
+
 static int vhost_new_umem_range(struct vhost_umem *umem,
 				u64 start, u64 size, u64 end,
 				u64 userspace_addr, int perm)
@@ -1809,10 +1879,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
 			       struct vring_used __user *used)
 {
 	struct vring_packed_desc *packed = (struct vring_packed_desc *)desc;
+	struct vring_packed_desc_event *driver_event =
+		(struct vring_packed_desc_event *)avail;
+	struct vring_packed_desc_event *device_event =
+		(struct vring_packed_desc_event *)used;
 
-	/* TODO: check device area and driver area */
 	return access_ok(packed, num * sizeof(*packed)) &&
-	       access_ok(packed, num * sizeof(*packed));
+	       access_ok(packed, num * sizeof(*packed)) &&
+	       access_ok(driver_event, sizeof(*driver_event)) &&
+	       access_ok(device_event, sizeof(*device_event));
 }
 
 static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
@@ -1904,16 +1979,25 @@ static void vhost_vq_map_prefetch(struct vhost_virtqueue *vq)
 }
 #endif
 
-int vq_meta_prefetch(struct vhost_virtqueue *vq)
+static int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
 {
-	unsigned int num = vq->num;
+	int num = vq->num;
 
-	if (!vq->iotlb) {
-#if VHOST_ARCH_CAN_ACCEL_UACCESS
-		vhost_vq_map_prefetch(vq);
-#endif
-		return 1;
-	}
+	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
+			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
+			       (u64)(uintptr_t)vq->driver_event,
+			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
+	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
+			       (u64)(uintptr_t)vq->device_event,
+			       sizeof(*vq->device_event), VHOST_ADDR_USED);
+}
+
+static int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
+{
+	unsigned int num = vq->num;
 
 	return iotlb_access_ok(vq, VHOST_ACCESS_RO,
 			       (u64)(uintptr_t)vq->desc,
@@ -1928,6 +2012,21 @@ int vq_meta_prefetch(struct vhost_virtqueue *vq)
 			       vhost_get_used_size(vq, num),
 			       VHOST_ADDR_USED);
 }
+
+int vq_meta_prefetch(struct vhost_virtqueue *vq)
+{
+	if (!vq->iotlb) {
+#if VHOST_ARCH_CAN_ACCEL_UACCESS
+		vhost_vq_map_prefetch(vq);
+#endif
+		return 1;
+	}
+
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vq_iotlb_prefetch_packed(vq);
+	else
+		return vq_iotlb_prefetch_split(vq);
+}
 EXPORT_SYMBOL_GPL(vq_meta_prefetch);
 
 /* Can we log writes? */
@@ -2620,6 +2719,48 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
 	return 0;
 }
 
+static int vhost_update_device_flags(struct vhost_virtqueue *vq,
+				     __virtio16 device_flags)
+{
+	void __user *flags;
+
+	if (vhost_put_device_flags(vq, &device_flags))
+		return -EFAULT;
+	if (unlikely(vq->log_used)) {
+		/* Make sure the flag is seen before log. */
+		smp_wmb();
+		/* Log used flag write. */
+		flags = &vq->device_event->flags;
+		log_write(vq->log_base, vq->log_addr +
+			  (flags - (void __user *)vq->device_event),
+			  sizeof(vq->device_event->flags));
+		if (vq->log_ctx)
+			eventfd_signal(vq->log_ctx, 1);
+	}
+	return 0;
+}
+
+static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
+					__virtio16 device_off_wrap)
+{
+	void __user *off_wrap;
+
+	if (vhost_put_device_off_wrap(vq, &device_off_wrap))
+		return -EFAULT;
+	if (unlikely(vq->log_used)) {
+		/* Make sure the flag is seen before log. */
+		smp_wmb();
+		/* Log used flag write. */
+		off_wrap = &vq->device_event->off_wrap;
+		log_write(vq->log_base, vq->log_addr +
+			  (off_wrap - (void __user *)vq->device_event),
+			  sizeof(vq->device_event->off_wrap));
+		if (vq->log_ctx)
+			eventfd_signal(vq->log_ctx, 1);
+	}
+	return 0;
+}
+
 static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
 {
 	if (vhost_put_avail_event(vq))
@@ -3689,16 +3830,13 @@ int vhost_add_used(struct vhost_virtqueue *vq, struct vhost_used_elem *used,
 }
 EXPORT_SYMBOL_GPL(vhost_add_used);
 
-static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_notify_split(struct vhost_dev *dev,
+			       struct vhost_virtqueue *vq)
 {
 	__u16 old, new;
 	__virtio16 event;
 	bool v;
 
-	/* TODO: check driver area */
-	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
-		return true;
-
 	/* Flush out used index updates. This is paired
 	 * with the barrier that the Guest executes when enabling
 	 * interrupts. */
@@ -3731,6 +3869,62 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
 }
 
+static bool vhost_notify_packed(struct vhost_dev *dev,
+				struct vhost_virtqueue *vq)
+{
+	__virtio16 event_off_wrap, event_flags;
+	__u16 old, new, off_wrap;
+	bool v;
+
+	/* Flush out used descriptors updates. This is paired
+	 * with the barrier that the Guest executes when enabling
+	 * interrupts.
+	 */
+	smp_mb();
+
+	if (vhost_get_driver_flags(vq, &event_flags)) {
+		vq_err(vq, "Failed to get driver desc_event_flags");
+		return true;
+	}
+
+	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
+		return event_flags !=
+		       cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DISABLE);
+
+	old = vq->signalled_used;
+	v = vq->signalled_used_valid;
+	new = vq->signalled_used = vq->last_used_idx;
+	vq->signalled_used_valid = true;
+
+	if (event_flags != cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DESC))
+		return event_flags !=
+		       cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DISABLE);
+
+	/* Read desc event flags before event_off and event_wrap */
+	smp_rmb();
+
+	if (vhost_get_driver_off_wrap(vq, &event_off_wrap) < 0) {
+		vq_err(vq, "Failed to get driver desc_event_off/wrap");
+		return true;
+	}
+
+	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
+
+	if (unlikely(!v))
+		return true;
+
+	return vhost_vring_packed_need_event(vq, vq->last_used_wrap_counter,
+					     off_wrap, new, old);
+}
+
+static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_notify_packed(dev, vq);
+	else
+		return vhost_notify_split(dev, vq);
+}
+
 /* This actually signals the guest, using eventfd. */
 void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 {
@@ -3836,10 +4030,34 @@ EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 static bool vhost_enable_notify_packed(struct vhost_virtqueue *vq)
 {
 	struct vring_packed_desc *d = vq->desc_packed + vq->avail_idx;
-	__virtio16 flags;
+	__virtio16 flags = cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_ENABLE);
 	int ret;
 
-	/* TODO: enable notification through device area */
+	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
+		return false;
+	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
+
+	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
+		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
+				      vq->avail_wrap_counter << 15);
+
+		ret = vhost_update_device_off_wrap(vq, off_wrap);
+		if (ret) {
+			vq_err(vq, "Failed to write to off warp at %p: %d\n",
+			       &vq->device_event->off_wrap, ret);
+			return false;
+		}
+		/* Make sure off_wrap is wrote before flags */
+		smp_wmb();
+		flags = cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DESC);
+	}
+
+	ret = vhost_update_device_flags(vq, flags);
+	if (ret) {
+		vq_err(vq, "Failed to enable notification at %p: %d\n",
+			&vq->device_event->flags, ret);
+		return false;
+	}
 
 	/* They could have slipped one in as we were doing that: make
 	 * sure it's written, then check again.
@@ -3901,7 +4119,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
 
 static void vhost_disable_notify_packed(struct vhost_virtqueue *vq)
 {
-	/* TODO: disable notification through device area */
+	__virtio16 flags;
+	int r;
+
+	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
+		return;
+	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
+
+	flags = cpu_to_vhost16(vq, VRING_PACKED_EVENT_FLAG_DISABLE);
+	r = vhost_update_device_flags(vq, flags);
+	if (r)
+		vq_err(vq, "Failed to enable notification at %p: %d\n",
+		       &vq->device_event->flags, r);
 }
 
 static void vhost_disable_notify_split(struct vhost_virtqueue *vq)
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 7f3a2dd1b628..bb3f8bb763b9 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -125,9 +125,14 @@ struct vhost_virtqueue {
 		struct vring_desc __user *desc;
 		struct vring_packed_desc __user *desc_packed;
 	};
-	struct vring_avail __user *avail;
-	struct vring_used __user *used;
-
+	union {
+		struct vring_avail __user *avail;
+		struct vring_packed_desc_event __user *driver_event;
+	};
+	union {
+		struct vring_used __user *used;
+		struct vring_packed_desc_event __user *device_event;
+	};
 #if VHOST_ARCH_CAN_ACCEL_UACCESS
 	/* Read by memory accessors, modified by meta data
 	 * prefetching, MMU notifier and vring ioctl().
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH V3 15/15] vhost: enable packed virtqueues
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (13 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 14/15] vhost: event suppression for packed ring Jason Wang
@ 2019-07-17 10:52 ` Jason Wang
  2019-07-17 11:02 ` [PATCH V3 00/15] Packed virtqueue support for vhost Michael S. Tsirkin
  15 siblings, 0 replies; 19+ messages in thread
From: Jason Wang @ 2019-07-17 10:52 UTC (permalink / raw)
  To: mst, jasowang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

This patch enables packed virtqueue support for vhost.

Testpmd (virtio-user) + vhost_net shows about 2.6% improvemetn on TX
PPS:

Before: 5.75Mpps
After : 5.90Mpps

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/vhost.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index bb3f8bb763b9..5483eea84a5c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -326,7 +326,8 @@ enum {
 			 (1ULL << VIRTIO_RING_F_EVENT_IDX) |
 			 (1ULL << VHOST_F_LOG_ALL) |
 			 (1ULL << VIRTIO_F_ANY_LAYOUT) |
-			 (1ULL << VIRTIO_F_VERSION_1)
+			 (1ULL << VIRTIO_F_VERSION_1) |
+			 (1ULL << VIRTIO_F_RING_PACKED)
 };
 
 static inline bool vhost_has_feature(struct vhost_virtqueue *vq, int bit)
-- 
2.18.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 00/15] Packed virtqueue support for vhost
  2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
                   ` (14 preceding siblings ...)
  2019-07-17 10:52 ` [PATCH V3 15/15] vhost: enable packed virtqueues Jason Wang
@ 2019-07-17 11:02 ` Michael S. Tsirkin
  2019-07-17 12:27   ` Jason Wang
  15 siblings, 1 reply; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-07-17 11:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

On Wed, Jul 17, 2019 at 06:52:40AM -0400, Jason Wang wrote:
> Hi all:
> 
> This series implements packed virtqueues which were described
> at [1]. In this version we try to address the performance regression
> saw by V2. The root cause is packed virtqueue need more times of
> userspace memory accesssing which turns out to be very
> expensive. Thanks to the help of 7f466032dc9e ("vhost: access vq
> metadata through kernel virtual address"), such overhead cold be
> eliminated. So in this version, we can see about 2% improvement for
> packed virtqueue on PPS.


Great job, thanks!
Pls allow a bit more review time than usual as this is a big patchset.
Should be done by Tuesday.
-next material anyway.

> More optimizations (e.g IN_ORDER) is on the road.
> 
> Please review.
> 
> [1] https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-610007
> 
> This version were tested with:
> - zercopy/datacopy
> - mergeable buffer on/off
> - TCP stream & virtio-user
> 
> Changes from V2:
> - rebase on top of vhost metadata accelreation series
> - introduce shadow used ring API
> - new SET_VRING_BASE/GET_VRING_BASE that takes care about warp counter
>   and index for both avail and used
> - various twaeaks
> 
> Changes from V1:
> - drop uapi patch and use Tiwei's
> - split the enablement of packed virtqueue into a separate patch
> 
> Changes from RFC V5:
> - save unnecessary barriers during vhost_add_used_packed_n()
> - more compact math for event idx
> - fix failure of SET_VRING_BASE when avail_wrap_counter is true
> - fix not copy avail_wrap_counter during GET_VRING_BASE
> - introduce SET_VRING_USED_BASE/GET_VRING_USED_BASE for syncing
> - last_used_idx
> - rename used_wrap_counter to last_used_wrap_counter
> - rebase to net-next
> 
> Changes from RFC V4:
> - fix signalled_used index recording
> - track avail index correctly
> - various minor fixes
> 
> Changes from RFC V3:
> - Fix math on event idx checking
> - Sync last avail wrap counter through GET/SET_VRING_BASE
> - remove desc_event prefix in the driver/device structure
> 
> Changes from RFC V2:
> - do not use & in checking desc_event_flags
> - off should be most significant bit
> - remove the workaround of mergeable buffer for dpdk prototype
> - id should be in the last descriptor in the chain
> - keep _F_WRITE for write descriptor when adding used
> - device flags updating should use ADDR_USED type
> - return error on unexpected unavail descriptor in a chain
> - return false in vhost_ve_avail_empty is descriptor is available
> - track last seen avail_wrap_counter
> - correctly examine available descriptor in get_indirect_packed()
> - vhost_idx_diff should return u16 instead of bool
> 
> Changes from RFC V1:
> - Refactor vhost used elem code to avoid open coding on used elem
> - Event suppression support (compile test only).
> - Indirect descriptor support (compile test only).
> - Zerocopy support.
> - vIOMMU support.
> - SCSI/VSOCK support (compile test only).
> - Fix several bugs
> 
> Jason Wang (15):
>   vhost: simplify meta data pointer accessing
>   vhost: remove the unnecessary parameter of vhost_vq_avail_empty()
>   vhost: remove unnecessary parameter of
>     vhost_enable_notify()/vhost_disable_notify
>   vhost-net: don't use vhost_add_used_n() for zerocopy
>   vhost: introduce helpers to manipulate shadow used ring
>   vhost_net: switch TX to use shadow used ring API
>   vhost_net: calculate last used length once for mergeable buffer
>   vhost_net: switch to use shadow used ring API for RX
>   vhost: do not export vhost_add_used_n() and
>     vhost_add_used_and_signal_n()
>   vhost: hide used ring layout from device
>   vhost: do not use vring_used_elem
>   vhost: vhost_put_user() can accept metadata type
>   vhost: packed ring support
>   vhost: event suppression for packed ring
>   vhost: enable packed virtqueues
> 
>  drivers/vhost/net.c   |  200 +++---
>  drivers/vhost/scsi.c  |   72 +-
>  drivers/vhost/test.c  |    6 +-
>  drivers/vhost/vhost.c | 1508 +++++++++++++++++++++++++++++++++++------
>  drivers/vhost/vhost.h |   78 ++-
>  drivers/vhost/vsock.c |   57 +-
>  6 files changed, 1513 insertions(+), 408 deletions(-)
> 
> -- 
> 2.18.1

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 00/15] Packed virtqueue support for vhost
  2019-07-17 11:02 ` [PATCH V3 00/15] Packed virtqueue support for vhost Michael S. Tsirkin
@ 2019-07-17 12:27   ` Jason Wang
  2019-07-17 14:28     ` Michael S. Tsirkin
  0 siblings, 1 reply; 19+ messages in thread
From: Jason Wang @ 2019-07-17 12:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin


On 2019/7/17 下午7:02, Michael S. Tsirkin wrote:
> On Wed, Jul 17, 2019 at 06:52:40AM -0400, Jason Wang wrote:
>> Hi all:
>>
>> This series implements packed virtqueues which were described
>> at [1]. In this version we try to address the performance regression
>> saw by V2. The root cause is packed virtqueue need more times of
>> userspace memory accesssing which turns out to be very
>> expensive. Thanks to the help of 7f466032dc9e ("vhost: access vq
>> metadata through kernel virtual address"), such overhead cold be
>> eliminated. So in this version, we can see about 2% improvement for
>> packed virtqueue on PPS.
> Great job, thanks!
> Pls allow a bit more review time than usual as this is a big patchset.
> Should be done by Tuesday.
> -next material anyway.


Sure, just to confirm, I think this should go for your vhost tree?.

Thanks


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH V3 00/15] Packed virtqueue support for vhost
  2019-07-17 12:27   ` Jason Wang
@ 2019-07-17 14:28     ` Michael S. Tsirkin
  0 siblings, 0 replies; 19+ messages in thread
From: Michael S. Tsirkin @ 2019-07-17 14:28 UTC (permalink / raw)
  To: Jason Wang
  Cc: kvm, virtualization, netdev, linux-kernel, jfreimann, tiwei.bie,
	maxime.coquelin

On Wed, Jul 17, 2019 at 08:27:28PM +0800, Jason Wang wrote:
> 
> On 2019/7/17 下午7:02, Michael S. Tsirkin wrote:
> > On Wed, Jul 17, 2019 at 06:52:40AM -0400, Jason Wang wrote:
> > > Hi all:
> > > 
> > > This series implements packed virtqueues which were described
> > > at [1]. In this version we try to address the performance regression
> > > saw by V2. The root cause is packed virtqueue need more times of
> > > userspace memory accesssing which turns out to be very
> > > expensive. Thanks to the help of 7f466032dc9e ("vhost: access vq
> > > metadata through kernel virtual address"), such overhead cold be
> > > eliminated. So in this version, we can see about 2% improvement for
> > > packed virtqueue on PPS.
> > Great job, thanks!
> > Pls allow a bit more review time than usual as this is a big patchset.
> > Should be done by Tuesday.
> > -next material anyway.
> 
> 
> Sure, just to confirm, I think this should go for your vhost tree?.
> 
> Thanks

I think this makes sense, yes.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, back to index

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-17 10:52 [PATCH V3 00/15] Packed virtqueue support for vhost Jason Wang
2019-07-17 10:52 ` [PATCH V3 01/15] vhost: simplify meta data pointer accessing Jason Wang
2019-07-17 10:52 ` [PATCH V3 02/15] vhost: remove the unnecessary parameter of vhost_vq_avail_empty() Jason Wang
2019-07-17 10:52 ` [PATCH V3 03/15] vhost: remove unnecessary parameter of vhost_enable_notify()/vhost_disable_notify Jason Wang
2019-07-17 10:52 ` [PATCH V3 04/15] vhost-net: don't use vhost_add_used_n() for zerocopy Jason Wang
2019-07-17 10:52 ` [PATCH V3 05/15] vhost: introduce helpers to manipulate shadow used ring Jason Wang
2019-07-17 10:52 ` [PATCH V3 06/15] vhost_net: switch TX to use shadow used ring API Jason Wang
2019-07-17 10:52 ` [PATCH V3 07/15] vhost_net: calculate last used length once for mergeable buffer Jason Wang
2019-07-17 10:52 ` [PATCH V3 08/15] vhost_net: switch to use shadow used ring API for RX Jason Wang
2019-07-17 10:52 ` [PATCH V3 09/15] vhost: do not export vhost_add_used_n() and vhost_add_used_and_signal_n() Jason Wang
2019-07-17 10:52 ` [PATCH V3 10/15] vhost: hide used ring layout from device Jason Wang
2019-07-17 10:52 ` [PATCH V3 11/15] vhost: do not use vring_used_elem Jason Wang
2019-07-17 10:52 ` [PATCH V3 12/15] vhost: vhost_put_user() can accept metadata type Jason Wang
2019-07-17 10:52 ` [PATCH V3 13/15] vhost: packed ring support Jason Wang
2019-07-17 10:52 ` [PATCH V3 14/15] vhost: event suppression for packed ring Jason Wang
2019-07-17 10:52 ` [PATCH V3 15/15] vhost: enable packed virtqueues Jason Wang
2019-07-17 11:02 ` [PATCH V3 00/15] Packed virtqueue support for vhost Michael S. Tsirkin
2019-07-17 12:27   ` Jason Wang
2019-07-17 14:28     ` Michael S. Tsirkin

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox