linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 0/2] Packed ring for vhost
@ 2018-02-14  2:37 Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-14  2:37 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie, Jason Wang

Hi all:

This RFC implement a subset of packed ring which was described at
https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
. The code were tested with pmd implement by Jens at
http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
change was needed for pmd codes to kick virtqueue since it assumes a
busy polling backend.

Test were done between localhost and guest. Testpmd (rxonly) in guest
reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.

It's not a complete implemention, here's what were missed:

- Device Area
- Driver Area
- Descriptor indirection
- Zerocopy may not be functional
- Migration path is not tested
- Vhost devices except for net
- vIOMMU can not work (mainly because the metadata prefetch is not
  implemented).
- See FIXME/TODO in the codes for more details
- No batching or other optimizations were implemented

For a quick prototype, this series open code the tracking of warp
counter and descriptor index at net device. This will be addressed in
the future by:

- Move get_rx_bufs() from net.c to vhost.c
- Let vhost_get_vq_desc() returns vring_used_elem instad of head id

With the above, we can hide the internal (at least part of) vring
layout from specific device.

Please review.

Thanks

Jason Wang (2):
  virtio: introduce packed ring defines
  vhost: packed ring support

 drivers/vhost/net.c                |  14 +-
 drivers/vhost/vhost.c              | 351 ++++++++++++++++++++++++++++++++++---
 drivers/vhost/vhost.h              |   6 +-
 include/uapi/linux/virtio_config.h |   9 +
 include/uapi/linux/virtio_ring.h   |  17 ++
 5 files changed, 369 insertions(+), 28 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
@ 2018-02-14  2:37 ` Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-14  2:37 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie, Jason Wang

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/uapi/linux/virtio_config.h |  9 +++++++++
 include/uapi/linux/virtio_ring.h   | 17 +++++++++++++++++
 2 files changed, 26 insertions(+)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e209..5903d51 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -71,4 +71,13 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+#define VIRTIO_F_RING_PACKED		34
+
+/*
+ * This feature indicates that all buffers are used by the device in
+ * the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER		35
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5fa..a169b53 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -43,6 +43,8 @@
 #define VRING_DESC_F_WRITE	2
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT	4
+#define VRING_DESC_F_AVAIL      7
+#define VRING_DESC_F_USED	15
 
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
@@ -62,6 +64,17 @@
  * at the end of the used ring. Guest should ignore the used->flags field. */
 #define VIRTIO_RING_F_EVENT_IDX		29
 
+struct vring_desc_packed {
+	/* Buffer Address. */
+	__virtio64 addr;
+	/* Buffer Length. */
+	__virtio32 len;
+	/* Buffer ID. */
+	__virtio16 id;
+	/* The flags depending on descriptor type. */
+	__virtio16 flags;
+};
+
 /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
 struct vring_desc {
 	/* Address (guest-physical). */
@@ -86,6 +99,10 @@ struct vring_used_elem {
 	__virtio32 id;
 	/* Total length of the descriptor chain which was used (written to) */
 	__virtio32 len;
+	/* Index of the descriptor that needs to write, used by packed ring. */
+	u16 idx;
+	/* Wrap counter for this used desc, used by packed ring. */
+	bool wrap_counter;
 };
 
 struct vring_used {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH RFC 2/2] vhost: packed ring support
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
@ 2018-02-14  2:37 ` Jason Wang
  2018-02-27  9:03   ` Tiwei Bie
  2018-02-14  2:43 ` [PATCH RFC 0/2] Packed ring for vhost Jason Wang
  2018-02-14  2:47 ` Michael S. Tsirkin
  3 siblings, 1 reply; 8+ messages in thread
From: Jason Wang @ 2018-02-14  2:37 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: wexu, jfreimann, tiwei.bie, Jason Wang

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c   |  14 +-
 drivers/vhost/vhost.c | 351 ++++++++++++++++++++++++++++++++++++++++++++++----
 drivers/vhost/vhost.h |   6 +-
 3 files changed, 343 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index c613d2e..65b27c9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -67,7 +67,8 @@ enum {
 	VHOST_NET_FEATURES = VHOST_FEATURES |
 			 (1ULL << VHOST_NET_F_VIRTIO_NET_HDR) |
 			 (1ULL << VIRTIO_NET_F_MRG_RXBUF) |
-			 (1ULL << VIRTIO_F_IOMMU_PLATFORM)
+			 (1ULL << VIRTIO_F_IOMMU_PLATFORM) |
+			 (1ULL << VIRTIO_F_RING_PACKED)
 };
 
 enum {
@@ -473,6 +474,7 @@ static void handle_tx(struct vhost_net *net)
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
 	bool zcopy, zcopy_used;
+	struct vring_used_elem used;
 
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
@@ -494,6 +496,8 @@ static void handle_tx(struct vhost_net *net)
 			vhost_zerocopy_signal_used(net, vq);
 
 
+		used.idx = vq->last_avail_idx & (vq->num - 1);
+		used.wrap_counter = vq->used_wrap_counter;
 		head = vhost_net_tx_get_vq_desc(net, vq, vq->iov,
 						ARRAY_SIZE(vq->iov),
 						&out, &in);
@@ -515,6 +519,8 @@ static void handle_tx(struct vhost_net *net)
 		}
 		/* Skip header. TODO: support TSO. */
 		len = iov_length(vq->iov, out);
+		used.id = head;
+		used.len = 0;
 		iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len);
 		iov_iter_advance(&msg.msg_iter, hdr_size);
 		/* Sanity check */
@@ -576,7 +582,7 @@ static void handle_tx(struct vhost_net *net)
 			pr_debug("Truncated TX packet: "
 				 " len %d != %zd\n", err, len);
 		if (!zcopy_used)
-			vhost_add_used_and_signal(&net->dev, vq, head, 0);
+			vhost_add_used_and_signal_n(&net->dev, vq, &used, 1);
 		else
 			vhost_zerocopy_signal_used(net, vq);
 		vhost_net_tx_packet(net);
@@ -691,6 +697,8 @@ static int get_rx_bufs(struct vhost_virtqueue *vq,
 			r = -ENOBUFS;
 			goto err;
 		}
+		heads[headcount].idx = vq->last_avail_idx & (vq->num - 1);
+		heads[headcount].wrap_counter = vq->used_wrap_counter;
 		r = vhost_get_vq_desc(vq, vq->iov + seg,
 				      ARRAY_SIZE(vq->iov) - seg, &out,
 				      &in, log, log_num);
@@ -780,6 +788,8 @@ static void handle_rx(struct vhost_net *net)
 	vq_log = unlikely(vhost_has_feature(vq, VHOST_F_LOG_ALL)) ?
 		vq->log : NULL;
 	mergeable = vhost_has_feature(vq, VIRTIO_NET_F_MRG_RXBUF);
+	/* FIXME: workaround for current dpdk prototype */
+	mergeable = false;
 
 	while ((sock_len = vhost_net_rx_peek_head_len(net, sock->sk))) {
 		sock_len += sock_hlen;
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 2db5af8..5667d03 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -324,6 +324,7 @@ static void vhost_vq_reset(struct vhost_dev *dev,
 	vhost_reset_is_le(vq);
 	vhost_disable_cross_endian(vq);
 	vq->busyloop_timeout = 0;
+	vq->used_wrap_counter = true;
 	vq->umem = NULL;
 	vq->iotlb = NULL;
 	__vhost_vq_meta_reset(vq);
@@ -1136,10 +1137,22 @@ static int vhost_iotlb_miss(struct vhost_virtqueue *vq, u64 iova, int access)
 	return 0;
 }
 
-static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
-			struct vring_desc __user *desc,
-			struct vring_avail __user *avail,
-			struct vring_used __user *used)
+static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
+			       struct vring_desc __user *desc,
+			       struct vring_avail __user *avail,
+			       struct vring_used __user *used)
+{
+	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
+
+	/* FIXME: check device area and driver area */
+	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
+	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
+}
+
+static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
+			      struct vring_desc __user *desc,
+			      struct vring_avail __user *avail,
+			      struct vring_used __user *used)
 
 {
 	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
@@ -1151,6 +1164,17 @@ static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
 			sizeof *used + num * sizeof *used->ring + s);
 }
 
+static int vq_access_ok(struct vhost_virtqueue *vq, unsigned int num,
+			struct vring_desc __user *desc,
+			struct vring_avail __user *avail,
+			struct vring_used __user *used)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vq_access_ok_packed(vq, num, desc, avail, used);
+	else
+		return vq_access_ok_split(vq, num, desc, avail, used);
+}
+
 static void vhost_vq_meta_update(struct vhost_virtqueue *vq,
 				 const struct vhost_umem_node *node,
 				 int type)
@@ -1763,6 +1787,9 @@ int vhost_vq_init_access(struct vhost_virtqueue *vq)
 
 	vhost_init_is_le(vq);
 
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return 0;
+
 	r = vhost_update_used_flags(vq);
 	if (r)
 		goto err;
@@ -1947,18 +1974,136 @@ static int get_indirect(struct vhost_virtqueue *vq,
 	return 0;
 }
 
-/* This looks in the virtqueue and for the first available buffer, and converts
- * it to an iovec for convenient access.  Since descriptors consist of some
- * number of output then some number of input descriptors, it's actually two
- * iovecs, but we pack them into one and note how many of each there were.
- *
- * This function returns the descriptor number found, or vq->num (which is
- * never a valid descriptor number) if none was found.  A negative code is
- * returned on error. */
-int vhost_get_vq_desc(struct vhost_virtqueue *vq,
-		      struct iovec iov[], unsigned int iov_size,
-		      unsigned int *out_num, unsigned int *in_num,
-		      struct vhost_log *log, unsigned int *log_num)
+#define DESC_AVAIL (1 << VRING_DESC_F_AVAIL)
+#define DESC_USED  (1 << VRING_DESC_F_USED)
+static bool desc_is_avail(struct vhost_virtqueue *vq,
+			  struct vring_desc_packed *desc)
+{
+	if (vq->used_wrap_counter)
+		if ((desc->flags & DESC_AVAIL) && !(desc->flags & DESC_USED))
+			return true;
+	if (vq->used_wrap_counter == false)
+		if (!(desc->flags & DESC_AVAIL) && (desc->flags & DESC_USED))
+			return true;
+
+	return false;
+}
+
+static void set_desc_used(struct vhost_virtqueue *vq,
+			  struct vring_desc_packed *desc, bool wrap_counter)
+{
+	__virtio16 flags = desc->flags;
+
+	if (wrap_counter) {
+		desc->flags |= cpu_to_vhost16(vq, DESC_AVAIL);
+		desc->flags |= cpu_to_vhost16(vq, DESC_USED);
+	} else {
+		desc->flags &= ~cpu_to_vhost16(vq, DESC_AVAIL);
+		desc->flags &= ~cpu_to_vhost16(vq, DESC_USED);
+	}
+
+	desc->flags = flags;
+}
+
+static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
+				    struct iovec iov[], unsigned int iov_size,
+				    unsigned int *out_num, unsigned int *in_num,
+				    struct vhost_log *log,
+				    unsigned int *log_num)
+{
+	struct vring_desc_packed desc;
+	int ret, access, i;
+	u16 avail_idx = vq->last_avail_idx;
+
+	/* When we start there are none of either input nor output. */
+	*out_num = *in_num = 0;
+	if (unlikely(log))
+		*log_num = 0;
+
+	do {
+		unsigned int iov_count = *in_num + *out_num;
+
+		i = vq->last_avail_idx & (vq->num - 1);
+		ret = vhost_copy_from_user(vq, &desc, vq->desc_packed + i,
+					   sizeof(desc));
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+				i, vq->desc_packed + i);
+			return -EFAULT;
+		}
+
+		if (!desc_is_avail(vq, &desc)) {
+			/* If there's nothing new since last we looked, return
+			 * invalid.
+			 */
+			if (likely(avail_idx == vq->last_avail_idx))
+				return vq->num;
+		}
+
+		/* Only start to read descriptor after we're sure it was
+		 * available.
+		 */
+		smp_rmb();
+
+		/* FIXME: support indirect */
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_INDIRECT)) {
+			vq_err(vq, "Indirect descriptor is not supported\n");
+			return -EFAULT;
+		}
+
+		if (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_WRITE))
+			access = VHOST_ACCESS_WO;
+		else
+			access = VHOST_ACCESS_RO;
+		ret = translate_desc(vq, vhost64_to_cpu(vq, desc.addr),
+				     vhost32_to_cpu(vq, desc.len),
+				     iov + iov_count, iov_size - iov_count,
+				     access);
+		if (unlikely(ret < 0)) {
+			if (ret != -EAGAIN)
+				vq_err(vq, "Translation failure %d idx %d\n",
+					ret, i);
+			return ret;
+		}
+
+		if (access == VHOST_ACCESS_WO) {
+			/* If this is an input descriptor,
+			 * increment that count.
+			 */
+			*in_num += ret;
+			if (unlikely(log)) {
+				log[*log_num].addr =
+					vhost64_to_cpu(vq, desc.addr);
+				log[*log_num].len =
+					vhost32_to_cpu(vq, desc.len);
+				++*log_num;
+			}
+		} else {
+			/* If it's an output descriptor, they're all supposed
+			 * to come before any input descriptors.
+			 */
+			if (unlikely(*in_num)) {
+				vq_err(vq, "Desc out after in: idx %d\n",
+				       i);
+				return -EINVAL;
+			}
+			*out_num += ret;
+		}
+
+		/* On success, increment avail index. */
+		if ((++vq->last_avail_idx & (vq->num - 1)) == 0)
+			vq->used_wrap_counter ^= 1;
+
+	/* If this descriptor says it doesn't chain, we're done. */
+	} while (desc.flags & cpu_to_vhost16(vq, VRING_DESC_F_NEXT));
+
+	return desc.id;
+}
+
+static int vhost_get_vq_desc_split(struct vhost_virtqueue *vq,
+				   struct iovec iov[], unsigned int iov_size,
+				   unsigned int *out_num, unsigned int *in_num,
+				   struct vhost_log *log, unsigned int *log_num)
 {
 	struct vring_desc desc;
 	unsigned int i, head, found = 0;
@@ -2096,6 +2241,30 @@ int vhost_get_vq_desc(struct vhost_virtqueue *vq,
 	BUG_ON(!(vq->used_flags & VRING_USED_F_NO_NOTIFY));
 	return head;
 }
+
+/* This looks in the virtqueue and for the first available buffer, and converts
+ * it to an iovec for convenient access.  Since descriptors consist of some
+ * number of output then some number of input descriptors, it's actually two
+ * iovecs, but we pack them into one and note how many of each there were.
+ *
+ * This function returns the descriptor number found, or vq->num (which is
+ * never a valid descriptor number) if none was found.  A negative code is
+ * returned on error.
+ */
+int vhost_get_vq_desc(struct vhost_virtqueue *vq,
+		      struct iovec iov[], unsigned int iov_size,
+		      unsigned int *out_num, unsigned int *in_num,
+		      struct vhost_log *log, unsigned int *log_num)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_get_vq_desc_packed(vq, iov, iov_size,
+						out_num, in_num,
+						log, log_num);
+	else
+		return vhost_get_vq_desc_split(vq, iov, iov_size,
+					       out_num, in_num,
+					       log, log_num);
+}
 EXPORT_SYMBOL_GPL(vhost_get_vq_desc);
 
 /* Reverse the effect of vhost_get_vq_desc. Useful for error handling. */
@@ -2161,10 +2330,48 @@ static int __vhost_add_used_n(struct vhost_virtqueue *vq,
 	return 0;
 }
 
-/* After we've used one of their buffers, we tell them about it.  We'll then
- * want to notify the guest, using eventfd. */
-int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
-		     unsigned count)
+static int vhost_add_used_n_packed(struct vhost_virtqueue *vq,
+				   struct vring_used_elem *heads,
+				   unsigned int count)
+{
+	struct vring_desc_packed desc = {
+		.addr = 0,
+		.flags = 0,
+	};
+	int i, ret;
+
+	for (i = 0; i < count; i++) {
+		desc.id = heads[i].id;
+		desc.len = heads[i].len;
+		set_desc_used(vq, &desc, heads[i].wrap_counter);
+
+		/* Update flags etc before desc is written */
+		smp_mb();
+
+		ret = vhost_copy_to_user(vq, vq->desc_packed + heads[i].idx,
+					 &desc, sizeof(desc));
+		if (unlikely(ret)) {
+			vq_err(vq, "Failed to set descriptor: idx %d addr %p\n",
+			       heads[i].idx, vq->desc_packed + heads[i].idx);
+			return -EFAULT;
+		}
+		if (unlikely(vq->log_used)) {
+			/* Make sure desc is written before update log. */
+			smp_wmb();
+			log_write(vq->log_base,
+				  vq->log_addr + heads[i].idx * sizeof(desc),
+				  sizeof(desc));
+			if (vq->log_ctx)
+				eventfd_signal(vq->log_ctx, 1);
+		}
+	}
+
+	return 0;
+}
+
+static int vhost_add_used_n_split(struct vhost_virtqueue *vq,
+				  struct vring_used_elem *heads,
+				  unsigned int count)
 {
 	int start, n, r;
 
@@ -2196,6 +2403,19 @@ int vhost_add_used_n(struct vhost_virtqueue *vq, struct vring_used_elem *heads,
 	}
 	return r;
 }
+
+/* After we've used one of their buffers, we tell them about it.  We'll then
+ * want to notify the guest, using eventfd.
+ */
+int vhost_add_used_n(struct vhost_virtqueue *vq,
+		     struct vring_used_elem *heads,
+		     unsigned int count)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_add_used_n_packed(vq, heads, count);
+	else
+		return vhost_add_used_n_split(vq, heads, count);
+}
 EXPORT_SYMBOL_GPL(vhost_add_used_n);
 
 static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
@@ -2203,6 +2423,11 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 	__u16 old, new;
 	__virtio16 event;
 	bool v;
+
+	/* FIXME: check driver area */
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return false;
+
 	/* Flush out used index updates. This is paired
 	 * with the barrier that the Guest executes when enabling
 	 * interrupts. */
@@ -2265,7 +2490,8 @@ void vhost_add_used_and_signal_n(struct vhost_dev *dev,
 EXPORT_SYMBOL_GPL(vhost_add_used_and_signal_n);
 
 /* return true if we're sure that avaiable ring is empty */
-bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_vq_avail_empty_split(struct vhost_dev *dev,
+				       struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -2280,10 +2506,61 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	return vq->avail_idx == vq->last_avail_idx;
 }
+
+/* FIXME: unify codes with vhost_enable_notify_packed() */
+static bool vhost_vq_avail_empty_packed(struct vhost_dev *dev,
+					struct vhost_virtqueue *vq)
+{
+	struct vring_desc_packed desc;
+	int ret, i = vq->last_avail_idx & (vq->num - 1);
+
+	ret = vhost_copy_from_user(vq, &desc, vq->desc_packed + i,
+				   sizeof(desc));
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+			i, vq->desc_packed + i);
+		return -EFAULT;
+	}
+
+	/* Read flag after desc is read */
+	smp_mb();
+
+	return desc_is_avail(vq, &desc);
+}
+
+bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_vq_avail_empty_packed(dev, vq);
+	else
+		return vhost_vq_avail_empty_split(dev, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);
 
-/* OK, now we need to know about added descriptors. */
-bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static bool vhost_enable_notify_packed(struct vhost_dev *dev,
+				       struct vhost_virtqueue *vq)
+{
+	struct vring_desc_packed desc;
+	int ret, i = vq->last_avail_idx & (vq->num - 1);
+
+	/* FIXME: disable notification through device area */
+
+	ret = vhost_copy_from_user(vq, &desc, vq->desc_packed + i,
+				   sizeof(desc));
+	if (unlikely(ret)) {
+		vq_err(vq, "Failed to get descriptor: idx %d addr %p\n",
+			i, vq->desc_packed + i);
+		return -EFAULT;
+	}
+
+	/* Read flag after desc is read */
+	smp_mb();
+
+	return desc_is_avail(vq, &desc);
+}
+
+static bool vhost_enable_notify_split(struct vhost_dev *dev,
+				      struct vhost_virtqueue *vq)
 {
 	__virtio16 avail_idx;
 	int r;
@@ -2318,10 +2595,25 @@ bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 
 	return vhost16_to_cpu(vq, avail_idx) != vq->avail_idx;
 }
+
+/* OK, now we need to know about added descriptors. */
+bool vhost_enable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_enable_notify_packed(dev, vq);
+	else
+		return vhost_enable_notify_split(dev, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_enable_notify);
 
-/* We don't need to be notified again. */
-void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+static void vhost_disable_notify_packed(struct vhost_dev *dev,
+					struct vhost_virtqueue *vq)
+{
+	/* FIXME: disable notification through device area */
+}
+
+static void vhost_disable_notify_split(struct vhost_dev *dev,
+				       struct vhost_virtqueue *vq)
 {
 	int r;
 
@@ -2335,6 +2627,15 @@ void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
 			       &vq->used->flags, r);
 	}
 }
+
+/* We don't need to be notified again. */
+void vhost_disable_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
+{
+	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
+		return vhost_disable_notify_packed(dev, vq);
+	else
+		return vhost_disable_notify_split(dev, vq);
+}
 EXPORT_SYMBOL_GPL(vhost_disable_notify);
 
 /* Create a new message. */
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index ac4b605..cf6533a 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -87,7 +87,10 @@ struct vhost_virtqueue {
 	/* The actual ring of buffers. */
 	struct mutex mutex;
 	unsigned int num;
-	struct vring_desc __user *desc;
+	union {
+		struct vring_desc __user *desc;
+		struct vring_desc_packed __user *desc_packed;
+	};
 	struct vring_avail __user *avail;
 	struct vring_used __user *used;
 	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
@@ -144,6 +147,7 @@ struct vhost_virtqueue {
 	bool user_be;
 #endif
 	u32 busyloop_timeout;
+	bool used_wrap_counter;
 };
 
 struct vhost_msg_node {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/2] Packed ring for vhost
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
  2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
@ 2018-02-14  2:43 ` Jason Wang
  2018-02-14  2:47 ` Michael S. Tsirkin
  3 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-14  2:43 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann, tiwei.bie



On 2018年02月14日 10:37, Jason Wang wrote:
> Hi all:
>
> This RFC implement a subset of packed ring which was described at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> . The code were tested with pmd implement by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
> change was needed for pmd codes to kick virtqueue since it assumes a
> busy polling backend.
>
> Test were done between localhost and guest. Testpmd (rxonly) in guest
> reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.
>
> It's not a complete implemention, here's what were missed:
>
> - Device Area
> - Driver Area
> - Descriptor indirection
> - Zerocopy may not be functional
> - Migration path is not tested
> - Vhost devices except for net
> - vIOMMU can not work (mainly because the metadata prefetch is not
>    implemented).
> - See FIXME/TODO in the codes for more details
> - No batching or other optimizations were implemented
>
> For a quick prototype, this series open code the tracking of warp
> counter and descriptor index at net device. This will be addressed in
> the future by:
>
> - Move get_rx_bufs() from net.c to vhost.c
> - Let vhost_get_vq_desc() returns vring_used_elem instad of head id
>
> With the above, we can hide the internal (at least part of) vring
> layout from specific device.
>
> Please review.
>
> Thanks

It's near spring festival in China, will probably reply after the holiday.

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/2] Packed ring for vhost
  2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
                   ` (2 preceding siblings ...)
  2018-02-14  2:43 ` [PATCH RFC 0/2] Packed ring for vhost Jason Wang
@ 2018-02-14  2:47 ` Michael S. Tsirkin
  2018-02-14  3:48   ` Jason Wang
  3 siblings, 1 reply; 8+ messages in thread
From: Michael S. Tsirkin @ 2018-02-14  2:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, linux-kernel, netdev, wexu, jfreimann, tiwei.bie

On Wed, Feb 14, 2018 at 10:37:07AM +0800, Jason Wang wrote:
> Hi all:
> 
> This RFC implement a subset of packed ring which was described at
> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
> . The code were tested with pmd implement by Jens at
> http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
> change was needed for pmd codes to kick virtqueue since it assumes a
> busy polling backend.
> 
> Test were done between localhost and guest. Testpmd (rxonly) in guest
> reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.

How does this compare with the split ring design?

> It's not a complete implemention, here's what were missed:
> 
> - Device Area
> - Driver Area
> - Descriptor indirection
> - Zerocopy may not be functional
> - Migration path is not tested
> - Vhost devices except for net
> - vIOMMU can not work (mainly because the metadata prefetch is not
>   implemented).
> - See FIXME/TODO in the codes for more details
> - No batching or other optimizations were implemented

ioeventfd for PIO/mmio/s390.

> For a quick prototype, this series open code the tracking of warp
> counter and descriptor index at net device. This will be addressed in
> the future by:
> 
> - Move get_rx_bufs() from net.c to vhost.c
> - Let vhost_get_vq_desc() returns vring_used_elem instad of head id
> 
> With the above, we can hide the internal (at least part of) vring
> layout from specific device.
> 
> Please review.
> 
> Thanks
> 
> Jason Wang (2):
>   virtio: introduce packed ring defines
>   vhost: packed ring support
> 
>  drivers/vhost/net.c                |  14 +-
>  drivers/vhost/vhost.c              | 351 ++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h              |   6 +-
>  include/uapi/linux/virtio_config.h |   9 +
>  include/uapi/linux/virtio_ring.h   |  17 ++
>  5 files changed, 369 insertions(+), 28 deletions(-)
> 
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 0/2] Packed ring for vhost
  2018-02-14  2:47 ` Michael S. Tsirkin
@ 2018-02-14  3:48   ` Jason Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-14  3:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, linux-kernel, netdev, wexu, jfreimann, tiwei.bie



On 2018年02月14日 10:47, Michael S. Tsirkin wrote:
> On Wed, Feb 14, 2018 at 10:37:07AM +0800, Jason Wang wrote:
>> Hi all:
>>
>> This RFC implement a subset of packed ring which was described at
>> https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd07.pdf
>> . The code were tested with pmd implement by Jens at
>> http://dpdk.org/ml/archives/dev/2018-January/089417.html. Minor
>> change was needed for pmd codes to kick virtqueue since it assumes a
>> busy polling backend.
>>
>> Test were done between localhost and guest. Testpmd (rxonly) in guest
>> reports 2.4Mpps. Testpmd (txonly) repots about 2.1Mpps.
> How does this compare with the split ring design?

No obvious difference (+-5%). I believe we reach the bottleneck of vhost.

>
>> It's not a complete implemention, here's what were missed:
>>
>> - Device Area
>> - Driver Area
>> - Descriptor indirection
>> - Zerocopy may not be functional
>> - Migration path is not tested
>> - Vhost devices except for net
>> - vIOMMU can not work (mainly because the metadata prefetch is not
>>    implemented).
>> - See FIXME/TODO in the codes for more details
>> - No batching or other optimizations were implemented
> ioeventfd for PIO/mmio/s390.
>

Probably, but this is not the stuffs of packed ring I think.

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 2/2] vhost: packed ring support
  2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
@ 2018-02-27  9:03   ` Tiwei Bie
  2018-02-28  2:38     ` Jason Wang
  0 siblings, 1 reply; 8+ messages in thread
From: Tiwei Bie @ 2018-02-27  9:03 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Wed, Feb 14, 2018 at 10:37:09AM +0800, Jason Wang wrote:
[...]
> +static void set_desc_used(struct vhost_virtqueue *vq,
> +			  struct vring_desc_packed *desc, bool wrap_counter)
> +{
> +	__virtio16 flags = desc->flags;
> +
> +	if (wrap_counter) {
> +		desc->flags |= cpu_to_vhost16(vq, DESC_AVAIL);
> +		desc->flags |= cpu_to_vhost16(vq, DESC_USED);
> +	} else {
> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_AVAIL);
> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_USED);
> +	}
> +
> +	desc->flags = flags;

The `desc->flags` is restored after the change.

> +}
> +
> +static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
> +				    struct iovec iov[], unsigned int iov_size,
> +				    unsigned int *out_num, unsigned int *in_num,
> +				    struct vhost_log *log,
> +				    unsigned int *log_num)
> +{
> +	struct vring_desc_packed desc;
> +	int ret, access, i;
> +	u16 avail_idx = vq->last_avail_idx;
> +
> +	/* When we start there are none of either input nor output. */
> +	*out_num = *in_num = 0;
> +	if (unlikely(log))
> +		*log_num = 0;
> +
> +	do {
> +		unsigned int iov_count = *in_num + *out_num;
> +
> +		i = vq->last_avail_idx & (vq->num - 1);

The queue size may not be a power of 2 in packed ring.

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH RFC 2/2] vhost: packed ring support
  2018-02-27  9:03   ` Tiwei Bie
@ 2018-02-28  2:38     ` Jason Wang
  0 siblings, 0 replies; 8+ messages in thread
From: Jason Wang @ 2018-02-28  2:38 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann



On 2018年02月27日 17:03, Tiwei Bie wrote:
> On Wed, Feb 14, 2018 at 10:37:09AM +0800, Jason Wang wrote:
> [...]
>> +static void set_desc_used(struct vhost_virtqueue *vq,
>> +			  struct vring_desc_packed *desc, bool wrap_counter)
>> +{
>> +	__virtio16 flags = desc->flags;
>> +
>> +	if (wrap_counter) {
>> +		desc->flags |= cpu_to_vhost16(vq, DESC_AVAIL);
>> +		desc->flags |= cpu_to_vhost16(vq, DESC_USED);
>> +	} else {
>> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_AVAIL);
>> +		desc->flags &= ~cpu_to_vhost16(vq, DESC_USED);
>> +	}
>> +
>> +	desc->flags = flags;
> The `desc->flags` is restored after the change.

Right, will fix.

>> +}
>> +
>> +static int vhost_get_vq_desc_packed(struct vhost_virtqueue *vq,
>> +				    struct iovec iov[], unsigned int iov_size,
>> +				    unsigned int *out_num, unsigned int *in_num,
>> +				    struct vhost_log *log,
>> +				    unsigned int *log_num)
>> +{
>> +	struct vring_desc_packed desc;
>> +	int ret, access, i;
>> +	u16 avail_idx = vq->last_avail_idx;
>> +
>> +	/* When we start there are none of either input nor output. */
>> +	*out_num = *in_num = 0;
>> +	if (unlikely(log))
>> +		*log_num = 0;
>> +
>> +	do {
>> +		unsigned int iov_count = *in_num + *out_num;
>> +
>> +		i = vq->last_avail_idx & (vq->num - 1);
> The queue size may not be a power of 2 in packed ring.

Will fix this too.

Thanks

> Best regards,
> Tiwei Bie

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-02-28  2:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-14  2:37 [PATCH RFC 0/2] Packed ring for vhost Jason Wang
2018-02-14  2:37 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Jason Wang
2018-02-14  2:37 ` [PATCH RFC 2/2] vhost: packed ring support Jason Wang
2018-02-27  9:03   ` Tiwei Bie
2018-02-28  2:38     ` Jason Wang
2018-02-14  2:43 ` [PATCH RFC 0/2] Packed ring for vhost Jason Wang
2018-02-14  2:47 ` Michael S. Tsirkin
2018-02-14  3:48   ` Jason Wang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).