* [PATCH RFC 0/2] Packed ring for virtio
@ 2018-02-23 11:17 ` Tiwei Bie
  0 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-02-23 11:17 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: jasowang, wexu, jfreimann, tiwei.bie

Hello everyone,

This RFC implements a subset of the packed ring, which is described at
https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd08.pdf

The code was tested with DPDK vhost (testpmd/vhost-PMD) implemented
by Jens at http://dpdk.org/ml/archives/dev/2018-January/089417.html
Minor changes are needed for the vhost code, e.g. to kick the guest.

This is not a complete implementation. Here is what's missing:

- Device area and driver area
- VIRTIO_RING_F_INDIRECT_DESC
- VIRTIO_F_NOTIFICATION_DATA
- Virtio devices other than net have not been tested
- See FIXME in the code for more details

Thanks!

Best regards,
Tiwei Bie

Tiwei Bie (2):
  virtio: introduce packed ring defines
  virtio_ring: support packed ring

 drivers/virtio/virtio_ring.c       | 699 ++++++++++++++++++++++++++++++++-----
 include/linux/virtio_ring.h        |   8 +-
 include/uapi/linux/virtio_config.h |  18 +-
 include/uapi/linux/virtio_ring.h   |  68 ++++
 4 files changed, 703 insertions(+), 90 deletions(-)

-- 
2.14.1
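
For readers who have not gone through the draft linked above, the following
is a minimal sketch of how the packed ring marks descriptors with the AVAIL
and USED flag bits and a per-side wrap counter. It is written from the spec
draft's description, not taken from this series. The bit positions match the
VRING_DESC_F_AVAIL/VRING_DESC_F_USED macros in patch 1/2; the helper names
(mark_avail, mark_used, is_avail, is_used, ring_advance) are hypothetical and
exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as VRING_DESC_F_AVAIL/VRING_DESC_F_USED in patch 1/2. */
#define DESC_F_AVAIL(b)	((uint16_t)((b) << 7))
#define DESC_F_USED(b)	((uint16_t)((b) << 15))

/* Hypothetical helper: flags a driver writes to make a descriptor available.
 * AVAIL matches the driver's wrap counter, USED is its inverse. */
static inline uint16_t mark_avail(bool driver_wrap)
{
	return DESC_F_AVAIL(driver_wrap) | DESC_F_USED(!driver_wrap);
}

/* Hypothetical helper: flags a device writes back once it is done.
 * Both bits match the device's wrap counter. */
static inline uint16_t mark_used(bool device_wrap)
{
	return DESC_F_AVAIL(device_wrap) | DESC_F_USED(device_wrap);
}

/* Available: AVAIL equals the wrap counter and USED differs from it. */
static inline bool is_avail(uint16_t flags, bool wrap)
{
	bool avail = flags & DESC_F_AVAIL(1);
	bool used = flags & DESC_F_USED(1);

	return avail == wrap && used != wrap;
}

/* Used: AVAIL and USED both equal the wrap counter. */
static inline bool is_used(uint16_t flags, bool wrap)
{
	bool avail = flags & DESC_F_AVAIL(1);
	bool used = flags & DESC_F_USED(1);

	return avail == wrap && used == wrap;
}

/* Advance a ring index; flip the wrap counter when it passes the end. */
static inline uint16_t ring_advance(uint16_t idx, uint16_t num, bool *wrap)
{
	assert(idx < num);
	if (++idx >= num) {
		idx = 0;
		*wrap = !*wrap;
	}
	return idx;
}
```

Because the wrap counter flips on every pass over the ring, a descriptor left
over from the previous pass never looks available (or used) in the current
one, so no separate avail/used index is needed in shared memory.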


* [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-23 11:17 ` Tiwei Bie
@ 2018-02-23 11:18   ` Tiwei Bie
  -1 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-02-23 11:18 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: jasowang, wexu, jfreimann, tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 include/uapi/linux/virtio_config.h | 18 +++++++++-
 include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
index 308e2096291f..e3d077ef5207 100644
--- a/include/uapi/linux/virtio_config.h
+++ b/include/uapi/linux/virtio_config.h
@@ -49,7 +49,7 @@
  * transport being used (eg. virtio_ring), the rest are per-device feature
  * bits. */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		34
+#define VIRTIO_TRANSPORT_F_END		37
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
@@ -71,4 +71,20 @@
  * this is for compatibility with legacy systems.
  */
 #define VIRTIO_F_IOMMU_PLATFORM		33
+
+/* This feature indicates support for the packed virtqueue layout. */
+#define VIRTIO_F_RING_PACKED		34
+
+/*
+ * This feature indicates that all buffers are used by the device
+ * in the same order in which they have been made available.
+ */
+#define VIRTIO_F_IN_ORDER		35
+
+/*
+ * This feature indicates that drivers pass extra data (besides
+ * identifying the Virtqueue) in their device notifications.
+ */
+#define VIRTIO_F_NOTIFICATION_DATA	36
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
index 6d5d5faa989b..77b1d4aeef72 100644
--- a/include/uapi/linux/virtio_ring.h
+++ b/include/uapi/linux/virtio_ring.h
@@ -44,6 +44,9 @@
 /* This means the buffer contains a list of buffer descriptors. */
 #define VRING_DESC_F_INDIRECT	4
 
+#define VRING_DESC_F_AVAIL(b)	((b) << 7)
+#define VRING_DESC_F_USED(b)	((b) << 15)
+
 /* The Host uses this in used->flags to advise the Guest: don't kick me when
  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
  * will still kick if it's out of buffers. */
@@ -104,6 +107,36 @@ struct vring {
 	struct vring_used *used;
 };
 
+struct vring_packed_desc_event {
+	/* Descriptor Event Offset */
+	__virtio16 desc_event_off   : 15,
+	/* Descriptor Event Wrap Counter */
+		   desc_event_wrap  : 1;
+	/* Descriptor Event Flags */
+	__virtio16 desc_event_flags : 2;
+};
+
+struct vring_packed_desc {
+	/* Buffer Address. */
+	__virtio64 addr;
+	/* Buffer Length. */
+	__virtio32 len;
+	/* Buffer ID. */
+	__virtio16 id;
+	/* The flags depending on descriptor type. */
+	__virtio16 flags;
+};
+
+struct vring_packed {
+	unsigned int num;
+
+	struct vring_packed_desc *desc;
+
+	struct vring_packed_desc_event *driver;
+
+	struct vring_packed_desc_event *device;
+};
+
 /* Alignment requirements for vring elements.
  * When using pre-virtio 1.0 layout, these fall out naturally.
  */
@@ -171,4 +204,39 @@ static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
 	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
 }
 
+/* The standard layout for the packed ring is a continuous chunk of memory
+ * which looks like this.
+ *
+ * struct vring_packed
+ * {
+ *	// The actual descriptors (16 bytes each)
+ *	struct vring_packed_desc desc[num];
+ *
+ *	// Padding to the next align boundary.
+ *	char pad[];
+ *
+ *	// Driver Event Suppression
+ *	struct vring_packed_desc_event driver;
+ *
+ *	// Device Event Suppression
+ *	struct vring_packed_desc_event device;
+ * };
+ */
+
+static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
+				     void *p, unsigned long align)
+{
+	vr->num = num;
+	vr->desc = p;
+	vr->driver = (void *)(((uintptr_t)p + sizeof(struct vring_packed_desc)
+		* num + align - 1) & ~(align - 1));
+	vr->device = vr->driver + 1;
+}
+
+static inline unsigned vring_packed_size(unsigned int num, unsigned long align)
+{
+	return ((sizeof(struct vring_packed_desc) * num + align - 1)
+		& ~(align - 1)) + sizeof(struct vring_packed_desc_event) * 2;
+}
+
 #endif /* _UAPI_LINUX_VIRTIO_RING_H */
-- 
2.14.1
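
As a worked example of the layout comment and the vring_packed_size()
arithmetic in the patch above, here is a small userspace sketch. The struct
and function names (packed_desc, packed_desc_event, align_up,
packed_event_offset, packed_ring_size) are stand-ins invented for this
example, not kernel API, and the event structure is assumed to occupy two
16-bit words, which is what the bitfield definition in the patch is expected
to produce on common ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vring_packed_desc: 16 bytes per descriptor. */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Stand-in for struct vring_packed_desc_event (assumed 4 bytes). */
struct packed_desc_event {
	uint16_t off_wrap;	/* 15-bit offset plus 1-bit wrap counter */
	uint16_t flags;		/* 2-bit event flags */
};

/* Round x up to the next multiple of align (align must be a power of 2). */
static inline size_t align_up(size_t x, size_t align)
{
	assert(align && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Offset of the driver event structure from the start of the ring. */
static inline size_t packed_event_offset(unsigned int num, size_t align)
{
	return align_up(sizeof(struct packed_desc) * num, align);
}

/* Total bytes: padded descriptor array plus driver and device events. */
static inline size_t packed_ring_size(unsigned int num, size_t align)
{
	return packed_event_offset(num, align) +
	       2 * sizeof(struct packed_desc_event);
}
```

With 256 descriptors and 4096-byte alignment, the descriptor array fills
exactly 4096 bytes, so the two event structures start at offset 4096 and the
ring takes 4104 bytes in total. Note that vring_packed_init() rounds up the
absolute address while vring_packed_size() rounds up the offset; the two agree
as long as the base pointer p is itself aligned to align.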


* [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-02-23 11:17 ` Tiwei Bie
@ 2018-02-23 11:18   ` Tiwei Bie
  -1 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-02-23 11:18 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev
  Cc: jasowang, wexu, jfreimann, tiwei.bie

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
 drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
 include/linux/virtio_ring.h  |   8 +-
 2 files changed, 618 insertions(+), 89 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb30f3e09a47..393778a2f809 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,14 +58,14 @@
 
 struct vring_desc_state {
 	void *data;			/* Data for callback. */
-	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+	void *indir_desc;		/* Indirect descriptor, if any. */
+	int num;			/* Descriptor list length. */
 };
 
 struct vring_virtqueue {
 	struct virtqueue vq;
 
-	/* Actual memory layout for this queue */
-	struct vring vring;
+	bool packed;
 
 	/* Can we use weak barriers? */
 	bool weak_barriers;
@@ -87,11 +87,28 @@ struct vring_virtqueue {
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
-	/* Last written value to avail->flags */
-	u16 avail_flags_shadow;
-
-	/* Last written value to avail->idx in guest byte order */
-	u16 avail_idx_shadow;
+	union {
+		/* Available for split ring */
+		struct {
+			/* Actual memory layout for this queue */
+			struct vring vring;
+
+			/* Last written value to avail->flags */
+			u16 avail_flags_shadow;
+
+			/* Last written value to avail->idx in
+			 * guest byte order */
+			u16 avail_idx_shadow;
+		};
+
+		/* Available for packed ring */
+		struct {
+			/* Actual memory layout for this queue */
+			struct vring_packed vring_packed;
+			u8 wrap_counter : 1;
+			bool chaining;
+		};
+	};
 
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	bool (*notify)(struct virtqueue *vq);
@@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
 			      cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-			    struct vring_desc *desc)
+static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
 {
+	u64 addr;
+	u32 len;
 	u16 flags;
 
 	if (!vring_use_dma_api(vq->vq.vdev))
 		return;
 
-	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+	if (vq->packed) {
+		struct vring_packed_desc *desc = _desc;
+
+		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+	} else {
+		struct vring_desc *desc = _desc;
+
+		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+	}
 
 	if (flags & VRING_DESC_F_INDIRECT) {
 		dma_unmap_single(vring_dma_dev(vq),
-				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
-				 virtio32_to_cpu(vq->vq.vdev, desc->len),
+				 addr, len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
 		dma_unmap_page(vring_dma_dev(vq),
-			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
-			       virtio32_to_cpu(vq->vq.vdev, desc->len),
+			       addr, len,
 			       (flags & VRING_DESC_F_WRITE) ?
 			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	}
@@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
 	return dma_mapping_error(vring_dma_dev(vq), addr);
 }
 
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-					 unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+					       unsigned int total_sg,
+					       gfp_t gfp)
 {
 	struct vring_desc *desc;
 	unsigned int i;
@@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
 	return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-				struct scatterlist *sgs[],
-				unsigned int total_sg,
-				unsigned int out_sgs,
-				unsigned int in_sgs,
-				void *data,
-				void *ctx,
-				gfp_t gfp)
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+						       unsigned int total_sg,
+						       gfp_t gfp)
+{
+	struct vring_packed_desc *desc;
+
+	/*
+	 * We require lowmem mappings for the descriptors because
+	 * otherwise virt_to_phys will give us bogus addresses in the
+	 * virtqueue.
+	 */
+	gfp &= ~__GFP_HIGHMEM;
+
+	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+	return desc;
+}
+
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+				      struct scatterlist *sgs[],
+				      unsigned int total_sg,
+				      unsigned int out_sgs,
+				      unsigned int in_sgs,
+				      void *data,
+				      void *ctx,
+				      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
@@ -303,7 +350,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	/* If the host supports indirect descriptor tables, and we have multiple
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
-		desc = alloc_indirect(_vq, total_sg, gfp);
+		desc = alloc_indirect_split(_vq, total_sg, gfp);
 	else {
 		desc = NULL;
 		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
@@ -437,6 +484,243 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	return -EIO;
 }
 
+static inline int virtqueue_add_packed(struct virtqueue *_vq,
+				       struct scatterlist *sgs[],
+				       unsigned int total_sg,
+				       unsigned int out_sgs,
+				       unsigned int in_sgs,
+				       void *data,
+				       void *ctx,
+				       gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_packed_desc *desc;
+	struct scatterlist *sg;
+	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+	__virtio16 uninitialized_var(head_flags), flags;
+	int head, wrap_counter;
+	bool indirect;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+	BUG_ON(ctx && vq->indirect);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
+	if (total_sg > 1 && !vq->chaining && !vq->indirect) {
+		END_USE(vq);
+		return -ENOTSUPP;
+	}
+
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
+	BUG_ON(total_sg == 0);
+
+	head = vq->free_head;
+	wrap_counter = vq->wrap_counter;
+
+	/* If the host supports indirect descriptor tables, and we have multiple
+	 * buffers, then go indirect. FIXME: tune this threshold */
+	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+		desc = alloc_indirect_packed(_vq, total_sg, gfp);
+	else {
+		desc = NULL;
+		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
+	}
+
+	if (desc) {
+		/* Use a single buffer which doesn't continue */
+		indirect = true;
+		/* Set up rest to use this indirect table. */
+		i = 0;
+		descs_used = 1;
+	} else {
+		indirect = false;
+		desc = vq->vring_packed.desc;
+		i = head;
+		descs_used = total_sg;
+
+		if (total_sg > 1 && !vq->chaining) {
+			END_USE(vq);
+			return -ENOTSUPP;
+		}
+	}
+
+	if (vq->vq.num_free < descs_used) {
+		pr_debug("Can't add buf len %i - avail = %i\n",
+			 descs_used, vq->vq.num_free);
+		/* FIXME: for historical reasons, we force a notify here if
+		 * there are outgoing parts to the buffer.  Presumably the
+		 * host should service the ring ASAP. */
+		if (out_sgs)
+			vq->notify(&vq->vq);
+		if (indirect)
+			kfree(desc);
+		END_USE(vq);
+		return -ENOSPC;
+	}
+
+	for (n = 0; n < out_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
+			if (vring_mapping_error(vq, addr))
+				goto unmap_release;
+
+			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+					VRING_DESC_F_AVAIL(vq->wrap_counter) |
+					VRING_DESC_F_USED(!vq->wrap_counter));
+			if (!indirect && i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
+			prev = i;
+			i++;
+			if (!indirect && i >= vq->vring_packed.num) {
+				i = 0;
+				vq->wrap_counter ^= 1;
+			}
+		}
+	}
+	for (; n < (out_sgs + in_sgs); n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
+			if (vring_mapping_error(vq, addr))
+				goto unmap_release;
+
+			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+					VRING_DESC_F_WRITE |
+					VRING_DESC_F_AVAIL(vq->wrap_counter) |
+					VRING_DESC_F_USED(!vq->wrap_counter));
+			if (!indirect && i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
+			prev = i;
+			i++;
+			if (!indirect && i >= vq->vring_packed.num) {
+				i = 0;
+				vq->wrap_counter ^= 1;
+			}
+		}
+	}
+	/* Last one doesn't continue. */
+	if (!indirect && (head + 1) % vq->vring_packed.num == i)
+		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+	else
+		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+
+	if (indirect) {
+		/* FIXME: to be implemented */
+
+		/* Now that the indirect table is filled in, map it. */
+		dma_addr_t addr = vring_map_single(
+			vq, desc, total_sg * sizeof(struct vring_packed_desc),
+			DMA_TO_DEVICE);
+		if (vring_mapping_error(vq, addr))
+			goto unmap_release;
+
+		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
+					     VRING_DESC_F_AVAIL(wrap_counter) |
+					     VRING_DESC_F_USED(!wrap_counter));
+		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
+		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
+				total_sg * sizeof(struct vring_packed_desc));
+		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
+	}
+
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= descs_used;
+
+	/* Update free pointer */
+	if (indirect) {
+		n = head + 1;
+		if (n >= vq->vring_packed.num) {
+			n = 0;
+			vq->wrap_counter ^= 1;
+		}
+		vq->free_head = n;
+	} else
+		vq->free_head = i;
+
+	/* Store token and indirect buffer state. */
+	vq->desc_state[head].num = descs_used;
+	vq->desc_state[head].data = data;
+	if (indirect)
+		vq->desc_state[head].indir_desc = desc;
+	else
+		vq->desc_state[head].indir_desc = ctx;
+
+	virtio_wmb(vq->weak_barriers);
+	vq->vring_packed.desc[head].flags = head_flags;
+	vq->num_added++;
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+
+	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_one(vq, &desc[i]);
+		i++;
+		if (!indirect && i >= vq->vring_packed.num)
+			i = 0;
+	}
+
+	vq->wrap_counter = wrap_counter;
+
+	if (indirect)
+		kfree(desc);
+
+	END_USE(vq);
+	return -EIO;
+}
+
+static inline int virtqueue_add(struct virtqueue *_vq,
+				struct scatterlist *sgs[],
+				unsigned int total_sg,
+				unsigned int out_sgs,
+				unsigned int in_sgs,
+				void *data,
+				void *ctx,
+				gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
+						 in_sgs, data, ctx, gfp) :
+			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
+						in_sgs, data, ctx, gfp);
+}
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @vq: the struct virtqueue we're talking about.
@@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 	 * event. */
 	virtio_mb(vq->weak_barriers);
 
+	if (vq->packed) {
+		/* FIXME: to be implemented */
+		needs_kick = true;
+		goto out;
+	}
+
 	old = vq->avail_idx_shadow - vq->num_added;
 	new = vq->avail_idx_shadow;
 	vq->num_added = 0;
@@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 	} else {
 		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
 	}
+
+out:
 	END_USE(vq);
 	return needs_kick;
 }
@@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
-static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
-		       void **ctx)
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+			     void **ctx)
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
 	}
 }
 
-static inline bool more_used(const struct vring_virtqueue *vq)
+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+			      void **ctx)
+{
+	struct vring_packed_desc *desc;
+	unsigned int i, j;
+
+	/* Clear data ptr. */
+	vq->desc_state[head].data = NULL;
+
+	i = head;
+
+	for (j = 0; j < vq->desc_state[head].num; j++) {
+		desc = &vq->vring_packed.desc[i];
+		vring_unmap_one(vq, desc);
+		i++;
+		if (i >= vq->vring_packed.num)
+			i = 0;
+	}
+
+	vq->vq.num_free += vq->desc_state[head].num;
+
+	if (vq->indirect) {
+		u32 len;
+
+		desc = vq->desc_state[head].indir_desc;
+		/* Free the indirect table, if any, now that it's unmapped. */
+		if (!desc)
+			return;
+
+		len = virtio32_to_cpu(vq->vq.vdev,
+				      vq->vring_packed.desc[head].len);
+
+		BUG_ON(!(vq->vring_packed.desc[head].flags &
+			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
+		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
+
+		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
+			vring_unmap_one(vq, &desc[j]);
+
+		kfree(desc);
+		vq->desc_state[head].indir_desc = NULL;
+	} else if (ctx) {
+		*ctx = vq->desc_state[head].indir_desc;
+	}
+}
+
+static inline bool more_used_split(const struct vring_virtqueue *vq)
 {
 	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
 }
 
-/**
- * virtqueue_get_buf - get the next used buffer
- * @vq: the struct virtqueue we're talking about.
- * @len: the length written into the buffer
- *
- * If the device wrote data into the buffer, @len will be set to the
- * amount written.  This means you don't need to clear the buffer
- * beforehand to ensure there's no data leakage in the case of short
- * writes.
- *
- * Caller must ensure we don't call this with other virtqueue
- * operations at the same time (except where noted).
- *
- * Returns NULL if there are no used buffers, or the "data" token
- * handed to virtqueue_add_*().
- */
-void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
-			    void **ctx)
+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+	u16 last_used, flags;
+	bool avail, used;
+
+	if (vq->vq.num_free == vq->vring.num)
+		return false;
+
+	last_used = vq->last_used_idx;
+	flags = virtio16_to_cpu(vq->vq.vdev,
+				vq->vring_packed.desc[last_used].flags);
+	avail = flags & VRING_DESC_F_AVAIL(1);
+	used = flags & VRING_DESC_F_USED(1);
+
+	return avail == used;
+}
+
+static inline bool more_used(const struct vring_virtqueue *vq)
+{
+	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
+}
+
+void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
+				  void **ctx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	void *ret;
@@ -735,9 +1079,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 		return NULL;
 	}
 
-	/* detach_buf clears data, so grab it now. */
+	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->desc_state[i].data;
-	detach_buf(vq, i, ctx);
+	detach_buf_split(vq, i, ctx);
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
@@ -754,6 +1098,87 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 	END_USE(vq);
 	return ret;
 }
+
+void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
+				   void **ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	void *ret;
+	unsigned int i;
+	u16 last_used;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	if (!more_used(vq)) {
+		pr_debug("No more buffers in queue\n");
+		END_USE(vq);
+		return NULL;
+	}
+
+	/* Only get used array entries after they have been exposed by host. */
+	virtio_rmb(vq->weak_barriers);
+
+	last_used = vq->last_used_idx;
+
+	i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
+	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
+
+	if (unlikely(i >= vq->vring_packed.num)) {
+		BAD_RING(vq, "id %u out of range\n", i);
+		return NULL;
+	}
+	if (unlikely(!vq->desc_state[i].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", i);
+		return NULL;
+	}
+
+	/* detach_buf_packed clears data, so grab it now. */
+	ret = vq->desc_state[i].data;
+	detach_buf_packed(vq, i, ctx);
+
+	vq->last_used_idx += vq->desc_state[i].num;
+	if (vq->last_used_idx >= vq->vring_packed.num)
+		vq->last_used_idx %= vq->vring_packed.num;
+
+	// FIXME: implement the desc event support
+
+#ifdef DEBUG
+	vq->last_add_time_valid = false;
+#endif
+
+	END_USE(vq);
+	return ret;
+}
+
+/**
+ * virtqueue_get_buf - get the next used buffer
+ * @vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ *
+ * If the device wrote data into the buffer, @len will be set to the
+ * amount written.  This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_*().
+ */
+void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
+			    void **ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
+}
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
 void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
@@ -761,6 +1186,24 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	return virtqueue_get_buf_ctx(_vq, len, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
+
+static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
+		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+		if (!vq->event)
+			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev,
+							vq->avail_flags_shadow);
+	}
+}
+
+static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+{
+	// FIXME: to be implemented
+}
+
 /**
  * virtqueue_disable_cb - disable callbacks
  * @vq: the struct virtqueue we're talking about.
@@ -774,12 +1217,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
-		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
-	}
-
+	if (vq->packed)
+		virtqueue_disable_cb_packed(_vq);
+	else
+		virtqueue_disable_cb_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 
@@ -802,6 +1243,12 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 
 	START_USE(vq);
 
+	if (vq->packed) {
+		// FIXME: to be implemented
+		last_used_idx = vq->last_used_idx;
+		goto out;
+	}
+
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
@@ -813,6 +1260,7 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
 	}
 	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
+out:
 	END_USE(vq);
 	return last_used_idx;
 }
@@ -832,6 +1280,12 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	virtio_mb(vq->weak_barriers);
+	if (vq->packed) {
+		u16 flags = virtio16_to_cpu(vq->vq.vdev,
+				vq->vring_packed.desc[last_used_idx].flags);
+		return !(flags & VRING_DESC_F_AVAIL(1)) ==
+		       !(flags & VRING_DESC_F_USED(1));
+	}
 	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_poll);
@@ -874,6 +1328,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 
 	START_USE(vq);
 
+	if (vq->packed) {
+		/* FIXME: to be implemented */
+		goto out;
+	}
+
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
@@ -896,6 +1355,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 		return false;
 	}
 
+out:
 	END_USE(vq);
 	return true;
 }
@@ -922,14 +1382,20 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->desc_state[i].data;
-		detach_buf(vq, i, NULL);
-		vq->avail_idx_shadow--;
-		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+		if (vq->packed) {
+			detach_buf_packed(vq, i, NULL);
+		} else {
+			detach_buf_split(vq, i, NULL);
+			vq->avail_idx_shadow--;
+			vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+							vq->avail_idx_shadow);
+		}
 		END_USE(vq);
 		return buf;
 	}
 	/* That should have freed everything. */
-	BUG_ON(vq->vq.num_free != vq->vring.num);
+	BUG_ON(vq->vq.num_free != (vq->packed ? vq->vring_packed.num :
+						vq->vring.num));
 
 	END_USE(vq);
 	return NULL;
@@ -957,7 +1423,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 EXPORT_SYMBOL_GPL(vring_interrupt);
 
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-					struct vring vring,
+					union vring_union vring,
+					bool packed,
 					struct virtio_device *vdev,
 					bool weak_barriers,
 					bool context,
@@ -965,19 +1432,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					void (*callback)(struct virtqueue *),
 					const char *name)
 {
-	unsigned int i;
+	unsigned int num, i;
 	struct vring_virtqueue *vq;
 
-	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
+	num = packed ? vring.vring_packed.num : vring.vring_split.num;
+
+	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
 		     GFP_KERNEL);
 	if (!vq)
 		return NULL;
 
-	vq->vring = vring;
 	vq->vq.callback = callback;
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
-	vq->vq.num_free = vring.num;
+	vq->vq.num_free = num;
 	vq->vq.index = index;
 	vq->we_own_ring = false;
 	vq->queue_dma_addr = 0;
@@ -986,9 +1454,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
-	vq->avail_flags_shadow = 0;
-	vq->avail_idx_shadow = 0;
 	vq->num_added = 0;
+	vq->packed = packed;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 	vq->in_use = false;
@@ -999,18 +1466,41 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+	if (vq->packed) {
+		vq->vring_packed = vring.vring_packed;
+		vq->free_head = 0;
+		vq->wrap_counter = 1;
+
+#if 0
+		vq->chaining = virtio_has_feature(vdev,
+						  VIRTIO_RING_F_LIST_DESC);
+#else
+		vq->chaining = true;
+#endif
+	} else {
+		vq->vring = vring.vring_split;
+		vq->avail_flags_shadow = 0;
+		vq->avail_idx_shadow = 0;
+
+		/* Put everything in free lists. */
+		vq->free_head = 0;
+		for (i = 0; i < num-1; i++)
+			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
+	}
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
-		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
+		if (packed) {
+			/* FIXME: to be implemented */
+		} else {
+			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+			if (!vq->event)
+				vq->vring.avail->flags = cpu_to_virtio16(vdev,
+						vq->avail_flags_shadow);
+		}
 	}
 
-	/* Put everything in free lists. */
-	vq->free_head = 0;
-	for (i = 0; i < vring.num-1; i++)
-		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
-	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
+	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
 
 	return &vq->vq;
 }
@@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
 	}
 }
 
+static inline int
+__vring_size(unsigned int num, unsigned long align, bool packed)
+{
+	if (packed)
+		return vring_packed_size(num, align);
+	return vring_size(num, align);
+}
+
 struct virtqueue *vring_create_virtqueue(
 	unsigned int index,
 	unsigned int num,
@@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
 	void *queue = NULL;
 	dma_addr_t dma_addr;
 	size_t queue_size_in_bytes;
-	struct vring vring;
+	union vring_union vring;
+	bool packed;
 
 	/* We assume num is a power of 2. */
 	if (num & (num - 1)) {
@@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
 		return NULL;
 	}
 
+	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+
 	/* TODO: allocate each queue chunk individually */
-	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
-		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
+			num /= 2) {
+		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+							     packed),
 					  &dma_addr,
 					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
 		if (queue)
@@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
 
 	if (!queue) {
 		/* Try to get a single page. You are my only hope! */
-		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+							     packed),
 					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
 	}
 	if (!queue)
 		return NULL;
 
-	queue_size_in_bytes = vring_size(num, vring_align);
-	vring_init(&vring, num, queue, vring_align);
+	queue_size_in_bytes = __vring_size(num, vring_align, packed);
+	if (packed)
+		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
+	else
+		vring_init(&vring.vring_split, num, queue, vring_align);
 
-	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-				   notify, callback, name);
+	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+				   context, notify, callback, name);
 	if (!vq) {
 		vring_free_queue(vdev, queue_size_in_bytes, queue,
 				 dma_addr);
@@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      void (*callback)(struct virtqueue *vq),
 				      const char *name)
 {
-	struct vring vring;
-	vring_init(&vring, num, pages, vring_align);
-	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-				     notify, callback, name);
+	union vring_union vring;
+	bool packed;
+
+	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+	if (packed)
+		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
+	else
+		vring_init(&vring.vring_split, num, pages, vring_align);
+
+	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+				     context, notify, callback, name);
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 
@@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 
 	if (vq->we_own_ring) {
 		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
-				 vq->vring.desc, vq->queue_dma_addr);
+				 vq->packed ? (void *)vq->vring_packed.desc :
+					      (void *)vq->vring.desc,
+				 vq->queue_dma_addr);
 	}
 	list_del(&_vq->list);
 	kfree(vq);
@@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
 
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
 		switch (i) {
+#if 0 /* FIXME: to be implemented */
 		case VIRTIO_RING_F_INDIRECT_DESC:
 			break;
+#endif
 		case VIRTIO_RING_F_EVENT_IDX:
 			break;
 		case VIRTIO_F_VERSION_1:
 			break;
 		case VIRTIO_F_IOMMU_PLATFORM:
 			break;
+		case VIRTIO_F_RING_PACKED:
+			break;
 		default:
 			/* We don't understand this bit. */
 			__virtio_clear_bit(vdev, i);
@@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
 
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->vring.num;
+	return vq->packed ? vq->vring_packed.num : vq->vring.num;
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
@@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
 
+/* Only available for split ring */
 dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
 
+/* Only available for split ring */
 dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
 
+/* Only available for split ring */
 const struct vring *virtqueue_get_vring(struct virtqueue *vq)
 {
 	return &to_vvq(vq)->vring;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index bbf32524ab27..a0075894ad16 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
 struct virtio_device;
 struct virtqueue;
 
+union vring_union {
+	struct vring vring_split;
+	struct vring_packed vring_packed;
+};
+
 /*
  * Creates a virtqueue and allocates the descriptor ring.  If
  * may_reduce_num is set, then this may allocate a smaller ring than
@@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
 
 /* Creates a virtqueue with a custom layout. */
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-					struct vring vring,
+					union vring_union vring,
+					bool packed,
 					struct virtio_device *vdev,
 					bool weak_barriers,
 					bool ctx,
-- 
2.14.1

^ permalink raw reply related	[flat|nested] 38+ messages in thread
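The interface change above hands __vring_new_virtqueue() a union of the two ring layouts plus a separate "packed" flag that says which member is valid. A minimal userspace sketch of that pattern follows. The sketch_* names and the reduced struct members are hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for struct vring and struct vring_packed. */
struct sketch_split {
	unsigned int num;
	void *desc, *avail, *used;
};

struct sketch_packed {
	unsigned int num;
	void *desc;
};

union sketch_ring_union {
	struct sketch_split split;
	struct sketch_packed packed;
};

/* Build a ring description for the chosen layout. */
union sketch_ring_union sketch_ring_make(unsigned int num, bool packed)
{
	union sketch_ring_union ring;

	/* The driver assumes a non-zero power-of-two queue size. */
	assert(num && !(num & (num - 1)));

	memset(&ring, 0, sizeof(ring));
	if (packed)
		ring.packed.num = num;
	else
		ring.split.num = num;
	return ring;
}

/* The union has no tag of its own, so the layout travels next to it. */
unsigned int sketch_ring_num(union sketch_ring_union ring, bool packed)
{
	return packed ? ring.packed.num : ring.split.num;
}
```

The compiler cannot catch a read through the wrong member, so every access has to be guarded by the flag, which is what the vq->packed checks in this patch do.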

* [PATCH RFC 2/2] virtio_ring: support packed ring
@ 2018-02-23 11:18   ` Tiwei Bie
  0 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-02-23 11:18 UTC (permalink / raw)
  To: mst, virtualization, linux-kernel, netdev; +Cc: wexu

Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
---
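A minimal userspace sketch of the descriptor flag convention that virtqueue_add_packed(), more_used_packed() and virtqueue_poll() rely on in this patch. The bit positions (AVAIL at bit 7, USED at bit 15) and all sketch_* / SKETCH_* names are assumptions for illustration; the real macros come from patch 1/2 and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; see patch 1/2 for the real definitions. */
#define SKETCH_F_AVAIL(b) ((uint16_t)((b) << 7))
#define SKETCH_F_USED(b)  ((uint16_t)((b) << 15))

/*
 * Flags the driver writes when it offers a descriptor: AVAIL carries
 * the current wrap counter and USED carries its inverse.
 */
uint16_t sketch_driver_avail_flags(unsigned int wrap)
{
	assert(wrap <= 1);
	return SKETCH_F_AVAIL(wrap) | SKETCH_F_USED(!wrap);
}

/* Flags after the device returns the descriptor: USED is set equal to AVAIL. */
uint16_t sketch_device_used_flags(uint16_t flags)
{
	bool avail = flags & SKETCH_F_AVAIL(1);

	return (uint16_t)((flags & ~SKETCH_F_USED(1)) | SKETCH_F_USED(avail));
}

/* The test applied by more_used_packed(): the two bits agree. */
bool sketch_desc_is_used(uint16_t flags)
{
	bool avail = flags & SKETCH_F_AVAIL(1);
	bool used = flags & SKETCH_F_USED(1);

	return avail == used;
}
```

Note that an all-zero flags word also satisfies the "bits agree" test, so a freshly zeroed ring would look fully used. That is presumably why more_used_packed() first returns false when every descriptor is still free.
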
 drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
 include/linux/virtio_ring.h  |   8 +-
 2 files changed, 618 insertions(+), 89 deletions(-)
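
virtqueue_add_packed() below fills in every descriptor of a chain but holds back the head's flags (head_flags) until after virtio_wmb(), so the device cannot see a half-written chain. A userspace sketch of that ordering with C11 atomics follows. The ring size, the 16-bit id field, the flag bit positions and all sketch_* names are assumptions. Free-space accounting and DMA mapping are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 4u			/* hypothetical ring size */
#define SKETCH_F_NEXT     ((uint16_t)1u)	/* chain continues */
#define SKETCH_F_AVAIL(b) ((uint16_t)((b) << 7))	/* assumed bit */
#define SKETCH_F_USED(b)  ((uint16_t)((b) << 15))	/* assumed bit */

struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	_Atomic uint16_t flags;
};

struct sketch_ring {
	struct sketch_desc desc[SKETCH_RING_SIZE];
	unsigned int free_head;
	unsigned int wrap;	/* driver wrap counter, starts at 1 */
};

/* Add a chain of n buffers and return the index of its head. */
unsigned int sketch_add_chain(struct sketch_ring *r, const uint64_t *addrs,
			      const uint32_t *lens, unsigned int n)
{
	unsigned int head = r->free_head, i = head, k;
	uint16_t head_flags = 0;

	assert(n >= 1 && n <= SKETCH_RING_SIZE);

	for (k = 0; k < n; k++) {
		uint16_t flags = SKETCH_F_AVAIL(r->wrap) |
				 SKETCH_F_USED(!r->wrap);

		if (k + 1 < n)
			flags |= SKETCH_F_NEXT;

		r->desc[i].addr = addrs[k];
		r->desc[i].len = lens[k];
		r->desc[i].id = (uint16_t)head;

		if (i == head)
			head_flags = flags;	/* held back until the end */
		else
			atomic_store_explicit(&r->desc[i].flags, flags,
					      memory_order_relaxed);

		if (++i >= SKETCH_RING_SIZE) {
			i = 0;
			r->wrap ^= 1;	/* a new lap starts here */
		}
	}
	r->free_head = i;

	/*
	 * Release store: all writes above become visible before the head's
	 * flags do. This stands in for virtio_wmb() plus the plain store.
	 */
	atomic_store_explicit(&r->desc[head].flags, head_flags,
			      memory_order_release);
	return head;
}
```

A consumer would pair this with an acquire load of the head's flags before reading the other fields. The wrap counter flips in the middle of a chain that crosses the end of the ring, so descriptors of one chain can carry different AVAIL/USED values, as in the patch.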

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index eb30f3e09a47..393778a2f809 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -58,14 +58,14 @@
 
 struct vring_desc_state {
 	void *data;			/* Data for callback. */
-	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+	void *indir_desc;		/* Indirect descriptor, if any. */
+	int num;			/* Descriptor list length. */
 };
 
 struct vring_virtqueue {
 	struct virtqueue vq;
 
-	/* Actual memory layout for this queue */
-	struct vring vring;
+	bool packed;
 
 	/* Can we use weak barriers? */
 	bool weak_barriers;
@@ -87,11 +87,28 @@ struct vring_virtqueue {
 	/* Last used index we've seen. */
 	u16 last_used_idx;
 
-	/* Last written value to avail->flags */
-	u16 avail_flags_shadow;
-
-	/* Last written value to avail->idx in guest byte order */
-	u16 avail_idx_shadow;
+	union {
+		/* Available for split ring */
+		struct {
+			/* Actual memory layout for this queue */
+			struct vring vring;
+
+			/* Last written value to avail->flags */
+			u16 avail_flags_shadow;
+
+			/* Last written value to avail->idx in
+			 * guest byte order */
+			u16 avail_idx_shadow;
+		};
+
+		/* Available for packed ring */
+		struct {
+			/* Actual memory layout for this queue */
+			struct vring_packed vring_packed;
+			u8 wrap_counter : 1;
+			bool chaining;
+		};
+	};
 
 	/* How to notify other side. FIXME: commonalize hcalls! */
 	bool (*notify)(struct virtqueue *vq);
@@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
 			      cpu_addr, size, direction);
 }
 
-static void vring_unmap_one(const struct vring_virtqueue *vq,
-			    struct vring_desc *desc)
+static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
 {
+	u64 addr;
+	u32 len;
 	u16 flags;
 
 	if (!vring_use_dma_api(vq->vq.vdev))
 		return;
 
-	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+	if (vq->packed) {
+		struct vring_packed_desc *desc = _desc;
+
+		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+	} else {
+		struct vring_desc *desc = _desc;
+
+		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+	}
 
 	if (flags & VRING_DESC_F_INDIRECT) {
 		dma_unmap_single(vring_dma_dev(vq),
-				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
-				 virtio32_to_cpu(vq->vq.vdev, desc->len),
+				 addr, len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
 		dma_unmap_page(vring_dma_dev(vq),
-			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
-			       virtio32_to_cpu(vq->vq.vdev, desc->len),
+			       addr, len,
 			       (flags & VRING_DESC_F_WRITE) ?
 			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	}
@@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
 	return dma_mapping_error(vring_dma_dev(vq), addr);
 }
 
-static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
-					 unsigned int total_sg, gfp_t gfp)
+static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+					       unsigned int total_sg,
+					       gfp_t gfp)
 {
 	struct vring_desc *desc;
 	unsigned int i;
@@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
 	return desc;
 }
 
-static inline int virtqueue_add(struct virtqueue *_vq,
-				struct scatterlist *sgs[],
-				unsigned int total_sg,
-				unsigned int out_sgs,
-				unsigned int in_sgs,
-				void *data,
-				void *ctx,
-				gfp_t gfp)
+static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
+						       unsigned int total_sg,
+						       gfp_t gfp)
+{
+	struct vring_packed_desc *desc;
+
+	/*
+	 * We require lowmem mappings for the descriptors because
+	 * otherwise virt_to_phys will give us bogus addresses in the
+	 * virtqueue.
+	 */
+	gfp &= ~__GFP_HIGHMEM;
+
+	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
+
+	return desc;
+}
+
+static inline int virtqueue_add_split(struct virtqueue *_vq,
+				      struct scatterlist *sgs[],
+				      unsigned int total_sg,
+				      unsigned int out_sgs,
+				      unsigned int in_sgs,
+				      void *data,
+				      void *ctx,
+				      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct scatterlist *sg;
@@ -303,7 +350,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	/* If the host supports indirect descriptor tables, and we have multiple
 	 * buffers, then go indirect. FIXME: tune this threshold */
 	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
-		desc = alloc_indirect(_vq, total_sg, gfp);
+		desc = alloc_indirect_split(_vq, total_sg, gfp);
 	else {
 		desc = NULL;
 		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
@@ -437,6 +484,243 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	return -EIO;
 }
 
+static inline int virtqueue_add_packed(struct virtqueue *_vq,
+				       struct scatterlist *sgs[],
+				       unsigned int total_sg,
+				       unsigned int out_sgs,
+				       unsigned int in_sgs,
+				       void *data,
+				       void *ctx,
+				       gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_packed_desc *desc;
+	struct scatterlist *sg;
+	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
+	__virtio16 uninitialized_var(head_flags), flags;
+	int head, wrap_counter;
+	bool indirect;
+
+	START_USE(vq);
+
+	BUG_ON(data == NULL);
+	BUG_ON(ctx && vq->indirect);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return -EIO;
+	}
+
+	if (total_sg > 1 && !vq->chaining && !vq->indirect) {
+		END_USE(vq);
+		return -ENOTSUPP;
+	}
+
+#ifdef DEBUG
+	{
+		ktime_t now = ktime_get();
+
+		/* No kick or get, with .1 second between?  Warn. */
+		if (vq->last_add_time_valid)
+			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
+					    > 100);
+		vq->last_add_time = now;
+		vq->last_add_time_valid = true;
+	}
+#endif
+
+	BUG_ON(total_sg == 0);
+
+	head = vq->free_head;
+	wrap_counter = vq->wrap_counter;
+
+	/* If the host supports indirect descriptor tables, and we have multiple
+	 * buffers, then go indirect. FIXME: tune this threshold */
+	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
+		desc = alloc_indirect_packed(_vq, total_sg, gfp);
+	else {
+		desc = NULL;
+		WARN_ON_ONCE(total_sg > vq->vring_packed.num && !vq->indirect);
+	}
+
+	if (desc) {
+		/* Use a single buffer which doesn't continue */
+		indirect = true;
+		/* Set up rest to use this indirect table. */
+		i = 0;
+		descs_used = 1;
+	} else {
+		indirect = false;
+		desc = vq->vring_packed.desc;
+		i = head;
+		descs_used = total_sg;
+
+		if (total_sg > 1 && !vq->chaining) {
+			END_USE(vq);
+			return -ENOTSUPP;
+		}
+	}
+
+	if (vq->vq.num_free < descs_used) {
+		pr_debug("Can't add buf len %i - avail = %i\n",
+			 descs_used, vq->vq.num_free);
+		/* FIXME: for historical reasons, we force a notify here if
+		 * there are outgoing parts to the buffer.  Presumably the
+		 * host should service the ring ASAP. */
+		if (out_sgs)
+			vq->notify(&vq->vq);
+		if (indirect)
+			kfree(desc);
+		END_USE(vq);
+		return -ENOSPC;
+	}
+
+	for (n = 0; n < out_sgs; n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
+			if (vring_mapping_error(vq, addr))
+				goto unmap_release;
+
+			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+					VRING_DESC_F_AVAIL(vq->wrap_counter) |
+					VRING_DESC_F_USED(!vq->wrap_counter));
+			if (!indirect && i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
+			prev = i;
+			i++;
+			if (!indirect && i >= vq->vring_packed.num) {
+				i = 0;
+				vq->wrap_counter ^= 1;
+			}
+		}
+	}
+	for (; n < (out_sgs + in_sgs); n++) {
+		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
+			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
+			if (vring_mapping_error(vq, addr))
+				goto unmap_release;
+
+			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
+					VRING_DESC_F_WRITE |
+					VRING_DESC_F_AVAIL(vq->wrap_counter) |
+					VRING_DESC_F_USED(!vq->wrap_counter));
+			if (!indirect && i == head)
+				head_flags = flags;
+			else
+				desc[i].flags = flags;
+
+			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
+			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
+			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
+			prev = i;
+			i++;
+			if (!indirect && i >= vq->vring_packed.num) {
+				i = 0;
+				vq->wrap_counter ^= 1;
+			}
+		}
+	}
+	/* Last one doesn't continue. */
+	if (!indirect && (head + 1) % vq->vring_packed.num == i)
+		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+	else
+		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
+
+	if (indirect) {
+		/* FIXME: to be implemented */
+
+		/* Now that the indirect table is filled in, map it. */
+		dma_addr_t addr = vring_map_single(
+			vq, desc, total_sg * sizeof(struct vring_packed_desc),
+			DMA_TO_DEVICE);
+		if (vring_mapping_error(vq, addr))
+			goto unmap_release;
+
+		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
+					     VRING_DESC_F_AVAIL(wrap_counter) |
+					     VRING_DESC_F_USED(!wrap_counter));
+		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
+		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
+				total_sg * sizeof(struct vring_packed_desc));
+		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
+	}
+
+	/* We're using some buffers from the free list. */
+	vq->vq.num_free -= descs_used;
+
+	/* Update free pointer */
+	if (indirect) {
+		n = head + 1;
+		if (n >= vq->vring_packed.num) {
+			n = 0;
+			vq->wrap_counter ^= 1;
+		}
+		vq->free_head = n;
+	} else
+		vq->free_head = i;
+
+	/* Store token and indirect buffer state. */
+	vq->desc_state[head].num = descs_used;
+	vq->desc_state[head].data = data;
+	if (indirect)
+		vq->desc_state[head].indir_desc = desc;
+	else
+		vq->desc_state[head].indir_desc = ctx;
+
+	virtio_wmb(vq->weak_barriers);
+	vq->vring_packed.desc[head].flags = head_flags;
+	vq->num_added++;
+
+	pr_debug("Added buffer head %i to %p\n", head, vq);
+	END_USE(vq);
+
+	return 0;
+
+unmap_release:
+	err_idx = i;
+	i = head;
+
+	for (n = 0; n < total_sg; n++) {
+		if (i == err_idx)
+			break;
+		vring_unmap_one(vq, &desc[i]);
+		i++;
+		if (!indirect && i >= vq->vring_packed.num)
+			i = 0;
+	}
+
+	vq->wrap_counter = wrap_counter;
+
+	if (indirect)
+		kfree(desc);
+
+	END_USE(vq);
+	return -EIO;
+}
+
+static inline int virtqueue_add(struct virtqueue *_vq,
+				struct scatterlist *sgs[],
+				unsigned int total_sg,
+				unsigned int out_sgs,
+				unsigned int in_sgs,
+				void *data,
+				void *ctx,
+				gfp_t gfp)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
+						 in_sgs, data, ctx, gfp) :
+			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
+						in_sgs, data, ctx, gfp);
+}
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @vq: the struct virtqueue we're talking about.
@@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 	 * event. */
 	virtio_mb(vq->weak_barriers);
 
+	if (vq->packed) {
+		/* FIXME: to be implemented */
+		needs_kick = true;
+		goto out;
+	}
+
 	old = vq->avail_idx_shadow - vq->num_added;
 	new = vq->avail_idx_shadow;
 	vq->num_added = 0;
@@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
 	} else {
 		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
 	}
+
+out:
 	END_USE(vq);
 	return needs_kick;
 }
@@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
-static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
-		       void **ctx)
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
+			     void **ctx)
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
 	}
 }
 
-static inline bool more_used(const struct vring_virtqueue *vq)
+static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
+			      void **ctx)
+{
+	struct vring_packed_desc *desc;
+	unsigned int i, j;
+
+	/* Clear data ptr. */
+	vq->desc_state[head].data = NULL;
+
+	i = head;
+
+	for (j = 0; j < vq->desc_state[head].num; j++) {
+		desc = &vq->vring_packed.desc[i];
+		vring_unmap_one(vq, desc);
+		i++;
+		if (i >= vq->vring_packed.num)
+			i = 0;
+	}
+
+	vq->vq.num_free += vq->desc_state[head].num;
+
+	if (vq->indirect) {
+		u32 len;
+
+		desc = vq->desc_state[head].indir_desc;
+		/* Free the indirect table, if any, now that it's unmapped. */
+		if (!desc)
+			return;
+
+		len = virtio32_to_cpu(vq->vq.vdev,
+				      vq->vring_packed.desc[head].len);
+
+		BUG_ON(!(vq->vring_packed.desc[head].flags &
+			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
+		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
+
+		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
+			vring_unmap_one(vq, &desc[j]);
+
+		kfree(desc);
+		vq->desc_state[head].indir_desc = NULL;
+	} else if (ctx) {
+		*ctx = vq->desc_state[head].indir_desc;
+	}
+}
+
+static inline bool more_used_split(const struct vring_virtqueue *vq)
 {
 	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
 }
 
-/**
- * virtqueue_get_buf - get the next used buffer
- * @vq: the struct virtqueue we're talking about.
- * @len: the length written into the buffer
- *
- * If the device wrote data into the buffer, @len will be set to the
- * amount written.  This means you don't need to clear the buffer
- * beforehand to ensure there's no data leakage in the case of short
- * writes.
- *
- * Caller must ensure we don't call this with other virtqueue
- * operations at the same time (except where noted).
- *
- * Returns NULL if there are no used buffers, or the "data" token
- * handed to virtqueue_add_*().
- */
-void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
-			    void **ctx)
+static inline bool more_used_packed(const struct vring_virtqueue *vq)
+{
+	u16 last_used, flags;
+	bool avail, used;
+
+	if (vq->vq.num_free == vq->vring_packed.num)
+		return false;
+
+	last_used = vq->last_used_idx;
+	flags = virtio16_to_cpu(vq->vq.vdev,
+				vq->vring_packed.desc[last_used].flags);
+	avail = flags & VRING_DESC_F_AVAIL(1);
+	used = flags & VRING_DESC_F_USED(1);
+
+	return avail == used;
+}
+
+static inline bool more_used(const struct vring_virtqueue *vq)
+{
+	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
+}
+
+void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
+				  void **ctx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	void *ret;
@@ -735,9 +1079,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 		return NULL;
 	}
 
-	/* detach_buf clears data, so grab it now. */
+	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->desc_state[i].data;
-	detach_buf(vq, i, ctx);
+	detach_buf_split(vq, i, ctx);
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
@@ -754,6 +1098,87 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 	END_USE(vq);
 	return ret;
 }
+
+void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
+				   void **ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	void *ret;
+	unsigned int i;
+	u16 last_used;
+
+	START_USE(vq);
+
+	if (unlikely(vq->broken)) {
+		END_USE(vq);
+		return NULL;
+	}
+
+	if (!more_used(vq)) {
+		pr_debug("No more buffers in queue\n");
+		END_USE(vq);
+		return NULL;
+	}
+
+	/* Only get used array entries after they have been exposed by host. */
+	virtio_rmb(vq->weak_barriers);
+
+	last_used = vq->last_used_idx;
+
+	i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
+	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
+
+	if (unlikely(i >= vq->vring_packed.num)) {
+		BAD_RING(vq, "id %u out of range\n", i);
+		return NULL;
+	}
+	if (unlikely(!vq->desc_state[i].data)) {
+		BAD_RING(vq, "id %u is not a head!\n", i);
+		return NULL;
+	}
+
+	/* detach_buf_packed clears data, so grab it now. */
+	ret = vq->desc_state[i].data;
+	detach_buf_packed(vq, i, ctx);
+
+	vq->last_used_idx += vq->desc_state[i].num;
+	if (vq->last_used_idx >= vq->vring_packed.num)
+		vq->last_used_idx %= vq->vring_packed.num;
+
+	/* FIXME: implement the desc event support */
+
+#ifdef DEBUG
+	vq->last_add_time_valid = false;
+#endif
+
+	END_USE(vq);
+	return ret;
+}
+
+/**
+ * virtqueue_get_buf - get the next used buffer
+ * @vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ *
+ * If the device wrote data into the buffer, @len will be set to the
+ * amount written.  This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_*().
+ */
+void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
+			    void **ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
+}
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
 void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
@@ -761,6 +1186,24 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	return virtqueue_get_buf_ctx(_vq, len, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
+
+static void virtqueue_disable_cb_split(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
+		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+		if (!vq->event)
+			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev,
+							vq->avail_flags_shadow);
+	}
+}
+
+static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
+{
+	/* FIXME: to be implemented */
+}
+
 /**
  * virtqueue_disable_cb - disable callbacks
  * @vq: the struct virtqueue we're talking about.
@@ -774,12 +1217,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
-		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
-	}
-
+	if (vq->packed)
+		virtqueue_disable_cb_packed(_vq);
+	else
+		virtqueue_disable_cb_split(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
 
@@ -802,6 +1243,12 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 
 	START_USE(vq);
 
+	if (vq->packed) {
+		/* FIXME: to be implemented */
+		last_used_idx = vq->last_used_idx;
+		goto out;
+	}
+
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
@@ -813,6 +1260,7 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
 			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
 	}
 	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
+out:
 	END_USE(vq);
 	return last_used_idx;
 }
@@ -832,6 +1280,12 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	virtio_mb(vq->weak_barriers);
+	if (vq->packed) {
+		u16 flags = virtio16_to_cpu(vq->vq.vdev,
+				vq->vring_packed.desc[last_used_idx].flags);
+		return !(flags & VRING_DESC_F_AVAIL(1)) ==
+		       !(flags & VRING_DESC_F_USED(1));
+	}
 	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_poll);
@@ -874,6 +1328,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 
 	START_USE(vq);
 
+	if (vq->packed) {
+		/* FIXME: to be implemented */
+		goto out;
+	}
+
 	/* We optimistically turn back on interrupts, then check if there was
 	 * more to do. */
 	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
@@ -896,6 +1355,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 		return false;
 	}
 
+out:
 	END_USE(vq);
 	return true;
 }
@@ -922,14 +1382,20 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->desc_state[i].data;
-		detach_buf(vq, i, NULL);
-		vq->avail_idx_shadow--;
-		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
+		if (vq->packed) {
+			detach_buf_packed(vq, i, NULL);
+		} else {
+			detach_buf_split(vq, i, NULL);
+			vq->avail_idx_shadow--;
+			vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
+							vq->avail_idx_shadow);
+		}
 		END_USE(vq);
 		return buf;
 	}
 	/* That should have freed everything. */
-	BUG_ON(vq->vq.num_free != vq->vring.num);
+	BUG_ON(vq->vq.num_free != (vq->packed ? vq->vring_packed.num :
+						vq->vring.num));
 
 	END_USE(vq);
 	return NULL;
@@ -957,7 +1423,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
 EXPORT_SYMBOL_GPL(vring_interrupt);
 
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-					struct vring vring,
+					union vring_union vring,
+					bool packed,
 					struct virtio_device *vdev,
 					bool weak_barriers,
 					bool context,
@@ -965,19 +1432,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					void (*callback)(struct virtqueue *),
 					const char *name)
 {
-	unsigned int i;
+	unsigned int num, i;
 	struct vring_virtqueue *vq;
 
-	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
+	num = packed ? vring.vring_packed.num : vring.vring_split.num;
+
+	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
 		     GFP_KERNEL);
 	if (!vq)
 		return NULL;
 
-	vq->vring = vring;
 	vq->vq.callback = callback;
 	vq->vq.vdev = vdev;
 	vq->vq.name = name;
-	vq->vq.num_free = vring.num;
+	vq->vq.num_free = num;
 	vq->vq.index = index;
 	vq->we_own_ring = false;
 	vq->queue_dma_addr = 0;
@@ -986,9 +1454,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->weak_barriers = weak_barriers;
 	vq->broken = false;
 	vq->last_used_idx = 0;
-	vq->avail_flags_shadow = 0;
-	vq->avail_idx_shadow = 0;
 	vq->num_added = 0;
+	vq->packed = packed;
 	list_add_tail(&vq->vq.list, &vdev->vqs);
 #ifdef DEBUG
 	vq->in_use = false;
@@ -999,18 +1466,41 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
+	if (vq->packed) {
+		vq->vring_packed = vring.vring_packed;
+		vq->free_head = 0;
+		vq->wrap_counter = 1;
+
+#if 0
+		vq->chaining = virtio_has_feature(vdev,
+						  VIRTIO_RING_F_LIST_DESC);
+#else
+		vq->chaining = true;
+#endif
+	} else {
+		vq->vring = vring.vring_split;
+		vq->avail_flags_shadow = 0;
+		vq->avail_idx_shadow = 0;
+
+		/* Put everything in free lists. */
+		vq->free_head = 0;
+		for (i = 0; i < num-1; i++)
+			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
+	}
+
 	/* No callback?  Tell other side not to bother us. */
 	if (!callback) {
-		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
-		if (!vq->event)
-			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
+		if (packed) {
+			// FIXME: to be implemented
+		} else {
+			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
+			if (!vq->event)
+				vq->vring.avail->flags = cpu_to_virtio16(vdev,
+						vq->avail_flags_shadow);
+		}
 	}
 
-	/* Put everything in free lists. */
-	vq->free_head = 0;
-	for (i = 0; i < vring.num-1; i++)
-		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
-	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
+	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
 
 	return &vq->vq;
 }
@@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
 	}
 }
 
+static inline int
+__vring_size(unsigned int num, unsigned long align, bool packed)
+{
+	if (packed)
+		return vring_packed_size(num, align);
+	return vring_size(num, align);
+}
+
 struct virtqueue *vring_create_virtqueue(
 	unsigned int index,
 	unsigned int num,
@@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
 	void *queue = NULL;
 	dma_addr_t dma_addr;
 	size_t queue_size_in_bytes;
-	struct vring vring;
+	union vring_union vring;
+	bool packed;
 
 	/* We assume num is a power of 2. */
 	if (num & (num - 1)) {
@@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
 		return NULL;
 	}
 
+	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+
 	/* TODO: allocate each queue chunk individually */
-	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
-		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
+			num /= 2) {
+		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+							     packed),
 					  &dma_addr,
 					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
 		if (queue)
@@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
 
 	if (!queue) {
 		/* Try to get a single page. You are my only hope! */
-		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
+		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
+							     packed),
 					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
 	}
 	if (!queue)
 		return NULL;
 
-	queue_size_in_bytes = vring_size(num, vring_align);
-	vring_init(&vring, num, queue, vring_align);
+	queue_size_in_bytes = __vring_size(num, vring_align, packed);
+	if (packed)
+		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
+	else
+		vring_init(&vring.vring_split, num, queue, vring_align);
 
-	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-				   notify, callback, name);
+	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+				   context, notify, callback, name);
 	if (!vq) {
 		vring_free_queue(vdev, queue_size_in_bytes, queue,
 				 dma_addr);
@@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      void (*callback)(struct virtqueue *vq),
 				      const char *name)
 {
-	struct vring vring;
-	vring_init(&vring, num, pages, vring_align);
-	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
-				     notify, callback, name);
+	union vring_union vring;
+	bool packed;
+
+	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
+	if (packed)
+		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
+	else
+		vring_init(&vring.vring_split, num, pages, vring_align);
+
+	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
+				     context, notify, callback, name);
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
 
@@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 
 	if (vq->we_own_ring) {
 		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
-				 vq->vring.desc, vq->queue_dma_addr);
+				 vq->packed ? (void *)vq->vring_packed.desc :
+					      (void *)vq->vring.desc,
+				 vq->queue_dma_addr);
 	}
 	list_del(&_vq->list);
 	kfree(vq);
@@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
 
 	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
 		switch (i) {
+#if 0 // FIXME: to be implemented
 		case VIRTIO_RING_F_INDIRECT_DESC:
 			break;
+#endif
 		case VIRTIO_RING_F_EVENT_IDX:
 			break;
 		case VIRTIO_F_VERSION_1:
 			break;
 		case VIRTIO_F_IOMMU_PLATFORM:
 			break;
+		case VIRTIO_F_RING_PACKED:
+			break;
 		default:
 			/* We don't understand this bit. */
 			__virtio_clear_bit(vdev, i);
@@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
 
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->vring.num;
+	return vq->packed ? vq->vring_packed.num : vq->vring.num;
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
 
@@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
 
+/* Only available for split ring */
 dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
 
+/* Only available for split ring */
 dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
 
+/* Only available for split ring */
 const struct vring *virtqueue_get_vring(struct virtqueue *vq)
 {
 	return &to_vvq(vq)->vring;
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index bbf32524ab27..a0075894ad16 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
 struct virtio_device;
 struct virtqueue;
 
+union vring_union {
+	struct vring vring_split;
+	struct vring_packed vring_packed;
+};
+
 /*
  * Creates a virtqueue and allocates the descriptor ring.  If
  * may_reduce_num is set, then this may allocate a smaller ring than
@@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
 
 /* Creates a virtqueue with a custom layout. */
 struct virtqueue *__vring_new_virtqueue(unsigned int index,
-					struct vring vring,
+					union vring_union vring,
+					bool packed,
 					struct virtio_device *vdev,
 					bool weak_barriers,
 					bool ctx,
-- 
2.14.1
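
The packed-ring branch added to virtqueue_poll() in the patch above
treats a descriptor as used when its AVAIL and USED flag bits hold the
same value. A minimal standalone sketch of that comparison follows. The
macro bit positions are copied from patch 1/2; the helper name
desc_flags_used is hypothetical, and the flags value is assumed to be
already converted to CPU byte order. It mirrors only the RFC's
comparison and does not model the wrap counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as the macros introduced in patch 1/2. */
#define VRING_DESC_F_AVAIL(b)	((b) << 7)
#define VRING_DESC_F_USED(b)	((b) << 15)

/*
 * Hypothetical helper: returns true when the AVAIL and USED bits of a
 * CPU-endian flags word are equal, which is the condition the RFC's
 * virtqueue_poll() tests for a packed ring.
 */
static inline bool desc_flags_used(uint16_t flags)
{
	bool avail = flags & VRING_DESC_F_AVAIL(1);
	bool used = flags & VRING_DESC_F_USED(1);

	return avail == used;
}
```

Under this comparison a flags word with both bits clear also compares
equal, so a caller would still need the wrap counter to tell a freshly
zeroed descriptor from a completed one.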

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-23 11:18   ` Tiwei Bie
  (?)
  (?)
@ 2018-02-27  8:54   ` Jens Freimann
  2018-02-27  9:18     ` Jens Freimann
                       ` (5 more replies)
  -1 siblings, 6 replies; 38+ messages in thread
From: Jens Freimann @ 2018-02-27  8:54 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu

On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
>Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>---
> include/uapi/linux/virtio_config.h | 18 +++++++++-
> include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
> 2 files changed, 85 insertions(+), 1 deletion(-)
>
>diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
>index 308e2096291f..e3d077ef5207 100644
>--- a/include/uapi/linux/virtio_config.h
>+++ b/include/uapi/linux/virtio_config.h
>@@ -49,7 +49,7 @@
>  * transport being used (eg. virtio_ring), the rest are per-device feature
>  * bits. */
> #define VIRTIO_TRANSPORT_F_START	28
>-#define VIRTIO_TRANSPORT_F_END		34
>+#define VIRTIO_TRANSPORT_F_END		37
>
> #ifndef VIRTIO_CONFIG_NO_LEGACY
> /* Do we get callbacks when the ring is completely used, even if we've
>@@ -71,4 +71,20 @@
>  * this is for compatibility with legacy systems.
>  */
> #define VIRTIO_F_IOMMU_PLATFORM		33
>+
>+/* This feature indicates support for the packed virtqueue layout. */
>+#define VIRTIO_F_RING_PACKED		34

The spec says VIRTIO_F_PACKED_RING, not VIRTIO_F_RING_PACKED.
>+
>+/*
>+ * This feature indicates that all buffers are used by the device
>+ * in the same order in which they have been made available.
>+ */
>+#define VIRTIO_F_IN_ORDER		35
>+
>+/*
>+ * This feature indicates that drivers pass extra data (besides
>+ * identifying the Virtqueue) in their device notifications.
>+ */
>+#define VIRTIO_F_NOTIFICATION_DATA	36
>+
> #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
>diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
>index 6d5d5faa989b..77b1d4aeef72 100644
>--- a/include/uapi/linux/virtio_ring.h
>+++ b/include/uapi/linux/virtio_ring.h
>@@ -44,6 +44,9 @@
> /* This means the buffer contains a list of buffer descriptors. */
> #define VRING_DESC_F_INDIRECT	4
>
>+#define VRING_DESC_F_AVAIL(b)	((b) << 7)
>+#define VRING_DESC_F_USED(b)	((b) << 15)
>+
> /* The Host uses this in used->flags to advise the Guest: don't kick me when
>  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
>  * will still kick if it's out of buffers. */
>@@ -104,6 +107,36 @@ struct vring {
> 	struct vring_used *used;
> };
>
>+struct vring_packed_desc_event {
>+	/* Descriptor Event Offset */
>+	__virtio16 desc_event_off   : 15,
>+	/* Descriptor Event Wrap Counter */
>+		   desc_event_wrap  : 1;
>+	/* Descriptor Event Flags */
>+	__virtio16 desc_event_flags : 2;
>+};

Where would the virtqueue number go in driver notifications?

regards,
Jens 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
@ 2018-02-27  9:18     ` Jens Freimann
  2018-02-27  9:18     ` Jens Freimann
                       ` (4 subsequent siblings)
  5 siblings, 0 replies; 38+ messages in thread
From: Jens Freimann @ 2018-02-27  9:18 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
>On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
>>Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>---
>>include/uapi/linux/virtio_config.h | 18 +++++++++-
>>include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
>>2 files changed, 85 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
>>index 308e2096291f..e3d077ef5207 100644
>>--- a/include/uapi/linux/virtio_config.h
>>+++ b/include/uapi/linux/virtio_config.h
>>@@ -49,7 +49,7 @@
>> * transport being used (eg. virtio_ring), the rest are per-device feature
>> * bits. */
>>#define VIRTIO_TRANSPORT_F_START	28
>>-#define VIRTIO_TRANSPORT_F_END		34
>>+#define VIRTIO_TRANSPORT_F_END		37
>>
>>#ifndef VIRTIO_CONFIG_NO_LEGACY
>>/* Do we get callbacks when the ring is completely used, even if we've
>>@@ -71,4 +71,20 @@
>> * this is for compatibility with legacy systems.
>> */
>>#define VIRTIO_F_IOMMU_PLATFORM		33
>>+
>>+/* This feature indicates support for the packed virtqueue layout. */
>>+#define VIRTIO_F_RING_PACKED		34
>
>Spec says VIRTIO_F_PACKED_RING not RING_PACKED

Ignore this. Seems to have changed.

regards,
Jens 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* RE: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-23 11:18   ` Tiwei Bie
                     ` (3 preceding siblings ...)
  (?)
@ 2018-02-27  9:26   ` David Laight
  2018-02-27 11:31       ` Tiwei Bie
  -1 siblings, 1 reply; 38+ messages in thread
From: David Laight @ 2018-02-27  9:26 UTC (permalink / raw)
  To: 'Tiwei Bie', mst, virtualization, linux-kernel, netdev
  Cc: jasowang, wexu, jfreimann

From: Tiwei Bie
> Sent: 23 February 2018 11:18
...
> +struct vring_packed_desc_event {
> +	/* Descriptor Event Offset */
> +	__virtio16 desc_event_off   : 15,
> +	/* Descriptor Event Wrap Counter */
> +		   desc_event_wrap  : 1;
> +	/* Descriptor Event Flags */
> +	__virtio16 desc_event_flags : 2;
> +};

This looks like you are assuming that a bit-field has a defined
layout and can be used to map a 'hardware' structure.
They don't, so don't use them like that.

	David

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  9:26   ` David Laight
@ 2018-02-27 11:31       ` Tiwei Bie
  0 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-02-27 11:31 UTC (permalink / raw)
  To: David Laight
  Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu, jfreimann

On Tue, Feb 27, 2018 at 09:26:27AM +0000, David Laight wrote:
> From: Tiwei Bie
> > Sent: 23 February 2018 11:18
> ...
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> This looks like you are assuming that a bit-field has a defined
> layout and can be used to map a 'hardware' structure.
> They don't, so don't use them like that.
> 
> 	David
> 

Thanks for the comments! The above definition isn't used in
this RFC, and the corresponding part (event suppression)
hasn't been implemented yet. It's more like pseudocode
(I should add a comment about this in the code).

I planned to change it to something like this in the next
version:

struct vring_packed_desc_event {
	__virtio16 off_wrap;
	/* XXX: maybe not a good name for future extension.
	 * Only 2 bits are used now. */
	__virtio16 flags;
};

But it seems that I previously misunderstood the spec
on this point:

https://lists.oasis-open.org/archives/virtio-dev/201802/msg00173.html

Anyway, it will be addressed. Thank you very much! ;-)

Best regards,
Tiwei Bie
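
As a rough illustration of the off_wrap idea in the reply above, the
following sketch packs the event offset and wrap counter into one
16-bit word with explicit shifts and masks instead of bit-fields. It
assumes the offset occupies bits 0-14 and the wrap counter bit 15, as
the field names in the quoted struct suggest. The helper names
(event_pack_off_wrap, event_off, event_wrap, put_le16, get_le16) and
the EVENT_* constants are hypothetical and not part of the patch; the
byte-wise little-endian store is an assumption standing in for the
kernel's virtio16 conversion helpers.

```c
#include <stdint.h>

/* Hypothetical constants; not taken from the patch. */
#define EVENT_OFF_MASK		0x7fffu
#define EVENT_WRAP_SHIFT	15

/* Combine a 15-bit offset and a 1-bit wrap counter into one word. */
static inline uint16_t event_pack_off_wrap(uint16_t off, unsigned int wrap)
{
	return (uint16_t)((off & EVENT_OFF_MASK) |
			  ((wrap & 1u) << EVENT_WRAP_SHIFT));
}

/* Extract the offset (bits 0-14). */
static inline uint16_t event_off(uint16_t off_wrap)
{
	return off_wrap & EVENT_OFF_MASK;
}

/* Extract the wrap counter (bit 15). */
static inline unsigned int event_wrap(uint16_t off_wrap)
{
	return off_wrap >> EVENT_WRAP_SHIFT;
}

/* Store and load a 16-bit value as little-endian bytes, independent of
 * the host's byte order and of any compiler bit-field layout. */
static inline void put_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

static inline uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}
```

Because every bit position is spelled out, the in-memory layout does
not depend on how a particular compiler orders bit-fields, which is the
concern raised earlier in the thread.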

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
                       ` (2 preceding siblings ...)
  2018-02-27 12:01     ` Tiwei Bie
@ 2018-02-27 12:01     ` Tiwei Bie
  2018-02-27 20:28     ` Michael S. Tsirkin
  2018-02-27 20:28     ` Michael S. Tsirkin
  5 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-02-27 12:01 UTC (permalink / raw)
  To: Jens Freimann; +Cc: mst, virtualization, linux-kernel, netdev, jasowang, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
> On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
[...]
> > 
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> Where would the virtqueue number go in driver notifications?

This structure is for event suppression, not for notifications.

Please refer to the "Event Suppression Structure Format"
section of the spec for more details:

https://github.com/oasis-tcs/virtio-docs/blob/master/virtio-v1.1-packed-wd08.pdf

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
                       ` (3 preceding siblings ...)
  2018-02-27 12:01     ` Tiwei Bie
@ 2018-02-27 20:28     ` Michael S. Tsirkin
  2018-02-27 20:28     ` Michael S. Tsirkin
  5 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2018-02-27 20:28 UTC (permalink / raw)
  To: Jens Freimann
  Cc: Tiwei Bie, virtualization, linux-kernel, netdev, jasowang, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
> On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > include/uapi/linux/virtio_config.h | 18 +++++++++-
> > include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 85 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > index 308e2096291f..e3d077ef5207 100644
> > --- a/include/uapi/linux/virtio_config.h
> > +++ b/include/uapi/linux/virtio_config.h
> > @@ -49,7 +49,7 @@
> >  * transport being used (eg. virtio_ring), the rest are per-device feature
> >  * bits. */
> > #define VIRTIO_TRANSPORT_F_START	28
> > -#define VIRTIO_TRANSPORT_F_END		34
> > +#define VIRTIO_TRANSPORT_F_END		37
> > 
> > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > /* Do we get callbacks when the ring is completely used, even if we've
> > @@ -71,4 +71,20 @@
> >  * this is for compatibility with legacy systems.
> >  */
> > #define VIRTIO_F_IOMMU_PLATFORM		33
> > +
> > +/* This feature indicates support for the packed virtqueue layout. */
> > +#define VIRTIO_F_RING_PACKED		34
> 
> Spec says VIRTIO_F_PACKED_RING not RING_PACKED

I changed that now :)

> > +
> > +/*
> > + * This feature indicates that all buffers are used by the device
> > + * in the same order in which they have been made available.
> > + */
> > +#define VIRTIO_F_IN_ORDER		35
> > +
> > +/*
> > + * This feature indicates that drivers pass extra data (besides
> > + * identifying the Virtqueue) in their device notifications.
> > + */
> > +#define VIRTIO_F_NOTIFICATION_DATA	36
> > +
> > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> > index 6d5d5faa989b..77b1d4aeef72 100644
> > --- a/include/uapi/linux/virtio_ring.h
> > +++ b/include/uapi/linux/virtio_ring.h
> > @@ -44,6 +44,9 @@
> > /* This means the buffer contains a list of buffer descriptors. */
> > #define VRING_DESC_F_INDIRECT	4
> > 
> > +#define VRING_DESC_F_AVAIL(b)	((b) << 7)
> > +#define VRING_DESC_F_USED(b)	((b) << 15)
> > +
> > /* The Host uses this in used->flags to advise the Guest: don't kick me when
> >  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
> >  * will still kick if it's out of buffers. */
> > @@ -104,6 +107,36 @@ struct vring {
> > 	struct vring_used *used;
> > };
> > 
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> Where would the virtqueue number go in driver notifications?
> 
> regards,
> Jens

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 1/2] virtio: introduce packed ring defines
  2018-02-27  8:54   ` Jens Freimann
                       ` (4 preceding siblings ...)
  2018-02-27 20:28     ` Michael S. Tsirkin
@ 2018-02-27 20:28     ` Michael S. Tsirkin
  5 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2018-02-27 20:28 UTC (permalink / raw)
  To: Jens Freimann; +Cc: netdev, linux-kernel, virtualization, wexu

On Tue, Feb 27, 2018 at 09:54:58AM +0100, Jens Freimann wrote:
> On Fri, Feb 23, 2018 at 07:18:00PM +0800, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> > include/uapi/linux/virtio_config.h | 18 +++++++++-
> > include/uapi/linux/virtio_ring.h   | 68 ++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 85 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/virtio_config.h b/include/uapi/linux/virtio_config.h
> > index 308e2096291f..e3d077ef5207 100644
> > --- a/include/uapi/linux/virtio_config.h
> > +++ b/include/uapi/linux/virtio_config.h
> > @@ -49,7 +49,7 @@
> >  * transport being used (eg. virtio_ring), the rest are per-device feature
> >  * bits. */
> > #define VIRTIO_TRANSPORT_F_START	28
> > -#define VIRTIO_TRANSPORT_F_END		34
> > +#define VIRTIO_TRANSPORT_F_END		37
> > 
> > #ifndef VIRTIO_CONFIG_NO_LEGACY
> > /* Do we get callbacks when the ring is completely used, even if we've
> > @@ -71,4 +71,20 @@
> >  * this is for compatibility with legacy systems.
> >  */
> > #define VIRTIO_F_IOMMU_PLATFORM		33
> > +
> > +/* This feature indicates support for the packed virtqueue layout. */
> > +#define VIRTIO_F_RING_PACKED		34
> 
> Spec says VIRTIO_F_PACKED_RING not RING_PACKED

I changed that now :)

> > +
> > +/*
> > + * This feature indicates that all buffers are used by the device
> > + * in the same order in which they have been made available.
> > + */
> > +#define VIRTIO_F_IN_ORDER		35
> > +
> > +/*
> > + * This feature indicates that drivers pass extra data (besides
> > + * identifying the Virtqueue) in their device notifications.
> > + */
> > +#define VIRTIO_F_NOTIFICATION_DATA	36
> > +
> > #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
> > diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> > index 6d5d5faa989b..77b1d4aeef72 100644
> > --- a/include/uapi/linux/virtio_ring.h
> > +++ b/include/uapi/linux/virtio_ring.h
> > @@ -44,6 +44,9 @@
> > /* This means the buffer contains a list of buffer descriptors. */
> > #define VRING_DESC_F_INDIRECT	4
> > 
> > +#define VRING_DESC_F_AVAIL(b)	((b) << 7)
> > +#define VRING_DESC_F_USED(b)	((b) << 15)
> > +
> > /* The Host uses this in used->flags to advise the Guest: don't kick me when
> >  * you add a buffer.  It's unreliable, so it's simply an optimization.  Guest
> >  * will still kick if it's out of buffers. */
> > @@ -104,6 +107,36 @@ struct vring {
> > 	struct vring_used *used;
> > };
> > 
> > +struct vring_packed_desc_event {
> > +	/* Descriptor Event Offset */
> > +	__virtio16 desc_event_off   : 15,
> > +	/* Descriptor Event Wrap Counter */
> > +		   desc_event_wrap  : 1;
> > +	/* Descriptor Event Flags */
> > +	__virtio16 desc_event_flags : 2;
> > +};
> 
> Where would the virtqueue number go in driver notifications?
> 
> regards,
> Jens


* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-02-23 11:18   ` Tiwei Bie
@ 2018-03-16  4:03     ` Jason Wang
  -1 siblings, 0 replies; 38+ messages in thread
From: Jason Wang @ 2018-03-16  4:03 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu, jfreimann



On 2018-02-23 19:18, Tiwei Bie wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>   include/linux/virtio_ring.h  |   8 +-
>   2 files changed, 618 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index eb30f3e09a47..393778a2f809 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -58,14 +58,14 @@
>   
>   struct vring_desc_state {
>   	void *data;			/* Data for callback. */
> -	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> +	void *indir_desc;		/* Indirect descriptor, if any. */
> +	int num;			/* Descriptor list length. */
>   };
>   
>   struct vring_virtqueue {
>   	struct virtqueue vq;
>   
> -	/* Actual memory layout for this queue */
> -	struct vring vring;
> +	bool packed;
>   
>   	/* Can we use weak barriers? */
>   	bool weak_barriers;
> @@ -87,11 +87,28 @@ struct vring_virtqueue {
>   	/* Last used index we've seen. */
>   	u16 last_used_idx;
>   
> -	/* Last written value to avail->flags */
> -	u16 avail_flags_shadow;
> -
> -	/* Last written value to avail->idx in guest byte order */
> -	u16 avail_idx_shadow;
> +	union {
> +		/* Available for split ring */
> +		struct {
> +			/* Actual memory layout for this queue */
> +			struct vring vring;
> +
> +			/* Last written value to avail->flags */
> +			u16 avail_flags_shadow;
> +
> +			/* Last written value to avail->idx in
> +			 * guest byte order */
> +			u16 avail_idx_shadow;
> +		};
> +
> +		/* Available for packed ring */
> +		struct {
> +			/* Actual memory layout for this queue */
> +			struct vring_packed vring_packed;
> +			u8 wrap_counter : 1;
> +			bool chaining;
> +		};
> +	};
>   
>   	/* How to notify other side. FIXME: commonalize hcalls! */
>   	bool (*notify)(struct virtqueue *vq);
> @@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
>   			      cpu_addr, size, direction);
>   }
>   
> -static void vring_unmap_one(const struct vring_virtqueue *vq,
> -			    struct vring_desc *desc)
> +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
>   {

Let's split this helper into packed and split versions, like the other 
helpers? (The caller already knows the type of vq.)
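
To illustrate the split suggested above, here is a minimal user-space 
sketch. It is not kernel code: the struct layouts are simplified stand-ins 
without endian conversion, and unmap_one_split(), unmap_one_packed(), 
unmap_common() and struct unmap_req are hypothetical names. Each typed 
helper reads its own descriptor layout and hands the fields to a shared 
routine, so no void pointer or vq->packed test is needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_F_WRITE	2	/* same value as VRING_DESC_F_WRITE */
#define DESC_F_INDIRECT	4	/* same value as VRING_DESC_F_INDIRECT */

/* Simplified stand-ins for the two descriptor layouts (assumption). */
struct desc_split  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct desc_packed { uint64_t addr; uint32_t len; uint16_t id; uint16_t flags; };

/* What the shared tail would pass to dma_unmap_single()/dma_unmap_page(). */
struct unmap_req {
	uint64_t addr;
	uint32_t len;
	bool single;		/* true: unmap_single, false: unmap_page */
	bool from_device;	/* true: DMA_FROM_DEVICE, false: DMA_TO_DEVICE */
};

/* Layout-independent part: only needs addr, len and flags. */
static struct unmap_req unmap_common(uint64_t addr, uint32_t len, uint16_t flags)
{
	struct unmap_req r = {
		.addr = addr,
		.len = len,
		.single = (flags & DESC_F_INDIRECT) != 0,
		.from_device = (flags & DESC_F_WRITE) != 0,
	};
	return r;
}

/* Typed wrapper for the split ring descriptor. */
static struct unmap_req unmap_one_split(const struct desc_split *desc)
{
	assert(desc != NULL);
	return unmap_common(desc->addr, desc->len, desc->flags);
}

/* Typed wrapper for the packed ring descriptor. */
static struct unmap_req unmap_one_packed(const struct desc_packed *desc)
{
	assert(desc != NULL);
	return unmap_common(desc->addr, desc->len, desc->flags);
}
```

Callers such as detach_buf_packed() would then call the packed variant 
directly and keep compile-time type checking on the descriptor pointer.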

> +	u64 addr;
> +	u32 len;
>   	u16 flags;
>   
>   	if (!vring_use_dma_api(vq->vq.vdev))
>   		return;
>   
> -	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +	if (vq->packed) {
> +		struct vring_packed_desc *desc = _desc;
> +
> +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +	} else {
> +		struct vring_desc *desc = _desc;
> +
> +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +	}
>   
>   	if (flags & VRING_DESC_F_INDIRECT) {
>   		dma_unmap_single(vring_dma_dev(vq),
> -				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> -				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> +				 addr, len,
>   				 (flags & VRING_DESC_F_WRITE) ?
>   				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
>   	} else {
>   		dma_unmap_page(vring_dma_dev(vq),
> -			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> -			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> +			       addr, len,
>   			       (flags & VRING_DESC_F_WRITE) ?
>   			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
>   	}
> @@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
>   	return dma_mapping_error(vring_dma_dev(vq), addr);
>   }
>   
> -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> -					 unsigned int total_sg, gfp_t gfp)
> +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +					       unsigned int total_sg,
> +					       gfp_t gfp)
>   {
>   	struct vring_desc *desc;
>   	unsigned int i;
> @@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
>   	return desc;
>   }
>   
> -static inline int virtqueue_add(struct virtqueue *_vq,
> -				struct scatterlist *sgs[],
> -				unsigned int total_sg,
> -				unsigned int out_sgs,
> -				unsigned int in_sgs,
> -				void *data,
> -				void *ctx,
> -				gfp_t gfp)
> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> +						       unsigned int total_sg,
> +						       gfp_t gfp)
> +{
> +	struct vring_packed_desc *desc;
> +
> +	/*
> +	 * We require lowmem mappings for the descriptors because
> +	 * otherwise virt_to_phys will give us bogus addresses in the
> +	 * virtqueue.
> +	 */
> +	gfp &= ~__GFP_HIGHMEM;
> +
> +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> +
> +	return desc;
> +}
> +
> +static inline int virtqueue_add_split(struct virtqueue *_vq,
> +				      struct scatterlist *sgs[],
> +				      unsigned int total_sg,
> +				      unsigned int out_sgs,
> +				      unsigned int in_sgs,
> +				      void *data,
> +				      void *ctx,
> +				      gfp_t gfp)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   	struct scatterlist *sg;
> @@ -303,7 +350,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   	/* If the host supports indirect descriptor tables, and we have multiple
>   	 * buffers, then go indirect. FIXME: tune this threshold */
>   	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> -		desc = alloc_indirect(_vq, total_sg, gfp);
> +		desc = alloc_indirect_split(_vq, total_sg, gfp);
>   	else {
>   		desc = NULL;
>   		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> @@ -437,6 +484,243 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   	return -EIO;
>   }
>   
> +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> +				       struct scatterlist *sgs[],
> +				       unsigned int total_sg,
> +				       unsigned int out_sgs,
> +				       unsigned int in_sgs,
> +				       void *data,
> +				       void *ctx,
> +				       gfp_t gfp)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_packed_desc *desc;
> +	struct scatterlist *sg;
> +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> +	__virtio16 uninitialized_var(head_flags), flags;
> +	int head, wrap_counter;
> +	bool indirect;
> +
> +	START_USE(vq);
> +
> +	BUG_ON(data == NULL);
> +	BUG_ON(ctx && vq->indirect);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return -EIO;
> +	}
> +
> +	if (total_sg > 1 && !vq->chaining && !vq->indirect) {
> +		END_USE(vq);
> +		return -ENOTSUPP;
> +	}
> +
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
> +	BUG_ON(total_sg == 0);
> +
> +	head = vq->free_head;
> +	wrap_counter = vq->wrap_counter;
> +
> +	/* If the host supports indirect descriptor tables, and we have multiple
> +	 * buffers, then go indirect. FIXME: tune this threshold */
> +	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> +	else {
> +		desc = NULL;
> +		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> +	}
> +
> +	if (desc) {
> +		/* Use a single buffer which doesn't continue */
> +		indirect = true;
> +		/* Set up rest to use this indirect table. */
> +		i = 0;
> +		descs_used = 1;
> +	} else {
> +		indirect = false;
> +		desc = vq->vring_packed.desc;
> +		i = head;
> +		descs_used = total_sg;
> +
> +		if (total_sg > 1 && !vq->chaining) {
> +			END_USE(vq);
> +			return -ENOTSUPP;
> +		}
> +	}
> +
> +	if (vq->vq.num_free < descs_used) {
> +		pr_debug("Can't add buf len %i - avail = %i\n",
> +			 descs_used, vq->vq.num_free);
> +		/* FIXME: for historical reasons, we force a notify here if
> +		 * there are outgoing parts to the buffer.  Presumably the
> +		 * host should service the ring ASAP. */
> +		if (out_sgs)
> +			vq->notify(&vq->vq);
> +		if (indirect)
> +			kfree(desc);
> +		END_USE(vq);
> +		return -ENOSPC;
> +	}
> +
> +	for (n = 0; n < out_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> +					VRING_DESC_F_USED(!vq->wrap_counter));
> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);

If it's part of a chain, I think we only need to do this for the last buffer.

> +			prev = i;
> +			i++;

It looks to me like prev is always i - 1?
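
One detail on the index arithmetic: in the indirect table i never wraps, 
so prev is i - 1 there. In the ring itself i is reset to 0 after the last 
slot (see the wrap check just below), so the previous slot is i - 1 only 
modulo the ring size. A minimal sketch, with hypothetical helper names 
ring_next() and ring_prev() that are not part of the patch:

```c
#include <assert.h>

/* Advance a ring index by one slot, wrapping to 0 at num. */
static unsigned int ring_next(unsigned int i, unsigned int num)
{
	assert(num > 0 && i < num);
	return (i + 1 >= num) ? 0 : i + 1;
}

/* Recover the previous slot from the current one, undoing the wrap. */
static unsigned int ring_prev(unsigned int i, unsigned int num)
{
	assert(num > 0 && i < num);
	return (i == 0) ? num - 1 : i - 1;
}
```

If prev were dropped, the "last one doesn't continue" fixup after the 
loops would have to use something like ring_prev(i, num) in the 
non-indirect case, rather than a plain i - 1.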

> +			if (!indirect && i >= vq->vring_packed.num) {
> +				i = 0;
> +				vq->wrap_counter ^= 1;
> +			}
> +		}
> +	}
> +	for (; n < (out_sgs + in_sgs); n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +					VRING_DESC_F_WRITE |
> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> +					VRING_DESC_F_USED(!vq->wrap_counter));
> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> +			prev = i;
> +			i++;
> +			if (!indirect && i >= vq->vring_packed.num) {
> +				i = 0;
> +				vq->wrap_counter ^= 1;
> +			}
> +		}
> +	}
> +	/* Last one doesn't continue. */
> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);

I don't get why we need this here.

> +	else
> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> +
> +	if (indirect) {
> +		/* FIXME: to be implemented */
> +
> +		/* Now that the indirect table is filled in, map it. */
> +		dma_addr_t addr = vring_map_single(
> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> +			DMA_TO_DEVICE);
> +		if (vring_mapping_error(vq, addr))
> +			goto unmap_release;
> +
> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> +					     VRING_DESC_F_AVAIL(wrap_counter) |
> +					     VRING_DESC_F_USED(!wrap_counter));
> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> +				total_sg * sizeof(struct vring_packed_desc));
> +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> +	}
> +
> +	/* We're using some buffers from the free list. */
> +	vq->vq.num_free -= descs_used;
> +
> +	/* Update free pointer */
> +	if (indirect) {
> +		n = head + 1;
> +		if (n >= vq->vring_packed.num) {
> +			n = 0;
> +			vq->wrap_counter ^= 1;
> +		}
> +		vq->free_head = n;

detach_buf_packed() does not even touch free_head, so we need to 
explain what it means for the packed ring.

> +	} else
> +		vq->free_head = i;

ID is only valid in the last descriptor in the list, so head + 1 should 
be ok too?

> +
> +	/* Store token and indirect buffer state. */
> +	vq->desc_state[head].num = descs_used;
> +	vq->desc_state[head].data = data;
> +	if (indirect)
> +		vq->desc_state[head].indir_desc = desc;
> +	else
> +		vq->desc_state[head].indir_desc = ctx;
> +
> +	virtio_wmb(vq->weak_barriers);

Let's add a comment to explain the barrier here.
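
As a possible starting point for that comment: the head descriptor's 
flags are written last, so the barrier has to order every earlier 
descriptor write (addr, len, id, and the flags of the non-head 
descriptors) before the store that makes the chain available to the 
device. The user-space sketch below models this with a C11 release 
store. Treating that as an analogue of virtio_wmb() followed by a plain 
store is an assumption, and struct pdesc and publish_head() are 
hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for a packed descriptor (assumption). */
struct pdesc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	_Atomic uint16_t flags;
};

/*
 * Fill in the head descriptor and only then publish it.
 *
 * The release store guarantees that the writes to addr, len and id
 * (and anything else written before it) are visible to an observer
 * that sees the new flags value, so the other side can never see an
 * available descriptor with a stale body.
 */
static void publish_head(struct pdesc *head, uint64_t addr, uint32_t len,
			 uint16_t id, uint16_t head_flags)
{
	head->addr = addr;
	head->len = len;
	head->id = id;
	atomic_store_explicit(&head->flags, head_flags, memory_order_release);
}
```

The reader side pairs with this by loading flags with acquire semantics 
before looking at the rest of the descriptor.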

> +	vq->vring_packed.desc[head].flags = head_flags;
> +	vq->num_added++;
> +
> +	pr_debug("Added buffer head %i to %p\n", head, vq);
> +	END_USE(vq);
> +
> +	return 0;
> +
> +unmap_release:
> +	err_idx = i;
> +	i = head;
> +
> +	for (n = 0; n < total_sg; n++) {
> +		if (i == err_idx)
> +			break;
> +		vring_unmap_one(vq, &desc[i]);
> +		i++;
> +		if (!indirect && i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->wrap_counter = wrap_counter;
> +
> +	if (indirect)
> +		kfree(desc);
> +
> +	END_USE(vq);
> +	return -EIO;
> +}
> +
> +static inline int virtqueue_add(struct virtqueue *_vq,
> +				struct scatterlist *sgs[],
> +				unsigned int total_sg,
> +				unsigned int out_sgs,
> +				unsigned int in_sgs,
> +				void *data,
> +				void *ctx,
> +				gfp_t gfp)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> +						 in_sgs, data, ctx, gfp) :
> +			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> +						in_sgs, data, ctx, gfp);
> +}
> +
>   /**
>    * virtqueue_add_sgs - expose buffers to other end
>    * @vq: the struct virtqueue we're talking about.
> @@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>   	 * event. */
>   	virtio_mb(vq->weak_barriers);
>   
> +	if (vq->packed) {
> +		/* FIXME: to be implemented */
> +		needs_kick = true;
> +		goto out;
> +	}
> +
>   	old = vq->avail_idx_shadow - vq->num_added;
>   	new = vq->avail_idx_shadow;
>   	vq->num_added = 0;
> @@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>   	} else {
>   		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
>   	}
> +
> +out:
>   	END_USE(vq);
>   	return needs_kick;
>   }
> @@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_kick);
>   
> -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> -		       void **ctx)
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> +			     void **ctx)
>   {
>   	unsigned int i, j;
>   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> @@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
>   	}
>   }
>   
> -static inline bool more_used(const struct vring_virtqueue *vq)
> +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> +			      void **ctx)
> +{
> +	struct vring_packed_desc *desc;
> +	unsigned int i, j;
> +
> +	/* Clear data ptr. */
> +	vq->desc_state[head].data = NULL;
> +
> +	i = head;
> +
> +	for (j = 0; j < vq->desc_state[head].num; j++) {
> +		desc = &vq->vring_packed.desc[i];
> +		vring_unmap_one(vq, desc);
> +		i++;
> +		if (i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->vq.num_free += vq->desc_state[head].num;

It looks to me like vq->free_head always grows, so how can we make sure 
it does not exceed vq.num here?

> +
> +	if (vq->indirect) {
> +		u32 len;
> +
> +		desc = vq->desc_state[head].indir_desc;
> +		/* Free the indirect table, if any, now that it's unmapped. */
> +		if (!desc)
> +			return;
> +
> +		len = virtio32_to_cpu(vq->vq.vdev,
> +				      vq->vring_packed.desc[head].len);
> +
> +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> +
> +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> +			vring_unmap_one(vq, &desc[j]);
> +
> +		kfree(desc);
> +		vq->desc_state[head].indir_desc = NULL;
> +	} else if (ctx) {
> +		*ctx = vq->desc_state[head].indir_desc;
> +	}
> +}
> +
> +static inline bool more_used_split(const struct vring_virtqueue *vq)
>   {
>   	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
>   }
>   
> -/**
> - * virtqueue_get_buf - get the next used buffer
> - * @vq: the struct virtqueue we're talking about.
> - * @len: the length written into the buffer
> - *
> - * If the device wrote data into the buffer, @len will be set to the
> - * amount written.  This means you don't need to clear the buffer
> - * beforehand to ensure there's no data leakage in the case of short
> - * writes.
> - *
> - * Caller must ensure we don't call this with other virtqueue
> - * operations at the same time (except where noted).
> - *
> - * Returns NULL if there are no used buffers, or the "data" token
> - * handed to virtqueue_add_*().
> - */
> -void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> -			    void **ctx)
> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> +{
> +	u16 last_used, flags;
> +	bool avail, used;
> +
> +	if (vq->vq.num_free == vq->vring.num)
> +		return false;
> +
> +	last_used = vq->last_used_idx;
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[last_used].flags);
> +	avail = flags & VRING_DESC_F_AVAIL(1);
> +	used = flags & VRING_DESC_F_USED(1);
> +
> +	return avail == used;
> +}
> +
> +static inline bool more_used(const struct vring_virtqueue *vq)
> +{
> +	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> +}
> +
> +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> +				  void **ctx)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   	void *ret;
> @@ -735,9 +1079,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>   		return NULL;
>   	}
>   
> -	/* detach_buf clears data, so grab it now. */
> +	/* detach_buf_split clears data, so grab it now. */
>   	ret = vq->desc_state[i].data;
> -	detach_buf(vq, i, ctx);
> +	detach_buf_split(vq, i, ctx);
>   	vq->last_used_idx++;
>   	/* If we expect an interrupt for the next entry, tell host
>   	 * by writing event index and flush out the write before
> @@ -754,6 +1098,87 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>   	END_USE(vq);
>   	return ret;
>   }
> +
> +void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
> +				   void **ctx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	void *ret;
> +	unsigned int i;
> +	u16 last_used;
> +
> +	START_USE(vq);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	if (!more_used(vq)) {
> +		pr_debug("No more buffers in queue\n");
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	/* Only get used array entries after they have been exposed by host. */
> +	virtio_rmb(vq->weak_barriers);
> +
> +	last_used = vq->last_used_idx;
> +
> +	i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> +
> +	if (unlikely(i >= vq->vring_packed.num)) {
> +		BAD_RING(vq, "id %u out of range\n", i);
> +		return NULL;
> +	}
> +	if (unlikely(!vq->desc_state[i].data)) {
> +		BAD_RING(vq, "id %u is not a head!\n", i);
> +		return NULL;
> +	}
> +
> +	/* detach_buf_packed clears data, so grab it now. */
> +	ret = vq->desc_state[i].data;
> +	detach_buf_packed(vq, i, ctx);
> +
> +	vq->last_used_idx += vq->desc_state[i].num;
> +	if (vq->last_used_idx >= vq->vring_packed.num)
> +		vq->last_used_idx %= vq->vring_packed.num;

'-=' should be sufficient here?
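
The reasoning: last_used_idx is always below the ring size, and a chain 
can never be longer than the ring (virtqueue_add_packed() returns 
-ENOSPC when descs_used exceeds num_free), so the sum stays below twice 
the ring size and one subtraction gives the same result as the modulo. 
A minimal sketch, where advance_used() is a hypothetical helper and the 
two bounds are stated as assertions:

```c
#include <assert.h>

/*
 * Advance the used index by the length of a completed chain.
 * Preconditions: last_used < num and n <= num, hence
 * last_used + n < 2 * num and a single subtraction is enough.
 */
static unsigned int advance_used(unsigned int last_used, unsigned int n,
				 unsigned int num)
{
	assert(last_used < num && n <= num);

	last_used += n;
	if (last_used >= num)
		last_used -= num;
	return last_used;
}
```

This avoids a division on the completion path, which matters mostly on 
architectures where integer division is slow.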

> +
> +	// FIXME: implement the desc event support
> +
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	END_USE(vq);
> +	return ret;
> +}
> +
> +/**
> + * virtqueue_get_buf - get the next used buffer
> + * @vq: the struct virtqueue we're talking about.
> + * @len: the length written into the buffer
> + *
> + * If the device wrote data into the buffer, @len will be set to the
> + * amount written.  This means you don't need to clear the buffer
> + * beforehand to ensure there's no data leakage in the case of short
> + * writes.
> + *
> + * Caller must ensure we don't call this with other virtqueue
> + * operations at the same time (except where noted).
> + *
> + * Returns NULL if there are no used buffers, or the "data" token
> + * handed to virtqueue_add_*().
> + */
> +void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> +			    void **ctx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> +			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
> +}
>   EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>   
>   void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> @@ -761,6 +1186,24 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>   	return virtqueue_get_buf_ctx(_vq, len, NULL);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> +
> +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> +		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +		if (!vq->event)
> +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->avail_flags_shadow);
> +	}
> +}
> +
> +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> +{
> +	// FIXME: to be implemented
> +}
> +
>   /**
>    * virtqueue_disable_cb - disable callbacks
>    * @vq: the struct virtqueue we're talking about.
> @@ -774,12 +1217,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> -	}
> -
> +	if (vq->packed)
> +		virtqueue_disable_cb_packed(_vq);
> +	else
> +		virtqueue_disable_cb_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>   
> @@ -802,6 +1243,12 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>   
>   	START_USE(vq);
>   
> +	if (vq->packed) {
> +		// FIXME: to be implemented
> +		last_used_idx = vq->last_used_idx;
> +		goto out;
> +	}
> +
>   	/* We optimistically turn back on interrupts, then check if there was
>   	 * more to do. */
>   	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> @@ -813,6 +1260,7 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>   			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
>   	}
>   	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> +out:
>   	END_USE(vq);
>   	return last_used_idx;
>   }
> @@ -832,6 +1280,12 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
>   	virtio_mb(vq->weak_barriers);
> +	if (vq->packed) {
> +		u16 flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[last_used_idx].flags);
> +		return !(flags & VRING_DESC_F_AVAIL(1)) ==
> +		       !(flags & VRING_DESC_F_USED(1));
> +	}
>   	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_poll);
> @@ -874,6 +1328,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>   
>   	START_USE(vq);
>   
> +	if (vq->packed) {
> +		// FIXME: to be implemented
> +		goto out;
> +	}
> +
>   	/* We optimistically turn back on interrupts, then check if there was
>   	 * more to do. */
>   	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> @@ -896,6 +1355,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>   		return false;
>   	}
>   
> +out:
>   	END_USE(vq);
>   	return true;
>   }
> @@ -922,14 +1382,20 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>   			continue;
>   		/* detach_buf clears data, so grab it now. */
>   		buf = vq->desc_state[i].data;
> -		detach_buf(vq, i, NULL);
> -		vq->avail_idx_shadow--;
> -		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> +		if (vq->packed)
> +			detach_buf_packed(vq, i, NULL);
> +		else {
> +			detach_buf_split(vq, i, NULL);
> +			vq->avail_idx_shadow--;
> +			vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> +							vq->avail_idx_shadow);
> +		}
>   		END_USE(vq);
>   		return buf;
>   	}
>   	/* That should have freed everything. */
> -	BUG_ON(vq->vq.num_free != vq->vring.num);
> +	BUG_ON(vq->vq.num_free != (vq->packed ? vq->vring_packed.num :
> +						vq->vring.num));
>   
>   	END_USE(vq);
>   	return NULL;
> @@ -957,7 +1423,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   EXPORT_SYMBOL_GPL(vring_interrupt);
>   
>   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> -					struct vring vring,
> +					union vring_union vring,
> +					bool packed,
>   					struct virtio_device *vdev,
>   					bool weak_barriers,
>   					bool context,
> @@ -965,19 +1432,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   					void (*callback)(struct virtqueue *),
>   					const char *name)
>   {
> -	unsigned int i;
> +	unsigned int num, i;
>   	struct vring_virtqueue *vq;
>   
> -	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> +	num = packed ? vring.vring_packed.num : vring.vring_split.num;
> +
> +	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
>   		     GFP_KERNEL);
>   	if (!vq)
>   		return NULL;
>   
> -	vq->vring = vring;
>   	vq->vq.callback = callback;
>   	vq->vq.vdev = vdev;
>   	vq->vq.name = name;
> -	vq->vq.num_free = vring.num;
> +	vq->vq.num_free = num;
>   	vq->vq.index = index;
>   	vq->we_own_ring = false;
>   	vq->queue_dma_addr = 0;
> @@ -986,9 +1454,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
>   	vq->last_used_idx = 0;
> -	vq->avail_flags_shadow = 0;
> -	vq->avail_idx_shadow = 0;
>   	vq->num_added = 0;
> +	vq->packed = packed;
>   	list_add_tail(&vq->vq.list, &vdev->vqs);
>   #ifdef DEBUG
>   	vq->in_use = false;
> @@ -999,18 +1466,41 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   		!context;
>   	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
>   
> +	if (vq->packed) {
> +		vq->vring_packed = vring.vring_packed;
> +		vq->free_head = 0;
> +		vq->wrap_counter = 1;
> +
> +#if 0
> +		vq->chaining = virtio_has_feature(vdev,
> +						  VIRTIO_RING_F_LIST_DESC);
> +#else
> +		vq->chaining = true;

Looks like in V10 there's no F_LIST_DESC.

> +#endif
> +	} else {
> +		vq->vring = vring.vring_split;
> +		vq->avail_flags_shadow = 0;
> +		vq->avail_idx_shadow = 0;
> +
> +		/* Put everything in free lists. */
> +		vq->free_head = 0;
> +		for (i = 0; i < num-1; i++)
> +			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> +	}
> +
>   	/* No callback?  Tell other side not to bother us. */
>   	if (!callback) {
> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> +		if (packed) {
> +			// FIXME: to be implemented
> +		} else {
> +			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +			if (!vq->event)
> +				vq->vring.avail->flags = cpu_to_virtio16(vdev,
> +						vq->avail_flags_shadow);
> +		}
>   	}
>   
> -	/* Put everything in free lists. */
> -	vq->free_head = 0;
> -	for (i = 0; i < vring.num-1; i++)
> -		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> -	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> +	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
>   
>   	return &vq->vq;
>   }
> @@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
>   	}
>   }
>   
> +static inline int
> +__vring_size(unsigned int num, unsigned long align, bool packed)
> +{
> +	if (packed)
> +		return vring_packed_size(num, align);
> +	return vring_size(num, align);
> +}
> +
>   struct virtqueue *vring_create_virtqueue(
>   	unsigned int index,
>   	unsigned int num,
> @@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
>   	void *queue = NULL;
>   	dma_addr_t dma_addr;
>   	size_t queue_size_in_bytes;
> -	struct vring vring;
> +	union vring_union vring;
> +	bool packed;
>   
>   	/* We assume num is a power of 2. */
>   	if (num & (num - 1)) {
> @@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
>   		return NULL;
>   	}
>   
> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +
>   	/* TODO: allocate each queue chunk individually */
> -	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> +	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> +			num /= 2) {
> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> +							     packed),
>   					  &dma_addr,
>   					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
>   		if (queue)
> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>   
>   	if (!queue) {
>   		/* Try to get a single page. You are my only hope! */
> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> +							     packed),
>   					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>   	}
>   	if (!queue)
>   		return NULL;
>   
> -	queue_size_in_bytes = vring_size(num, vring_align);
> -	vring_init(&vring, num, queue, vring_align);
> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> +	if (packed)
> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> +	else
> +		vring_init(&vring.vring_split, num, queue, vring_align);

Let's rename vring_init to vring_init_split() like other helpers?

>   
> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> -				   notify, callback, name);
> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> +				   context, notify, callback, name);
>   	if (!vq) {
>   		vring_free_queue(vdev, queue_size_in_bytes, queue,
>   				 dma_addr);
> @@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
>   				      void (*callback)(struct virtqueue *vq),
>   				      const char *name)
>   {
> -	struct vring vring;
> -	vring_init(&vring, num, pages, vring_align);
> -	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> -				     notify, callback, name);
> +	union vring_union vring;
> +	bool packed;
> +
> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +	if (packed)
> +		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
> +	else
> +		vring_init(&vring.vring_split, num, pages, vring_align);
> +
> +	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> +				     context, notify, callback, name);
>   }
>   EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>   
> @@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>   
>   	if (vq->we_own_ring) {
>   		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> -				 vq->vring.desc, vq->queue_dma_addr);
> +				 vq->packed ? (void *)vq->vring_packed.desc :
> +					      (void *)vq->vring.desc,
> +				 vq->queue_dma_addr);
>   	}
>   	list_del(&_vq->list);
>   	kfree(vq);
> @@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
>   
>   	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
>   		switch (i) {
> +#if 0 // FIXME: to be implemented
>   		case VIRTIO_RING_F_INDIRECT_DESC:
>   			break;
> +#endif
>   		case VIRTIO_RING_F_EVENT_IDX:
>   			break;
>   		case VIRTIO_F_VERSION_1:
>   			break;
>   		case VIRTIO_F_IOMMU_PLATFORM:
>   			break;
> +		case VIRTIO_F_RING_PACKED:
> +			break;
>   		default:
>   			/* We don't understand this bit. */
>   			__virtio_clear_bit(vdev, i);
> @@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
>   
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	return vq->vring.num;
> +	return vq->packed ? vq->vring_packed.num : vq->vring.num;
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>   
> @@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
>   
> +/* Only available for split ring */

Interesting, I think we need this to configure PCI correctly, e.g. in
setup_vq()?

>   dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
>   
> +/* Only available for split ring */
>   dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)

Maybe it's better to rename this to get_device_addr().

Thanks

>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
>   
> +/* Only available for split ring */
>   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
>   {
>   	return &to_vvq(vq)->vring;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index bbf32524ab27..a0075894ad16 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
>   struct virtio_device;
>   struct virtqueue;
>   
> +union vring_union {
> +	struct vring vring_split;
> +	struct vring_packed vring_packed;
> +};
> +
>   /*
>    * Creates a virtqueue and allocates the descriptor ring.  If
>    * may_reduce_num is set, then this may allocate a smaller ring than
> @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
>   
>   /* Creates a virtqueue with a custom layout. */
>   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> -					struct vring vring,
> +					union vring_union vring,
> +					bool packed,
>   					struct virtio_device *vdev,
>   					bool weak_barriers,
>   					bool ctx,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
@ 2018-03-16  4:03     ` Jason Wang
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Wang @ 2018-03-16  4:03 UTC (permalink / raw)
  To: Tiwei Bie, mst, virtualization, linux-kernel, netdev; +Cc: wexu



On 2018-02-23 19:18, Tiwei Bie wrote:
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>   drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>   include/linux/virtio_ring.h  |   8 +-
>   2 files changed, 618 insertions(+), 89 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index eb30f3e09a47..393778a2f809 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -58,14 +58,14 @@
>   
>   struct vring_desc_state {
>   	void *data;			/* Data for callback. */
> -	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> +	void *indir_desc;		/* Indirect descriptor, if any. */
> +	int num;			/* Descriptor list length. */
>   };
>   
>   struct vring_virtqueue {
>   	struct virtqueue vq;
>   
> -	/* Actual memory layout for this queue */
> -	struct vring vring;
> +	bool packed;
>   
>   	/* Can we use weak barriers? */
>   	bool weak_barriers;
> @@ -87,11 +87,28 @@ struct vring_virtqueue {
>   	/* Last used index we've seen. */
>   	u16 last_used_idx;
>   
> -	/* Last written value to avail->flags */
> -	u16 avail_flags_shadow;
> -
> -	/* Last written value to avail->idx in guest byte order */
> -	u16 avail_idx_shadow;
> +	union {
> +		/* Available for split ring */
> +		struct {
> +			/* Actual memory layout for this queue */
> +			struct vring vring;
> +
> +			/* Last written value to avail->flags */
> +			u16 avail_flags_shadow;
> +
> +			/* Last written value to avail->idx in
> +			 * guest byte order */
> +			u16 avail_idx_shadow;
> +		};
> +
> +		/* Available for packed ring */
> +		struct {
> +			/* Actual memory layout for this queue */
> +			struct vring_packed vring_packed;
> +			u8 wrap_counter : 1;
> +			bool chaining;
> +		};
> +	};
>   
>   	/* How to notify other side. FIXME: commonalize hcalls! */
>   	bool (*notify)(struct virtqueue *vq);
> @@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
>   			      cpu_addr, size, direction);
>   }
>   
> -static void vring_unmap_one(const struct vring_virtqueue *vq,
> -			    struct vring_desc *desc)
> +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
>   {

Let's split this helper into packed and split versions like the other
helpers? (The caller already knows the type of the vq.)
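
Roughly what I have in mind, as a standalone user-space model rather than kernel code. The struct layouts, DESC_F_WRITE and the unmap_args_* names are hypothetical and only illustrate one helper per ring type feeding the same unmap logic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_F_WRITE 2	/* same value as VRING_DESC_F_WRITE */

/* Illustrative layouts only; endianness conversion is left out. */
struct split_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct packed_desc { uint64_t addr; uint32_t len; uint16_t id; uint16_t flags; };

/* What the common unmap code needs, independent of the ring type. */
struct unmap_args { uint64_t addr; uint32_t len; int from_device; };

/* Split-ring variant: the caller already knows it holds a split descriptor. */
struct unmap_args unmap_args_split(const struct split_desc *d)
{
	assert(d != NULL);
	return (struct unmap_args){ d->addr, d->len, !!(d->flags & DESC_F_WRITE) };
}

/* Packed-ring variant: same result type, no void * and no vq->packed test. */
struct unmap_args unmap_args_packed(const struct packed_desc *d)
{
	assert(d != NULL);
	return (struct unmap_args){ d->addr, d->len, !!(d->flags & DESC_F_WRITE) };
}
```

With typed parameters the compiler can catch a descriptor of the wrong kind being passed, which the void * version cannot.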

> +	u64 addr;
> +	u32 len;
>   	u16 flags;
>   
>   	if (!vring_use_dma_api(vq->vq.vdev))
>   		return;
>   
> -	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +	if (vq->packed) {
> +		struct vring_packed_desc *desc = _desc;
> +
> +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +	} else {
> +		struct vring_desc *desc = _desc;
> +
> +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +	}
>   
>   	if (flags & VRING_DESC_F_INDIRECT) {
>   		dma_unmap_single(vring_dma_dev(vq),
> -				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> -				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> +				 addr, len,
>   				 (flags & VRING_DESC_F_WRITE) ?
>   				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
>   	} else {
>   		dma_unmap_page(vring_dma_dev(vq),
> -			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> -			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> +			       addr, len,
>   			       (flags & VRING_DESC_F_WRITE) ?
>   			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
>   	}
> @@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
>   	return dma_mapping_error(vring_dma_dev(vq), addr);
>   }
>   
> -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> -					 unsigned int total_sg, gfp_t gfp)
> +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +					       unsigned int total_sg,
> +					       gfp_t gfp)
>   {
>   	struct vring_desc *desc;
>   	unsigned int i;
> @@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
>   	return desc;
>   }
>   
> -static inline int virtqueue_add(struct virtqueue *_vq,
> -				struct scatterlist *sgs[],
> -				unsigned int total_sg,
> -				unsigned int out_sgs,
> -				unsigned int in_sgs,
> -				void *data,
> -				void *ctx,
> -				gfp_t gfp)
> +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> +						       unsigned int total_sg,
> +						       gfp_t gfp)
> +{
> +	struct vring_packed_desc *desc;
> +
> +	/*
> +	 * We require lowmem mappings for the descriptors because
> +	 * otherwise virt_to_phys will give us bogus addresses in the
> +	 * virtqueue.
> +	 */
> +	gfp &= ~__GFP_HIGHMEM;
> +
> +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> +
> +	return desc;
> +}
> +
> +static inline int virtqueue_add_split(struct virtqueue *_vq,
> +				      struct scatterlist *sgs[],
> +				      unsigned int total_sg,
> +				      unsigned int out_sgs,
> +				      unsigned int in_sgs,
> +				      void *data,
> +				      void *ctx,
> +				      gfp_t gfp)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   	struct scatterlist *sg;
> @@ -303,7 +350,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   	/* If the host supports indirect descriptor tables, and we have multiple
>   	 * buffers, then go indirect. FIXME: tune this threshold */
>   	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> -		desc = alloc_indirect(_vq, total_sg, gfp);
> +		desc = alloc_indirect_split(_vq, total_sg, gfp);
>   	else {
>   		desc = NULL;
>   		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> @@ -437,6 +484,243 @@ static inline int virtqueue_add(struct virtqueue *_vq,
>   	return -EIO;
>   }
>   
> +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> +				       struct scatterlist *sgs[],
> +				       unsigned int total_sg,
> +				       unsigned int out_sgs,
> +				       unsigned int in_sgs,
> +				       void *data,
> +				       void *ctx,
> +				       gfp_t gfp)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	struct vring_packed_desc *desc;
> +	struct scatterlist *sg;
> +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> +	__virtio16 uninitialized_var(head_flags), flags;
> +	int head, wrap_counter;
> +	bool indirect;
> +
> +	START_USE(vq);
> +
> +	BUG_ON(data == NULL);
> +	BUG_ON(ctx && vq->indirect);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return -EIO;
> +	}
> +
> +	if (total_sg > 1 && !vq->chaining && !vq->indirect) {
> +		END_USE(vq);
> +		return -ENOTSUPP;
> +	}
> +
> +#ifdef DEBUG
> +	{
> +		ktime_t now = ktime_get();
> +
> +		/* No kick or get, with .1 second between?  Warn. */
> +		if (vq->last_add_time_valid)
> +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> +					    > 100);
> +		vq->last_add_time = now;
> +		vq->last_add_time_valid = true;
> +	}
> +#endif
> +
> +	BUG_ON(total_sg == 0);
> +
> +	head = vq->free_head;
> +	wrap_counter = vq->wrap_counter;
> +
> +	/* If the host supports indirect descriptor tables, and we have multiple
> +	 * buffers, then go indirect. FIXME: tune this threshold */
> +	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> +	else {
> +		desc = NULL;
> +		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> +	}
> +
> +	if (desc) {
> +		/* Use a single buffer which doesn't continue */
> +		indirect = true;
> +		/* Set up rest to use this indirect table. */
> +		i = 0;
> +		descs_used = 1;
> +	} else {
> +		indirect = false;
> +		desc = vq->vring_packed.desc;
> +		i = head;
> +		descs_used = total_sg;
> +
> +		if (total_sg > 1 && !vq->chaining) {
> +			END_USE(vq);
> +			return -ENOTSUPP;
> +		}
> +	}
> +
> +	if (vq->vq.num_free < descs_used) {
> +		pr_debug("Can't add buf len %i - avail = %i\n",
> +			 descs_used, vq->vq.num_free);
> +		/* FIXME: for historical reasons, we force a notify here if
> +		 * there are outgoing parts to the buffer.  Presumably the
> +		 * host should service the ring ASAP. */
> +		if (out_sgs)
> +			vq->notify(&vq->vq);
> +		if (indirect)
> +			kfree(desc);
> +		END_USE(vq);
> +		return -ENOSPC;
> +	}
> +
> +	for (n = 0; n < out_sgs; n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> +					VRING_DESC_F_USED(!vq->wrap_counter));
> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);

If it's part of a chain, I think we only need to do this for the last buffer.

> +			prev = i;
> +			i++;

It looks to me like prev is always i - 1?

> +			if (!indirect && i >= vq->vring_packed.num) {
> +				i = 0;
> +				vq->wrap_counter ^= 1;
> +			}
> +		}
> +	}
> +	for (; n < (out_sgs + in_sgs); n++) {
> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> +			if (vring_mapping_error(vq, addr))
> +				goto unmap_release;
> +
> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> +					VRING_DESC_F_WRITE |
> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> +					VRING_DESC_F_USED(!vq->wrap_counter));
> +			if (!indirect && i == head)
> +				head_flags = flags;
> +			else
> +				desc[i].flags = flags;
> +
> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> +			prev = i;
> +			i++;
> +			if (!indirect && i >= vq->vring_packed.num) {
> +				i = 0;
> +				vq->wrap_counter ^= 1;
> +			}
> +		}
> +	}
> +	/* Last one doesn't continue. */
> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);

I don't understand why we need this here.

> +	else
> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> +
> +	if (indirect) {
> +		/* FIXME: to be implemented */
> +
> +		/* Now that the indirect table is filled in, map it. */
> +		dma_addr_t addr = vring_map_single(
> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> +			DMA_TO_DEVICE);
> +		if (vring_mapping_error(vq, addr))
> +			goto unmap_release;
> +
> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> +					     VRING_DESC_F_AVAIL(wrap_counter) |
> +					     VRING_DESC_F_USED(!wrap_counter));
> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> +				total_sg * sizeof(struct vring_packed_desc));
> +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> +	}
> +
> +	/* We're using some buffers from the free list. */
> +	vq->vq.num_free -= descs_used;
> +
> +	/* Update free pointer */
> +	if (indirect) {
> +		n = head + 1;
> +		if (n >= vq->vring_packed.num) {
> +			n = 0;
> +			vq->wrap_counter ^= 1;
> +		}
> +		vq->free_head = n;

detach_buf_packed() does not even touch free_head, so we need to
explain what it means for the packed ring.

> +	} else
> +		vq->free_head = i;

ID is only valid in the last descriptor in the list, so head + 1 should 
be ok too?

> +
> +	/* Store token and indirect buffer state. */
> +	vq->desc_state[head].num = descs_used;
> +	vq->desc_state[head].data = data;
> +	if (indirect)
> +		vq->desc_state[head].indir_desc = desc;
> +	else
> +		vq->desc_state[head].indir_desc = ctx;
> +
> +	virtio_wmb(vq->weak_barriers);

Let's add a comment to explain the barrier here.
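
For the comment, the ordering being enforced can be sketched in user space with C11 atomics. This is only a model under assumptions: the struct layout and the names pdesc and publish_desc are hypothetical, and a release fence stands in for virtio_wmb():

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a packed descriptor (layout is illustrative). */
struct pdesc {
	uint64_t addr;
	uint32_t len;
	uint32_t id;
	_Atomic uint16_t flags;
};

/* Fill in the descriptor body first, then publish it by writing flags.
 * The release fence keeps the body stores from being reordered after
 * the flags store, so an observer that sees the new flags (and orders
 * its own reads accordingly) also sees a fully initialised descriptor. */
void publish_desc(struct pdesc *d, uint64_t addr, uint32_t len,
		  uint32_t id, uint16_t flags)
{
	assert(d != NULL);
	d->addr = addr;
	d->len = len;
	d->id = id;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&d->flags, flags, memory_order_relaxed);
}
```

The reader side needs the matching read barrier after it checks the flags, which is what the virtio_rmb() in the get_buf path is for.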

> +	vq->vring_packed.desc[head].flags = head_flags;
> +	vq->num_added++;
> +
> +	pr_debug("Added buffer head %i to %p\n", head, vq);
> +	END_USE(vq);
> +
> +	return 0;
> +
> +unmap_release:
> +	err_idx = i;
> +	i = head;
> +
> +	for (n = 0; n < total_sg; n++) {
> +		if (i == err_idx)
> +			break;
> +		vring_unmap_one(vq, &desc[i]);
> +		i++;
> +		if (!indirect && i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->wrap_counter = wrap_counter;
> +
> +	if (indirect)
> +		kfree(desc);
> +
> +	END_USE(vq);
> +	return -EIO;
> +}
> +
> +static inline int virtqueue_add(struct virtqueue *_vq,
> +				struct scatterlist *sgs[],
> +				unsigned int total_sg,
> +				unsigned int out_sgs,
> +				unsigned int in_sgs,
> +				void *data,
> +				void *ctx,
> +				gfp_t gfp)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> +						 in_sgs, data, ctx, gfp) :
> +			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> +						in_sgs, data, ctx, gfp);
> +}
> +
>   /**
>    * virtqueue_add_sgs - expose buffers to other end
>    * @vq: the struct virtqueue we're talking about.
> @@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>   	 * event. */
>   	virtio_mb(vq->weak_barriers);
>   
> +	if (vq->packed) {
> +		/* FIXME: to be implemented */
> +		needs_kick = true;
> +		goto out;
> +	}
> +
>   	old = vq->avail_idx_shadow - vq->num_added;
>   	new = vq->avail_idx_shadow;
>   	vq->num_added = 0;
> @@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>   	} else {
>   		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
>   	}
> +
> +out:
>   	END_USE(vq);
>   	return needs_kick;
>   }
> @@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_kick);
>   
> -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> -		       void **ctx)
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> +			     void **ctx)
>   {
>   	unsigned int i, j;
>   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> @@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
>   	}
>   }
>   
> -static inline bool more_used(const struct vring_virtqueue *vq)
> +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> +			      void **ctx)
> +{
> +	struct vring_packed_desc *desc;
> +	unsigned int i, j;
> +
> +	/* Clear data ptr. */
> +	vq->desc_state[head].data = NULL;
> +
> +	i = head;
> +
> +	for (j = 0; j < vq->desc_state[head].num; j++) {
> +		desc = &vq->vring_packed.desc[i];
> +		vring_unmap_one(vq, desc);
> +		i++;
> +		if (i >= vq->vring_packed.num)
> +			i = 0;
> +	}
> +
> +	vq->vq.num_free += vq->desc_state[head].num;

It looks to me like vq->free_head always grows. How can we make sure it
does not exceed vq.num here?
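
For reference, the add path wraps the index as it walks the ring and flips the wrap counter at the same time. A standalone model of that step (ring_pos and ring_advance are hypothetical names, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Position in a packed ring: slot index plus a 1-bit wrap counter. */
struct ring_pos {
	uint16_t idx;
	uint8_t wrap;
};

/* Move to the next slot; on reaching the end of the ring, go back to
 * slot 0 and flip the wrap counter, as virtqueue_add_packed() does. */
struct ring_pos ring_advance(struct ring_pos p, uint16_t num)
{
	assert(p.idx < num);
	p.idx++;
	if (p.idx >= num) {
		p.idx = 0;
		p.wrap ^= 1;
	}
	return p;
}
```

This only keeps the index inside the ring. It does not by itself show that free_head cannot run into descriptors that are still in flight, which is the part that needs explaining.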

> +
> +	if (vq->indirect) {
> +		u32 len;
> +
> +		desc = vq->desc_state[head].indir_desc;
> +		/* Free the indirect table, if any, now that it's unmapped. */
> +		if (!desc)
> +			return;
> +
> +		len = virtio32_to_cpu(vq->vq.vdev,
> +				      vq->vring_packed.desc[head].len);
> +
> +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> +
> +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> +			vring_unmap_one(vq, &desc[j]);
> +
> +		kfree(desc);
> +		vq->desc_state[head].indir_desc = NULL;
> +	} else if (ctx) {
> +		*ctx = vq->desc_state[head].indir_desc;
> +	}
> +}
> +
> +static inline bool more_used_split(const struct vring_virtqueue *vq)
>   {
>   	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
>   }
>   
> -/**
> - * virtqueue_get_buf - get the next used buffer
> - * @vq: the struct virtqueue we're talking about.
> - * @len: the length written into the buffer
> - *
> - * If the device wrote data into the buffer, @len will be set to the
> - * amount written.  This means you don't need to clear the buffer
> - * beforehand to ensure there's no data leakage in the case of short
> - * writes.
> - *
> - * Caller must ensure we don't call this with other virtqueue
> - * operations at the same time (except where noted).
> - *
> - * Returns NULL if there are no used buffers, or the "data" token
> - * handed to virtqueue_add_*().
> - */
> -void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> -			    void **ctx)
> +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> +{
> +	u16 last_used, flags;
> +	bool avail, used;
> +
> +	if (vq->vq.num_free == vq->vring.num)
> +		return false;
> +
> +	last_used = vq->last_used_idx;
> +	flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[last_used].flags);
> +	avail = flags & VRING_DESC_F_AVAIL(1);
> +	used = flags & VRING_DESC_F_USED(1);
> +
> +	return avail == used;
> +}
> +
> +static inline bool more_used(const struct vring_virtqueue *vq)
> +{
> +	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> +}
> +
> +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> +				  void **ctx)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   	void *ret;
> @@ -735,9 +1079,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>   		return NULL;
>   	}
>   
> -	/* detach_buf clears data, so grab it now. */
> +	/* detach_buf_split clears data, so grab it now. */
>   	ret = vq->desc_state[i].data;
> -	detach_buf(vq, i, ctx);
> +	detach_buf_split(vq, i, ctx);
>   	vq->last_used_idx++;
>   	/* If we expect an interrupt for the next entry, tell host
>   	 * by writing event index and flush out the write before
> @@ -754,6 +1098,87 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>   	END_USE(vq);
>   	return ret;
>   }
> +
> +void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
> +				   void **ctx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	void *ret;
> +	unsigned int i;
> +	u16 last_used;
> +
> +	START_USE(vq);
> +
> +	if (unlikely(vq->broken)) {
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	if (!more_used(vq)) {
> +		pr_debug("No more buffers in queue\n");
> +		END_USE(vq);
> +		return NULL;
> +	}
> +
> +	/* Only get used array entries after they have been exposed by host. */
> +	virtio_rmb(vq->weak_barriers);
> +
> +	last_used = vq->last_used_idx;
> +
> +	i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> +
> +	if (unlikely(i >= vq->vring_packed.num)) {
> +		BAD_RING(vq, "id %u out of range\n", i);
> +		return NULL;
> +	}
> +	if (unlikely(!vq->desc_state[i].data)) {
> +		BAD_RING(vq, "id %u is not a head!\n", i);
> +		return NULL;
> +	}
> +
> +	/* detach_buf_packed clears data, so grab it now. */
> +	ret = vq->desc_state[i].data;
> +	detach_buf_packed(vq, i, ctx);
> +
> +	vq->last_used_idx += vq->desc_state[i].num;
> +	if (vq->last_used_idx >= vq->vring_packed.num)
> +		vq->last_used_idx %= vq->vring_packed.num;

'-=' should be sufficient here?
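
To spell out why: last_used_idx is below the ring size before the update and a buffer spans at most num descriptors, so the sum stays below 2 * num and one subtraction gives the same result as the modulo. A minimal user-space sketch (advance_used_idx is a hypothetical name, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Advance a ring index by `count` slots.
 * Preconditions: idx < num and count <= num, so idx + count < 2 * num
 * and a single conditional subtraction is equivalent to `%= num`. */
uint16_t advance_used_idx(uint16_t idx, uint16_t count, uint16_t num)
{
	uint32_t next = (uint32_t)idx + count;

	assert(idx < num && count <= num);
	if (next >= num)
		next -= num;
	return (uint16_t)next;
}
```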

> +
> +	// FIXME: implement the desc event support
> +
> +#ifdef DEBUG
> +	vq->last_add_time_valid = false;
> +#endif
> +
> +	END_USE(vq);
> +	return ret;
> +}
> +
> +/**
> + * virtqueue_get_buf - get the next used buffer
> + * @vq: the struct virtqueue we're talking about.
> + * @len: the length written into the buffer
> + *
> + * If the device wrote data into the buffer, @len will be set to the
> + * amount written.  This means you don't need to clear the buffer
> + * beforehand to ensure there's no data leakage in the case of short
> + * writes.
> + *
> + * Caller must ensure we don't call this with other virtqueue
> + * operations at the same time (except where noted).
> + *
> + * Returns NULL if there are no used buffers, or the "data" token
> + * handed to virtqueue_add_*().
> + */
> +void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> +			    void **ctx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> +			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
> +}
>   EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>   
>   void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> @@ -761,6 +1186,24 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>   	return virtqueue_get_buf_ctx(_vq, len, NULL);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> +
> +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> +		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +		if (!vq->event)
> +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev,
> +							vq->avail_flags_shadow);
> +	}
> +}
> +
> +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> +{
> +	// FIXME: to be implemented
> +}
> +
>   /**
>    * virtqueue_disable_cb - disable callbacks
>    * @vq: the struct virtqueue we're talking about.
> @@ -774,12 +1217,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> -	}
> -
> +	if (vq->packed)
> +		virtqueue_disable_cb_packed(_vq);
> +	else
> +		virtqueue_disable_cb_split(_vq);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
>   
> @@ -802,6 +1243,12 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>   
>   	START_USE(vq);
>   
> +	if (vq->packed) {
> +		// FIXME: to be implemented
> +		last_used_idx = vq->last_used_idx;
> +		goto out;
> +	}
> +
>   	/* We optimistically turn back on interrupts, then check if there was
>   	 * more to do. */
>   	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> @@ -813,6 +1260,7 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
>   			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
>   	}
>   	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> +out:
>   	END_USE(vq);
>   	return last_used_idx;
>   }
> @@ -832,6 +1280,12 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
>   	virtio_mb(vq->weak_barriers);
> +	if (vq->packed) {
> +		u16 flags = virtio16_to_cpu(vq->vq.vdev,
> +				vq->vring_packed.desc[last_used_idx].flags);
> +		return !(flags & VRING_DESC_F_AVAIL(1)) ==
> +		       !(flags & VRING_DESC_F_USED(1));
> +	}
>   	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_poll);
> @@ -874,6 +1328,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>   
>   	START_USE(vq);
>   
> +	if (vq->packed) {
> +		// FIXME: to be implemented
> +		goto out;
> +	}
> +
>   	/* We optimistically turn back on interrupts, then check if there was
>   	 * more to do. */
>   	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> @@ -896,6 +1355,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
>   		return false;
>   	}
>   
> +out:
>   	END_USE(vq);
>   	return true;
>   }
> @@ -922,14 +1382,20 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>   			continue;
>   		/* detach_buf clears data, so grab it now. */
>   		buf = vq->desc_state[i].data;
> -		detach_buf(vq, i, NULL);
> -		vq->avail_idx_shadow--;
> -		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> +		if (vq->packed)
> +			detach_buf_packed(vq, i, NULL);
> +		else {
> +			detach_buf_split(vq, i, NULL);
> +			vq->avail_idx_shadow--;
> +			vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> +							vq->avail_idx_shadow);
> +		}
>   		END_USE(vq);
>   		return buf;
>   	}
>   	/* That should have freed everything. */
> -	BUG_ON(vq->vq.num_free != vq->vring.num);
> +	BUG_ON(vq->vq.num_free != (vq->packed ? vq->vring_packed.num :
> +						vq->vring.num));
>   
>   	END_USE(vq);
>   	return NULL;
> @@ -957,7 +1423,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
>   EXPORT_SYMBOL_GPL(vring_interrupt);
>   
>   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> -					struct vring vring,
> +					union vring_union vring,
> +					bool packed,
>   					struct virtio_device *vdev,
>   					bool weak_barriers,
>   					bool context,
> @@ -965,19 +1432,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   					void (*callback)(struct virtqueue *),
>   					const char *name)
>   {
> -	unsigned int i;
> +	unsigned int num, i;
>   	struct vring_virtqueue *vq;
>   
> -	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> +	num = packed ? vring.vring_packed.num : vring.vring_split.num;
> +
> +	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
>   		     GFP_KERNEL);
>   	if (!vq)
>   		return NULL;
>   
> -	vq->vring = vring;
>   	vq->vq.callback = callback;
>   	vq->vq.vdev = vdev;
>   	vq->vq.name = name;
> -	vq->vq.num_free = vring.num;
> +	vq->vq.num_free = num;
>   	vq->vq.index = index;
>   	vq->we_own_ring = false;
>   	vq->queue_dma_addr = 0;
> @@ -986,9 +1454,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   	vq->weak_barriers = weak_barriers;
>   	vq->broken = false;
>   	vq->last_used_idx = 0;
> -	vq->avail_flags_shadow = 0;
> -	vq->avail_idx_shadow = 0;
>   	vq->num_added = 0;
> +	vq->packed = packed;
>   	list_add_tail(&vq->vq.list, &vdev->vqs);
>   #ifdef DEBUG
>   	vq->in_use = false;
> @@ -999,18 +1466,41 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>   		!context;
>   	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
>   
> +	if (vq->packed) {
> +		vq->vring_packed = vring.vring_packed;
> +		vq->free_head = 0;
> +		vq->wrap_counter = 1;
> +
> +#if 0
> +		vq->chaining = virtio_has_feature(vdev,
> +						  VIRTIO_RING_F_LIST_DESC);
> +#else
> +		vq->chaining = true;

Looks like in V10 there's no F_LIST_DESC.

> +#endif
> +	} else {
> +		vq->vring = vring.vring_split;
> +		vq->avail_flags_shadow = 0;
> +		vq->avail_idx_shadow = 0;
> +
> +		/* Put everything in free lists. */
> +		vq->free_head = 0;
> +		for (i = 0; i < num-1; i++)
> +			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> +	}
> +
>   	/* No callback?  Tell other side not to bother us. */
>   	if (!callback) {
> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> -		if (!vq->event)
> -			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> +		if (packed) {
> +			// FIXME: to be implemented
> +		} else {
> +			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> +			if (!vq->event)
> +				vq->vring.avail->flags = cpu_to_virtio16(vdev,
> +						vq->avail_flags_shadow);
> +		}
>   	}
>   
> -	/* Put everything in free lists. */
> -	vq->free_head = 0;
> -	for (i = 0; i < vring.num-1; i++)
> -		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> -	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> +	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
>   
>   	return &vq->vq;
>   }
> @@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
>   	}
>   }
>   
> +static inline int
> +__vring_size(unsigned int num, unsigned long align, bool packed)
> +{
> +	if (packed)
> +		return vring_packed_size(num, align);
> +	return vring_size(num, align);
> +}
> +
>   struct virtqueue *vring_create_virtqueue(
>   	unsigned int index,
>   	unsigned int num,
> @@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
>   	void *queue = NULL;
>   	dma_addr_t dma_addr;
>   	size_t queue_size_in_bytes;
> -	struct vring vring;
> +	union vring_union vring;
> +	bool packed;
>   
>   	/* We assume num is a power of 2. */
>   	if (num & (num - 1)) {
> @@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
>   		return NULL;
>   	}
>   
> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +
>   	/* TODO: allocate each queue chunk individually */
> -	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> +	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> +			num /= 2) {
> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> +							     packed),
>   					  &dma_addr,
>   					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
>   		if (queue)
> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>   
>   	if (!queue) {
>   		/* Try to get a single page. You are my only hope! */
> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> +							     packed),
>   					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>   	}
>   	if (!queue)
>   		return NULL;
>   
> -	queue_size_in_bytes = vring_size(num, vring_align);
> -	vring_init(&vring, num, queue, vring_align);
> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> +	if (packed)
> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> +	else
> +		vring_init(&vring.vring_split, num, queue, vring_align);

Let's rename vring_init to vring_init_split() like other helpers?

>   
> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> -				   notify, callback, name);
> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> +				   context, notify, callback, name);
>   	if (!vq) {
>   		vring_free_queue(vdev, queue_size_in_bytes, queue,
>   				 dma_addr);
> @@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
>   				      void (*callback)(struct virtqueue *vq),
>   				      const char *name)
>   {
> -	struct vring vring;
> -	vring_init(&vring, num, pages, vring_align);
> -	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> -				     notify, callback, name);
> +	union vring_union vring;
> +	bool packed;
> +
> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> +	if (packed)
> +		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
> +	else
> +		vring_init(&vring.vring_split, num, pages, vring_align);
> +
> +	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> +				     context, notify, callback, name);
>   }
>   EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>   
> @@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>   
>   	if (vq->we_own_ring) {
>   		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> -				 vq->vring.desc, vq->queue_dma_addr);
> +				 vq->packed ? (void *)vq->vring_packed.desc :
> +					      (void *)vq->vring.desc,
> +				 vq->queue_dma_addr);
>   	}
>   	list_del(&_vq->list);
>   	kfree(vq);
> @@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
>   
>   	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
>   		switch (i) {
> +#if 0 // FIXME: to be implemented
>   		case VIRTIO_RING_F_INDIRECT_DESC:
>   			break;
> +#endif
>   		case VIRTIO_RING_F_EVENT_IDX:
>   			break;
>   		case VIRTIO_F_VERSION_1:
>   			break;
>   		case VIRTIO_F_IOMMU_PLATFORM:
>   			break;
> +		case VIRTIO_F_RING_PACKED:
> +			break;
>   		default:
>   			/* We don't understand this bit. */
>   			__virtio_clear_bit(vdev, i);
> @@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
>   
>   	struct vring_virtqueue *vq = to_vvq(_vq);
>   
> -	return vq->vring.num;
> +	return vq->packed ? vq->vring_packed.num : vq->vring.num;
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>   
> @@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
>   
> +/* Only available for split ring */

Interesting. I think we need this to configure PCI correctly, e.g. in
setup_vq()?

>   dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
>   
> +/* Only available for split ring */
>   dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)

Maybe it's better to rename this to get_device_addr().

Thanks

>   {
>   	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>   }
>   EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
>   
> +/* Only available for split ring */
>   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
>   {
>   	return &to_vvq(vq)->vring;
> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> index bbf32524ab27..a0075894ad16 100644
> --- a/include/linux/virtio_ring.h
> +++ b/include/linux/virtio_ring.h
> @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
>   struct virtio_device;
>   struct virtqueue;
>   
> +union vring_union {
> +	struct vring vring_split;
> +	struct vring_packed vring_packed;
> +};
> +
>   /*
>    * Creates a virtqueue and allocates the descriptor ring.  If
>    * may_reduce_num is set, then this may allocate a smaller ring than
> @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
>   
>   /* Creates a virtqueue with a custom layout. */
>   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> -					struct vring vring,
> +					union vring_union vring,
> +					bool packed,
>   					struct virtio_device *vdev,
>   					bool weak_barriers,
>   					bool ctx,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16  4:03     ` Jason Wang
@ 2018-03-16  6:10       ` Tiwei Bie
  -1 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-03-16  6:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> On 2018-02-23 19:18, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> >   include/linux/virtio_ring.h  |   8 +-
> >   2 files changed, 618 insertions(+), 89 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index eb30f3e09a47..393778a2f809 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -58,14 +58,14 @@
> >   struct vring_desc_state {
> >   	void *data;			/* Data for callback. */
> > -	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> > +	void *indir_desc;		/* Indirect descriptor, if any. */
> > +	int num;			/* Descriptor list length. */
> >   };
> >   struct vring_virtqueue {
> >   	struct virtqueue vq;
> > -	/* Actual memory layout for this queue */
> > -	struct vring vring;
> > +	bool packed;
> >   	/* Can we use weak barriers? */
> >   	bool weak_barriers;
> > @@ -87,11 +87,28 @@ struct vring_virtqueue {
> >   	/* Last used index we've seen. */
> >   	u16 last_used_idx;
> > -	/* Last written value to avail->flags */
> > -	u16 avail_flags_shadow;
> > -
> > -	/* Last written value to avail->idx in guest byte order */
> > -	u16 avail_idx_shadow;
> > +	union {
> > +		/* Available for split ring */
> > +		struct {
> > +			/* Actual memory layout for this queue */
> > +			struct vring vring;
> > +
> > +			/* Last written value to avail->flags */
> > +			u16 avail_flags_shadow;
> > +
> > +			/* Last written value to avail->idx in
> > +			 * guest byte order */
> > +			u16 avail_idx_shadow;
> > +		};
> > +
> > +		/* Available for packed ring */
> > +		struct {
> > +			/* Actual memory layout for this queue */
> > +			struct vring_packed vring_packed;
> > +			u8 wrap_counter : 1;
> > +			bool chaining;
> > +		};
> > +	};
> >   	/* How to notify other side. FIXME: commonalize hcalls! */
> >   	bool (*notify)(struct virtqueue *vq);
> > @@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> >   			      cpu_addr, size, direction);
> >   }
> > -static void vring_unmap_one(const struct vring_virtqueue *vq,
> > -			    struct vring_desc *desc)
> > +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
> >   {
> 
> Let's split this helper into packed and split versions, like the other helpers?
> (The caller already knows the type of the vq.)

Okay.

> 
> > +	u64 addr;
> > +	u32 len;
> >   	u16 flags;
> >   	if (!vring_use_dma_api(vq->vq.vdev))
> >   		return;
> > -	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +	if (vq->packed) {
> > +		struct vring_packed_desc *desc = _desc;
> > +
> > +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> > +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> > +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +	} else {
> > +		struct vring_desc *desc = _desc;
> > +
> > +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> > +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> > +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +	}
> >   	if (flags & VRING_DESC_F_INDIRECT) {
> >   		dma_unmap_single(vring_dma_dev(vq),
> > -				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > -				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> > +				 addr, len,
> >   				 (flags & VRING_DESC_F_WRITE) ?
> >   				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >   	} else {
> >   		dma_unmap_page(vring_dma_dev(vq),
> > -			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > -			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> > +			       addr, len,
> >   			       (flags & VRING_DESC_F_WRITE) ?
> >   			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >   	}
> > @@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
> >   	return dma_mapping_error(vring_dma_dev(vq), addr);
> >   }
> > -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> > -					 unsigned int total_sg, gfp_t gfp)
> > +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +					       unsigned int total_sg,
> > +					       gfp_t gfp)
> >   {
> >   	struct vring_desc *desc;
> >   	unsigned int i;
> > @@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> >   	return desc;
> >   }
> > -static inline int virtqueue_add(struct virtqueue *_vq,
> > -				struct scatterlist *sgs[],
> > -				unsigned int total_sg,
> > -				unsigned int out_sgs,
> > -				unsigned int in_sgs,
> > -				void *data,
> > -				void *ctx,
> > -				gfp_t gfp)
> > +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> > +						       unsigned int total_sg,
> > +						       gfp_t gfp)
> > +{
> > +	struct vring_packed_desc *desc;
> > +
> > +	/*
> > +	 * We require lowmem mappings for the descriptors because
> > +	 * otherwise virt_to_phys will give us bogus addresses in the
> > +	 * virtqueue.
> > +	 */
> > +	gfp &= ~__GFP_HIGHMEM;
> > +
> > +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> > +
> > +	return desc;
> > +}
> > +
> > +static inline int virtqueue_add_split(struct virtqueue *_vq,
> > +				      struct scatterlist *sgs[],
> > +				      unsigned int total_sg,
> > +				      unsigned int out_sgs,
> > +				      unsigned int in_sgs,
> > +				      void *data,
> > +				      void *ctx,
> > +				      gfp_t gfp)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> >   	struct scatterlist *sg;
> > @@ -303,7 +350,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   	/* If the host supports indirect descriptor tables, and we have multiple
> >   	 * buffers, then go indirect. FIXME: tune this threshold */
> >   	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> > -		desc = alloc_indirect(_vq, total_sg, gfp);
> > +		desc = alloc_indirect_split(_vq, total_sg, gfp);
> >   	else {
> >   		desc = NULL;
> >   		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> > @@ -437,6 +484,243 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   	return -EIO;
> >   }
> > +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > +				       struct scatterlist *sgs[],
> > +				       unsigned int total_sg,
> > +				       unsigned int out_sgs,
> > +				       unsigned int in_sgs,
> > +				       void *data,
> > +				       void *ctx,
> > +				       gfp_t gfp)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	struct vring_packed_desc *desc;
> > +	struct scatterlist *sg;
> > +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> > +	__virtio16 uninitialized_var(head_flags), flags;
> > +	int head, wrap_counter;
> > +	bool indirect;
> > +
> > +	START_USE(vq);
> > +
> > +	BUG_ON(data == NULL);
> > +	BUG_ON(ctx && vq->indirect);
> > +
> > +	if (unlikely(vq->broken)) {
> > +		END_USE(vq);
> > +		return -EIO;
> > +	}
> > +
> > +	if (total_sg > 1 && !vq->chaining && !vq->indirect) {
> > +		END_USE(vq);
> > +		return -ENOTSUPP;
> > +	}
> > +
> > +#ifdef DEBUG
> > +	{
> > +		ktime_t now = ktime_get();
> > +
> > +		/* No kick or get, with .1 second between?  Warn. */
> > +		if (vq->last_add_time_valid)
> > +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> > +					    > 100);
> > +		vq->last_add_time = now;
> > +		vq->last_add_time_valid = true;
> > +	}
> > +#endif
> > +
> > +	BUG_ON(total_sg == 0);
> > +
> > +	head = vq->free_head;
> > +	wrap_counter = vq->wrap_counter;
> > +
> > +	/* If the host supports indirect descriptor tables, and we have multiple
> > +	 * buffers, then go indirect. FIXME: tune this threshold */
> > +	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> > +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> > +	else {
> > +		desc = NULL;
> > +		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> > +	}
> > +
> > +	if (desc) {
> > +		/* Use a single buffer which doesn't continue */
> > +		indirect = true;
> > +		/* Set up rest to use this indirect table. */
> > +		i = 0;
> > +		descs_used = 1;
> > +	} else {
> > +		indirect = false;
> > +		desc = vq->vring_packed.desc;
> > +		i = head;
> > +		descs_used = total_sg;
> > +
> > +		if (total_sg > 1 && !vq->chaining) {
> > +			END_USE(vq);
> > +			return -ENOTSUPP;
> > +		}
> > +	}
> > +
> > +	if (vq->vq.num_free < descs_used) {
> > +		pr_debug("Can't add buf len %i - avail = %i\n",
> > +			 descs_used, vq->vq.num_free);
> > +		/* FIXME: for historical reasons, we force a notify here if
> > +		 * there are outgoing parts to the buffer.  Presumably the
> > +		 * host should service the ring ASAP. */
> > +		if (out_sgs)
> > +			vq->notify(&vq->vq);
> > +		if (indirect)
> > +			kfree(desc);
> > +		END_USE(vq);
> > +		return -ENOSPC;
> > +	}
> > +
> > +	for (n = 0; n < out_sgs; n++) {
> > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> > +			if (vring_mapping_error(vq, addr))
> > +				goto unmap_release;
> > +
> > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > +			if (!indirect && i == head)
> > +				head_flags = flags;
> > +			else
> > +				desc[i].flags = flags;
> > +
> > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> 
> If it's part of a chain, I think we only need to do this for the last buffer.

I'm not sure I've got your point about the "last buffer".
But, yes, id just needs to be set for the last desc.

> 
> > +			prev = i;
> > +			i++;
> 
> It looks to me like prev is always i - 1?

No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
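
To illustrate the wrap case, here is a minimal standalone sketch (not
kernel code). ring_next() and ring_prev() are hypothetical helper names
used only for this example; the patch open-codes the same logic with
"prev = i; i++; if (i >= num) i = 0;".

```c
#include <assert.h>

/* Hypothetical helpers for illustration; they are not kernel functions. */

/* Next slot after i in a ring of num slots: "i++; if (i >= num) i = 0;". */
static inline unsigned int ring_next(unsigned int i, unsigned int num)
{
	assert(num > 0 && i < num);
	return (i + 1 >= num) ? 0 : i + 1;
}

/* Slot that was filled just before i, taking the wrap into account. */
static inline unsigned int ring_prev(unsigned int i, unsigned int num)
{
	assert(num > 0 && i < num);
	return (i == 0) ? num - 1 : i - 1;
}
```

With num = 4, ring_next(3, 4) is 0 and the slot just filled is 3, not
0 - 1, which is why prev is tracked separately instead of being
derived from i.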

> 
> > +			if (!indirect && i >= vq->vring_packed.num) {
> > +				i = 0;
> > +				vq->wrap_counter ^= 1;
> > +			}
> > +		}
> > +	}
> > +	for (; n < (out_sgs + in_sgs); n++) {
> > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > +			if (vring_mapping_error(vq, addr))
> > +				goto unmap_release;
> > +
> > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > +					VRING_DESC_F_WRITE |
> > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > +			if (!indirect && i == head)
> > +				head_flags = flags;
> > +			else
> > +				desc[i].flags = flags;
> > +
> > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > +			prev = i;
> > +			i++;
> > +			if (!indirect && i >= vq->vring_packed.num) {
> > +				i = 0;
> > +				vq->wrap_counter ^= 1;
> > +			}
> > +		}
> > +	}
> > +	/* Last one doesn't continue. */
> > +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> > +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> 
> I don't get why we need this here.

If only one desc is used, we will need to clear the
VRING_DESC_F_NEXT flag from the head_flags.
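
A rough standalone sketch of that check, for illustration only. The
helper names and SKETCH_DESC_F_NEXT are made up for this example;
the value 1 is assumed to match VRING_DESC_F_NEXT in the virtio ring
header.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match VRING_DESC_F_NEXT; the SKETCH_ name is hypothetical. */
#define SKETCH_DESC_F_NEXT 1u

/*
 * After the fill loops, i is the slot following the last descriptor
 * written. If that is the slot right after head, only one descriptor
 * was used and the head is also the last one in the list.
 */
static inline int chain_is_single_desc(unsigned int head, unsigned int i,
				       unsigned int num)
{
	assert(num > 0 && head < num && i < num);
	return (head + 1) % num == i;
}

/* Clear the NEXT bit from a CPU-endian flags value. */
static inline uint16_t clear_next(uint16_t flags)
{
	return (uint16_t)(flags & ~SKETCH_DESC_F_NEXT);
}
```

In the single-descriptor case the head flags are still held back in
head_flags, so the bit has to be cleared there rather than in
desc[prev].flags.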

> 
> > +	else
> > +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > +
> > +	if (indirect) {
> > +		/* FIXME: to be implemented */
> > +
> > +		/* Now that the indirect table is filled in, map it. */
> > +		dma_addr_t addr = vring_map_single(
> > +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > +			DMA_TO_DEVICE);
> > +		if (vring_mapping_error(vq, addr))
> > +			goto unmap_release;
> > +
> > +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > +					     VRING_DESC_F_AVAIL(wrap_counter) |
> > +					     VRING_DESC_F_USED(!wrap_counter));
> > +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> > +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > +				total_sg * sizeof(struct vring_packed_desc));
> > +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> > +	}
> > +
> > +	/* We're using some buffers from the free list. */
> > +	vq->vq.num_free -= descs_used;
> > +
> > +	/* Update free pointer */
> > +	if (indirect) {
> > +		n = head + 1;
> > +		if (n >= vq->vring_packed.num) {
> > +			n = 0;
> > +			vq->wrap_counter ^= 1;
> > +		}
> > +		vq->free_head = n;
> 
> detach_buf_packed() does not even touch free_head here, so we need to explain
> its meaning for the packed ring.

The above code is for indirect support, which isn't really
implemented in this patch yet.

For your question, free_head stores the index of the
next avail desc. I'll add a comment for it, or move it
into the union and give it a better name, in the next version.

> 
> > +	} else
> > +		vq->free_head = i;
> 
> ID is only valid in the last descriptor in the list, so head + 1 should be
> ok too?

I don't really get your point. The vq->free_head stores
the index of the next avail desc.

> 
> > +
> > +	/* Store token and indirect buffer state. */
> > +	vq->desc_state[head].num = descs_used;
> > +	vq->desc_state[head].data = data;
> > +	if (indirect)
> > +		vq->desc_state[head].indir_desc = desc;
> > +	else
> > +		vq->desc_state[head].indir_desc = ctx;
> > +
> > +	virtio_wmb(vq->weak_barriers);
> 
> Let's add a comment to explain the barrier here.

Okay.

> 
> > +	vq->vring_packed.desc[head].flags = head_flags;
> > +	vq->num_added++;
> > +
> > +	pr_debug("Added buffer head %i to %p\n", head, vq);
> > +	END_USE(vq);
> > +
> > +	return 0;
> > +
> > +unmap_release:
> > +	err_idx = i;
> > +	i = head;
> > +
> > +	for (n = 0; n < total_sg; n++) {
> > +		if (i == err_idx)
> > +			break;
> > +		vring_unmap_one(vq, &desc[i]);
> > +		i++;
> > +		if (!indirect && i >= vq->vring_packed.num)
> > +			i = 0;
> > +	}
> > +
> > +	vq->wrap_counter = wrap_counter;
> > +
> > +	if (indirect)
> > +		kfree(desc);
> > +
> > +	END_USE(vq);
> > +	return -EIO;
> > +}
> > +
> > +static inline int virtqueue_add(struct virtqueue *_vq,
> > +				struct scatterlist *sgs[],
> > +				unsigned int total_sg,
> > +				unsigned int out_sgs,
> > +				unsigned int in_sgs,
> > +				void *data,
> > +				void *ctx,
> > +				gfp_t gfp)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> > +						 in_sgs, data, ctx, gfp) :
> > +			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> > +						in_sgs, data, ctx, gfp);
> > +}
> > +
> >   /**
> >    * virtqueue_add_sgs - expose buffers to other end
> >    * @vq: the struct virtqueue we're talking about.
> > @@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> >   	 * event. */
> >   	virtio_mb(vq->weak_barriers);
> > +	if (vq->packed) {
> > +		/* FIXME: to be implemented */
> > +		needs_kick = true;
> > +		goto out;
> > +	}
> > +
> >   	old = vq->avail_idx_shadow - vq->num_added;
> >   	new = vq->avail_idx_shadow;
> >   	vq->num_added = 0;
> > @@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> >   	} else {
> >   		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
> >   	}
> > +
> > +out:
> >   	END_USE(vq);
> >   	return needs_kick;
> >   }
> > @@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_kick);
> > -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> > -		       void **ctx)
> > +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > +			     void **ctx)
> >   {
> >   	unsigned int i, j;
> >   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > @@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> >   	}
> >   }
> > -static inline bool more_used(const struct vring_virtqueue *vq)
> > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > +			      void **ctx)
> > +{
> > +	struct vring_packed_desc *desc;
> > +	unsigned int i, j;
> > +
> > +	/* Clear data ptr. */
> > +	vq->desc_state[head].data = NULL;
> > +
> > +	i = head;
> > +
> > +	for (j = 0; j < vq->desc_state[head].num; j++) {
> > +		desc = &vq->vring_packed.desc[i];
> > +		vring_unmap_one(vq, desc);
> > +		i++;
> > +		if (i >= vq->vring_packed.num)
> > +			i = 0;
> > +	}
> > +
> > +	vq->vq.num_free += vq->desc_state[head].num;
> 
> It looks to me like vq->free_head always grows; how can we make sure it does
> not exceed vq.num here?

The vq->free_head stores the index of the next avail
desc. You can see that it wraps, together with vq->wrap_counter,
in virtqueue_add_packed().
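
As a simplified standalone sketch of that wrapping (not the patch
code itself): struct ring_pos and ring_pos_advance() are hypothetical
names standing in for the (free_head, wrap_counter) pair and the
per-descriptor step in virtqueue_add_packed().

```c
#include <assert.h>

/* Hypothetical stand-in for the (free_head, wrap_counter) pair. */
struct ring_pos {
	unsigned int idx;   /* index of the next available descriptor */
	unsigned int wrap;  /* 0 or 1, flipped each time idx wraps to 0 */
};

/* Advance by one descriptor in a ring of num slots. */
static inline struct ring_pos ring_pos_advance(struct ring_pos p,
					       unsigned int num)
{
	assert(num > 0 && p.idx < num && p.wrap <= 1);

	p.idx++;
	if (p.idx >= num) {
		p.idx = 0;
		p.wrap ^= 1;
	}
	return p;
}
```

The index therefore always stays below num, and the wrap counter
flips exactly when the index goes back to 0.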

> 
> > +
> > +	if (vq->indirect) {
> > +		u32 len;
> > +
> > +		desc = vq->desc_state[head].indir_desc;
> > +		/* Free the indirect table, if any, now that it's unmapped. */
> > +		if (!desc)
> > +			return;
> > +
> > +		len = virtio32_to_cpu(vq->vq.vdev,
> > +				      vq->vring_packed.desc[head].len);
> > +
> > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > +
> > +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > +			vring_unmap_one(vq, &desc[j]);
> > +
> > +		kfree(desc);
> > +		vq->desc_state[head].indir_desc = NULL;
> > +	} else if (ctx) {
> > +		*ctx = vq->desc_state[head].indir_desc;
> > +	}
> > +}
> > +
> > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> >   {
> >   	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> >   }
> > -/**
> > - * virtqueue_get_buf - get the next used buffer
> > - * @vq: the struct virtqueue we're talking about.
> > - * @len: the length written into the buffer
> > - *
> > - * If the device wrote data into the buffer, @len will be set to the
> > - * amount written.  This means you don't need to clear the buffer
> > - * beforehand to ensure there's no data leakage in the case of short
> > - * writes.
> > - *
> > - * Caller must ensure we don't call this with other virtqueue
> > - * operations at the same time (except where noted).
> > - *
> > - * Returns NULL if there are no used buffers, or the "data" token
> > - * handed to virtqueue_add_*().
> > - */
> > -void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> > -			    void **ctx)
> > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > +{
> > +	u16 last_used, flags;
> > +	bool avail, used;
> > +
> > +	if (vq->vq.num_free == vq->vring.num)
> > +		return false;
> > +
> > +	last_used = vq->last_used_idx;
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used].flags);
> > +	avail = flags & VRING_DESC_F_AVAIL(1);
> > +	used = flags & VRING_DESC_F_USED(1);
> > +
> > +	return avail == used;
> > +}
> > +
> > +static inline bool more_used(const struct vring_virtqueue *vq)
> > +{
> > +	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> > +}
> > +
> > +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> > +				  void **ctx)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> >   	void *ret;
> > @@ -735,9 +1079,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >   		return NULL;
> >   	}
> > -	/* detach_buf clears data, so grab it now. */
> > +	/* detach_buf_split clears data, so grab it now. */
> >   	ret = vq->desc_state[i].data;
> > -	detach_buf(vq, i, ctx);
> > +	detach_buf_split(vq, i, ctx);
> >   	vq->last_used_idx++;
> >   	/* If we expect an interrupt for the next entry, tell host
> >   	 * by writing event index and flush out the write before
> > @@ -754,6 +1098,87 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >   	END_USE(vq);
> >   	return ret;
> >   }
> > +
> > +void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
> > +				   void **ctx)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	void *ret;
> > +	unsigned int i;
> > +	u16 last_used;
> > +
> > +	START_USE(vq);
> > +
> > +	if (unlikely(vq->broken)) {
> > +		END_USE(vq);
> > +		return NULL;
> > +	}
> > +
> > +	if (!more_used(vq)) {
> > +		pr_debug("No more buffers in queue\n");
> > +		END_USE(vq);
> > +		return NULL;
> > +	}
> > +
> > +	/* Only get used array entries after they have been exposed by host. */
> > +	virtio_rmb(vq->weak_barriers);
> > +
> > +	last_used = vq->last_used_idx;
> > +
> > +	i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> > +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> > +
> > +	if (unlikely(i >= vq->vring_packed.num)) {
> > +		BAD_RING(vq, "id %u out of range\n", i);
> > +		return NULL;
> > +	}
> > +	if (unlikely(!vq->desc_state[i].data)) {
> > +		BAD_RING(vq, "id %u is not a head!\n", i);
> > +		return NULL;
> > +	}
> > +
> > +	/* detach_buf_packed clears data, so grab it now. */
> > +	ret = vq->desc_state[i].data;
> > +	detach_buf_packed(vq, i, ctx);
> > +
> > +	vq->last_used_idx += vq->desc_state[i].num;
> > +	if (vq->last_used_idx >= vq->vring_packed.num)
> > +		vq->last_used_idx %= vq->vring_packed.num;
> 
> '-=' should be sufficient here?

Good suggestion. I think so.
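
A small standalone sketch of why a single subtraction is enough,
assuming last_used_idx is always below the ring size and one buffer
never uses more than num descriptors. advance_used_idx() is a
hypothetical helper name used only here.

```c
#include <assert.h>

/*
 * Advance the last-used index by count descriptors in a ring of num
 * slots. Since last_used < num and count <= num, the sum is below
 * 2 * num, so one subtraction gives the same result as "%= num".
 */
static inline unsigned int advance_used_idx(unsigned int last_used,
					    unsigned int count,
					    unsigned int num)
{
	assert(num > 0 && last_used < num && count <= num);

	last_used += count;
	if (last_used >= num)
		last_used -= num;
	return last_used;
}
```

For example, with num = 4, last_used = 3 and count = 2, the result is
1, the same as (3 + 2) % 4.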

> 
> > +
> > +	// FIXME: implement the desc event support
> > +
> > +#ifdef DEBUG
> > +	vq->last_add_time_valid = false;
> > +#endif
> > +
> > +	END_USE(vq);
> > +	return ret;
> > +}
> > +
> > +/**
> > + * virtqueue_get_buf - get the next used buffer
> > + * @vq: the struct virtqueue we're talking about.
> > + * @len: the length written into the buffer
> > + *
> > + * If the device wrote data into the buffer, @len will be set to the
> > + * amount written.  This means you don't need to clear the buffer
> > + * beforehand to ensure there's no data leakage in the case of short
> > + * writes.
> > + *
> > + * Caller must ensure we don't call this with other virtqueue
> > + * operations at the same time (except where noted).
> > + *
> > + * Returns NULL if there are no used buffers, or the "data" token
> > + * handed to virtqueue_add_*().
> > + */
> > +void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> > +			    void **ctx)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> > +			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
> > +}
> >   EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
> >   void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> > @@ -761,6 +1186,24 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> >   	return virtqueue_get_buf_ctx(_vq, len, NULL);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> > +
> > +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > +		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > +		if (!vq->event)
> > +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev,
> > +							vq->avail_flags_shadow);
> > +	}
> > +}
> > +
> > +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> > +{
> > +	// FIXME: to be implemented
> > +}
> > +
> >   /**
> >    * virtqueue_disable_cb - disable callbacks
> >    * @vq: the struct virtqueue we're talking about.
> > @@ -774,12 +1217,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > -	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > -		if (!vq->event)
> > -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> > -	}
> > -
> > +	if (vq->packed)
> > +		virtqueue_disable_cb_packed(_vq);
> > +	else
> > +		virtqueue_disable_cb_split(_vq);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
> > @@ -802,6 +1243,12 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >   	START_USE(vq);
> > +	if (vq->packed) {
> > +		// FIXME: to be implemented
> > +		last_used_idx = vq->last_used_idx;
> > +		goto out;
> > +	}
> > +
> >   	/* We optimistically turn back on interrupts, then check if there was
> >   	 * more to do. */
> >   	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> > @@ -813,6 +1260,7 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >   			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> >   	}
> >   	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> > +out:
> >   	END_USE(vq);
> >   	return last_used_idx;
> >   }
> > @@ -832,6 +1280,12 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> >   	virtio_mb(vq->weak_barriers);
> > +	if (vq->packed) {
> > +		u16 flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used_idx].flags);
> > +		return !(flags & VRING_DESC_F_AVAIL(1)) ==
> > +		       !(flags & VRING_DESC_F_USED(1));
> > +	}
> >   	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_poll);
> > @@ -874,6 +1328,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> >   	START_USE(vq);
> > +	if (vq->packed) {
> > +		// FIXME: to be implemented
> > +		goto out;
> > +	}
> > +
> >   	/* We optimistically turn back on interrupts, then check if there was
> >   	 * more to do. */
> >   	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > @@ -896,6 +1355,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> >   		return false;
> >   	}
> > +out:
> >   	END_USE(vq);
> >   	return true;
> >   }
> > @@ -922,14 +1382,20 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> >   			continue;
> >   		/* detach_buf clears data, so grab it now. */
> >   		buf = vq->desc_state[i].data;
> > -		detach_buf(vq, i, NULL);
> > -		vq->avail_idx_shadow--;
> > -		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> > +		if (vq->packed)
> > +			detach_buf_packed(vq, i, NULL);
> > +		else {
> > +			detach_buf_split(vq, i, NULL);
> > +			vq->avail_idx_shadow--;
> > +			vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> > +							vq->avail_idx_shadow);
> > +		}
> >   		END_USE(vq);
> >   		return buf;
> >   	}
> >   	/* That should have freed everything. */
> > -	BUG_ON(vq->vq.num_free != vq->vring.num);
> > +	BUG_ON(vq->vq.num_free != (vq->packed ? vq->vring_packed.num :
> > +						vq->vring.num));
> >   	END_USE(vq);
> >   	return NULL;
> > @@ -957,7 +1423,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >   EXPORT_SYMBOL_GPL(vring_interrupt);
> >   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > -					struct vring vring,
> > +					union vring_union vring,
> > +					bool packed,
> >   					struct virtio_device *vdev,
> >   					bool weak_barriers,
> >   					bool context,
> > @@ -965,19 +1432,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >   					void (*callback)(struct virtqueue *),
> >   					const char *name)
> >   {
> > -	unsigned int i;
> > +	unsigned int num, i;
> >   	struct vring_virtqueue *vq;
> > -	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> > +	num = packed ? vring.vring_packed.num : vring.vring_split.num;
> > +
> > +	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
> >   		     GFP_KERNEL);
> >   	if (!vq)
> >   		return NULL;
> > -	vq->vring = vring;
> >   	vq->vq.callback = callback;
> >   	vq->vq.vdev = vdev;
> >   	vq->vq.name = name;
> > -	vq->vq.num_free = vring.num;
> > +	vq->vq.num_free = num;
> >   	vq->vq.index = index;
> >   	vq->we_own_ring = false;
> >   	vq->queue_dma_addr = 0;
> > @@ -986,9 +1454,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >   	vq->weak_barriers = weak_barriers;
> >   	vq->broken = false;
> >   	vq->last_used_idx = 0;
> > -	vq->avail_flags_shadow = 0;
> > -	vq->avail_idx_shadow = 0;
> >   	vq->num_added = 0;
> > +	vq->packed = packed;
> >   	list_add_tail(&vq->vq.list, &vdev->vqs);
> >   #ifdef DEBUG
> >   	vq->in_use = false;
> > @@ -999,18 +1466,41 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >   		!context;
> >   	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > +	if (vq->packed) {
> > +		vq->vring_packed = vring.vring_packed;
> > +		vq->free_head = 0;
> > +		vq->wrap_counter = 1;
> > +
> > +#if 0
> > +		vq->chaining = virtio_has_feature(vdev,
> > +						  VIRTIO_RING_F_LIST_DESC);
> > +#else
> > +		vq->chaining = true;
> 
> Looks like in V10 there's no F_LIST_DESC.

Yes. I kept it in this patch only because descriptor
chaining was still optional in the old spec draft when
I sent out this patch set. I'll remove it in the next
version.

> 
> > +#endif
> > +	} else {
> > +		vq->vring = vring.vring_split;
> > +		vq->avail_flags_shadow = 0;
> > +		vq->avail_idx_shadow = 0;
> > +
> > +		/* Put everything in free lists. */
> > +		vq->free_head = 0;
> > +		for (i = 0; i < num-1; i++)
> > +			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> > +	}
> > +
> >   	/* No callback?  Tell other side not to bother us. */
> >   	if (!callback) {
> > -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > -		if (!vq->event)
> > -			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> > +		if (packed) {
> > +			// FIXME: to be implemented
> > +		} else {
> > +			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > +			if (!vq->event)
> > +				vq->vring.avail->flags = cpu_to_virtio16(vdev,
> > +						vq->avail_flags_shadow);
> > +		}
> >   	}
> > -	/* Put everything in free lists. */
> > -	vq->free_head = 0;
> > -	for (i = 0; i < vring.num-1; i++)
> > -		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> > -	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> > +	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
> >   	return &vq->vq;
> >   }
> > @@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
> >   	}
> >   }
> > +static inline int
> > +__vring_size(unsigned int num, unsigned long align, bool packed)
> > +{
> > +	if (packed)
> > +		return vring_packed_size(num, align);
> > +	return vring_size(num, align);
> > +}
> > +
> >   struct virtqueue *vring_create_virtqueue(
> >   	unsigned int index,
> >   	unsigned int num,
> > @@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
> >   	void *queue = NULL;
> >   	dma_addr_t dma_addr;
> >   	size_t queue_size_in_bytes;
> > -	struct vring vring;
> > +	union vring_union vring;
> > +	bool packed;
> >   	/* We assume num is a power of 2. */
> >   	if (num & (num - 1)) {
> > @@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
> >   		return NULL;
> >   	}
> > +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > +
> >   	/* TODO: allocate each queue chunk individually */
> > -	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > +	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> > +			num /= 2) {
> > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > +							     packed),
> >   					  &dma_addr,
> >   					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
> >   		if (queue)
> > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> >   	if (!queue) {
> >   		/* Try to get a single page. You are my only hope! */
> > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > +							     packed),
> >   					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> >   	}
> >   	if (!queue)
> >   		return NULL;
> > -	queue_size_in_bytes = vring_size(num, vring_align);
> > -	vring_init(&vring, num, queue, vring_align);
> > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > +	if (packed)
> > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > +	else
> > +		vring_init(&vring.vring_split, num, queue, vring_align);
> 
> Let's rename vring_init to vring_init_split() like other helpers?

vring_init() is a public API in include/uapi/linux/virtio_ring.h,
so I don't think we can rename it.

> 
> > -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > -				   notify, callback, name);
> > +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > +				   context, notify, callback, name);
> >   	if (!vq) {
> >   		vring_free_queue(vdev, queue_size_in_bytes, queue,
> >   				 dma_addr);
> > @@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> >   				      void (*callback)(struct virtqueue *vq),
> >   				      const char *name)
> >   {
> > -	struct vring vring;
> > -	vring_init(&vring, num, pages, vring_align);
> > -	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > -				     notify, callback, name);
> > +	union vring_union vring;
> > +	bool packed;
> > +
> > +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > +	if (packed)
> > +		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
> > +	else
> > +		vring_init(&vring.vring_split, num, pages, vring_align);
> > +
> > +	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > +				     context, notify, callback, name);
> >   }
> >   EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > @@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> >   	if (vq->we_own_ring) {
> >   		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > -				 vq->vring.desc, vq->queue_dma_addr);
> > +				 vq->packed ? (void *)vq->vring_packed.desc :
> > +					      (void *)vq->vring.desc,
> > +				 vq->queue_dma_addr);
> >   	}
> >   	list_del(&_vq->list);
> >   	kfree(vq);
> > @@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
> >   	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> >   		switch (i) {
> > +#if 0 // FIXME: to be implemented
> >   		case VIRTIO_RING_F_INDIRECT_DESC:
> >   			break;
> > +#endif
> >   		case VIRTIO_RING_F_EVENT_IDX:
> >   			break;
> >   		case VIRTIO_F_VERSION_1:
> >   			break;
> >   		case VIRTIO_F_IOMMU_PLATFORM:
> >   			break;
> > +		case VIRTIO_F_RING_PACKED:
> > +			break;
> >   		default:
> >   			/* We don't understand this bit. */
> >   			__virtio_clear_bit(vdev, i);
> > @@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > -	return vq->vring.num;
> > +	return vq->packed ? vq->vring_packed.num : vq->vring.num;
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > @@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
> > +/* Only available for split ring */
> 
> Interesting, I think we need this for correctly configure pci. e.g in
> setup_vq()?

Yes, setup_vq() should be updated. But that requires a
QEMU change, so I kept it as is in this RFC patch.

> 
> >   dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > @@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
> > +/* Only available for split ring */
> >   dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> 
> Maybe it's better to rename this to get_device_addr().

It's a kernel API that has been exported with EXPORT_SYMBOL_GPL(),
so I'm not sure whether it's a good idea to rename it.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > @@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > +/* Only available for split ring */
> >   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> >   {
> >   	return &to_vvq(vq)->vring;
> > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > index bbf32524ab27..a0075894ad16 100644
> > --- a/include/linux/virtio_ring.h
> > +++ b/include/linux/virtio_ring.h
> > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> >   struct virtio_device;
> >   struct virtqueue;
> > +union vring_union {
> > +	struct vring vring_split;
> > +	struct vring_packed vring_packed;
> > +};
> > +
> >   /*
> >    * Creates a virtqueue and allocates the descriptor ring.  If
> >    * may_reduce_num is set, then this may allocate a smaller ring than
> > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> >   /* Creates a virtqueue with a custom layout. */
> >   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > -					struct vring vring,
> > +					union vring_union vring,
> > +					bool packed,
> >   					struct virtio_device *vdev,
> >   					bool weak_barriers,
> >   					bool ctx,
> 


* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
@ 2018-03-16  6:10       ` Tiwei Bie
From: Tiwei Bie @ 2018-03-16  6:10 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, netdev, linux-kernel, virtualization, wexu

On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> On 2018-02-23 19:18, Tiwei Bie wrote:
> > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > ---
> >   drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> >   include/linux/virtio_ring.h  |   8 +-
> >   2 files changed, 618 insertions(+), 89 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index eb30f3e09a47..393778a2f809 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -58,14 +58,14 @@
> >   struct vring_desc_state {
> >   	void *data;			/* Data for callback. */
> > -	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
> > +	void *indir_desc;		/* Indirect descriptor, if any. */
> > +	int num;			/* Descriptor list length. */
> >   };
> >   struct vring_virtqueue {
> >   	struct virtqueue vq;
> > -	/* Actual memory layout for this queue */
> > -	struct vring vring;
> > +	bool packed;
> >   	/* Can we use weak barriers? */
> >   	bool weak_barriers;
> > @@ -87,11 +87,28 @@ struct vring_virtqueue {
> >   	/* Last used index we've seen. */
> >   	u16 last_used_idx;
> > -	/* Last written value to avail->flags */
> > -	u16 avail_flags_shadow;
> > -
> > -	/* Last written value to avail->idx in guest byte order */
> > -	u16 avail_idx_shadow;
> > +	union {
> > +		/* Available for split ring */
> > +		struct {
> > +			/* Actual memory layout for this queue */
> > +			struct vring vring;
> > +
> > +			/* Last written value to avail->flags */
> > +			u16 avail_flags_shadow;
> > +
> > +			/* Last written value to avail->idx in
> > +			 * guest byte order */
> > +			u16 avail_idx_shadow;
> > +		};
> > +
> > +		/* Available for packed ring */
> > +		struct {
> > +			/* Actual memory layout for this queue */
> > +			struct vring_packed vring_packed;
> > +			u8 wrap_counter : 1;
> > +			bool chaining;
> > +		};
> > +	};
> >   	/* How to notify other side. FIXME: commonalize hcalls! */
> >   	bool (*notify)(struct virtqueue *vq);
> > @@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> >   			      cpu_addr, size, direction);
> >   }
> > -static void vring_unmap_one(const struct vring_virtqueue *vq,
> > -			    struct vring_desc *desc)
> > +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
> >   {
> 
> Let's split the helpers to packed/split version like other helpers?
> (Consider the caller has already known the type of vq).

Okay.

> 
> > +	u64 addr;
> > +	u32 len;
> >   	u16 flags;
> >   	if (!vring_use_dma_api(vq->vq.vdev))
> >   		return;
> > -	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +	if (vq->packed) {
> > +		struct vring_packed_desc *desc = _desc;
> > +
> > +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> > +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> > +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +	} else {
> > +		struct vring_desc *desc = _desc;
> > +
> > +		addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> > +		len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> > +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +	}
> >   	if (flags & VRING_DESC_F_INDIRECT) {
> >   		dma_unmap_single(vring_dma_dev(vq),
> > -				 virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > -				 virtio32_to_cpu(vq->vq.vdev, desc->len),
> > +				 addr, len,
> >   				 (flags & VRING_DESC_F_WRITE) ?
> >   				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >   	} else {
> >   		dma_unmap_page(vring_dma_dev(vq),
> > -			       virtio64_to_cpu(vq->vq.vdev, desc->addr),
> > -			       virtio32_to_cpu(vq->vq.vdev, desc->len),
> > +			       addr, len,
> >   			       (flags & VRING_DESC_F_WRITE) ?
> >   			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >   	}
> > @@ -235,8 +263,9 @@ static int vring_mapping_error(const struct vring_virtqueue *vq,
> >   	return dma_mapping_error(vring_dma_dev(vq), addr);
> >   }
> > -static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> > -					 unsigned int total_sg, gfp_t gfp)
> > +static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +					       unsigned int total_sg,
> > +					       gfp_t gfp)
> >   {
> >   	struct vring_desc *desc;
> >   	unsigned int i;
> > @@ -257,14 +286,32 @@ static struct vring_desc *alloc_indirect(struct virtqueue *_vq,
> >   	return desc;
> >   }
> > -static inline int virtqueue_add(struct virtqueue *_vq,
> > -				struct scatterlist *sgs[],
> > -				unsigned int total_sg,
> > -				unsigned int out_sgs,
> > -				unsigned int in_sgs,
> > -				void *data,
> > -				void *ctx,
> > -				gfp_t gfp)
> > +static struct vring_packed_desc *alloc_indirect_packed(struct virtqueue *_vq,
> > +						       unsigned int total_sg,
> > +						       gfp_t gfp)
> > +{
> > +	struct vring_packed_desc *desc;
> > +
> > +	/*
> > +	 * We require lowmem mappings for the descriptors because
> > +	 * otherwise virt_to_phys will give us bogus addresses in the
> > +	 * virtqueue.
> > +	 */
> > +	gfp &= ~__GFP_HIGHMEM;
> > +
> > +	desc = kmalloc(total_sg * sizeof(struct vring_packed_desc), gfp);
> > +
> > +	return desc;
> > +}
> > +
> > +static inline int virtqueue_add_split(struct virtqueue *_vq,
> > +				      struct scatterlist *sgs[],
> > +				      unsigned int total_sg,
> > +				      unsigned int out_sgs,
> > +				      unsigned int in_sgs,
> > +				      void *data,
> > +				      void *ctx,
> > +				      gfp_t gfp)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> >   	struct scatterlist *sg;
> > @@ -303,7 +350,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   	/* If the host supports indirect descriptor tables, and we have multiple
> >   	 * buffers, then go indirect. FIXME: tune this threshold */
> >   	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> > -		desc = alloc_indirect(_vq, total_sg, gfp);
> > +		desc = alloc_indirect_split(_vq, total_sg, gfp);
> >   	else {
> >   		desc = NULL;
> >   		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> > @@ -437,6 +484,243 @@ static inline int virtqueue_add(struct virtqueue *_vq,
> >   	return -EIO;
> >   }
> > +static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > +				       struct scatterlist *sgs[],
> > +				       unsigned int total_sg,
> > +				       unsigned int out_sgs,
> > +				       unsigned int in_sgs,
> > +				       void *data,
> > +				       void *ctx,
> > +				       gfp_t gfp)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	struct vring_packed_desc *desc;
> > +	struct scatterlist *sg;
> > +	unsigned int i, n, descs_used, uninitialized_var(prev), err_idx;
> > +	__virtio16 uninitialized_var(head_flags), flags;
> > +	int head, wrap_counter;
> > +	bool indirect;
> > +
> > +	START_USE(vq);
> > +
> > +	BUG_ON(data == NULL);
> > +	BUG_ON(ctx && vq->indirect);
> > +
> > +	if (unlikely(vq->broken)) {
> > +		END_USE(vq);
> > +		return -EIO;
> > +	}
> > +
> > +	if (total_sg > 1 && !vq->chaining && !vq->indirect) {
> > +		END_USE(vq);
> > +		return -ENOTSUPP;
> > +	}
> > +
> > +#ifdef DEBUG
> > +	{
> > +		ktime_t now = ktime_get();
> > +
> > +		/* No kick or get, with .1 second between?  Warn. */
> > +		if (vq->last_add_time_valid)
> > +			WARN_ON(ktime_to_ms(ktime_sub(now, vq->last_add_time))
> > +					    > 100);
> > +		vq->last_add_time = now;
> > +		vq->last_add_time_valid = true;
> > +	}
> > +#endif
> > +
> > +	BUG_ON(total_sg == 0);
> > +
> > +	head = vq->free_head;
> > +	wrap_counter = vq->wrap_counter;
> > +
> > +	/* If the host supports indirect descriptor tables, and we have multiple
> > +	 * buffers, then go indirect. FIXME: tune this threshold */
> > +	if (vq->indirect && total_sg > 1 && vq->vq.num_free)
> > +		desc = alloc_indirect_packed(_vq, total_sg, gfp);
> > +	else {
> > +		desc = NULL;
> > +		WARN_ON_ONCE(total_sg > vq->vring.num && !vq->indirect);
> > +	}
> > +
> > +	if (desc) {
> > +		/* Use a single buffer which doesn't continue */
> > +		indirect = true;
> > +		/* Set up rest to use this indirect table. */
> > +		i = 0;
> > +		descs_used = 1;
> > +	} else {
> > +		indirect = false;
> > +		desc = vq->vring_packed.desc;
> > +		i = head;
> > +		descs_used = total_sg;
> > +
> > +		if (total_sg > 1 && !vq->chaining) {
> > +			END_USE(vq);
> > +			return -ENOTSUPP;
> > +		}
> > +	}
> > +
> > +	if (vq->vq.num_free < descs_used) {
> > +		pr_debug("Can't add buf len %i - avail = %i\n",
> > +			 descs_used, vq->vq.num_free);
> > +		/* FIXME: for historical reasons, we force a notify here if
> > +		 * there are outgoing parts to the buffer.  Presumably the
> > +		 * host should service the ring ASAP. */
> > +		if (out_sgs)
> > +			vq->notify(&vq->vq);
> > +		if (indirect)
> > +			kfree(desc);
> > +		END_USE(vq);
> > +		return -ENOSPC;
> > +	}
> > +
> > +	for (n = 0; n < out_sgs; n++) {
> > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> > +			if (vring_mapping_error(vq, addr))
> > +				goto unmap_release;
> > +
> > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > +			if (!indirect && i == head)
> > +				head_flags = flags;
> > +			else
> > +				desc[i].flags = flags;
> > +
> > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> 
> If it's a part of chain, we only need to do this for last buffer I think.

I'm not sure I've got your point about the "last buffer".
But, yes, id just needs to be set for the last desc.

> 
> > +			prev = i;
> > +			i++;
> 
> It looks to me prev is always i - 1?

No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
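
To illustrate the wrap case, here is a minimal standalone sketch (not
the patch code). The struct ring_cursor and ring_advance() names are
hypothetical and only mirror the i/prev/wrap_counter handling in the
loop above:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cursor state in virtqueue_add_packed(). */
struct ring_cursor {
	unsigned int i;		/* index of the next descriptor to fill */
	unsigned int prev;	/* index of the descriptor filled last */
	bool wrap_counter;	/* flipped each time i wraps back to 0 */
};

/* Advance the cursor by one descriptor in a ring of num entries. */
static void ring_advance(struct ring_cursor *c, unsigned int num)
{
	assert(num > 0 && c->i < num);

	c->prev = c->i;
	c->i++;
	if (c->i >= num) {
		/* Wrapped: prev is now num - 1, not i - 1. */
		c->i = 0;
		c->wrap_counter = !c->wrap_counter;
	}
}
```

Starting from i = 3 in a ring of 4 entries, one step leaves i at 0 and
prev at 3, which is why prev has to be tracked separately.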

> 
> > +			if (!indirect && i >= vq->vring_packed.num) {
> > +				i = 0;
> > +				vq->wrap_counter ^= 1;
> > +			}
> > +		}
> > +	}
> > +	for (; n < (out_sgs + in_sgs); n++) {
> > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > +			if (vring_mapping_error(vq, addr))
> > +				goto unmap_release;
> > +
> > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > +					VRING_DESC_F_WRITE |
> > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > +			if (!indirect && i == head)
> > +				head_flags = flags;
> > +			else
> > +				desc[i].flags = flags;
> > +
> > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > +			prev = i;
> > +			i++;
> > +			if (!indirect && i >= vq->vring_packed.num) {
> > +				i = 0;
> > +				vq->wrap_counter ^= 1;
> > +			}
> > +		}
> > +	}
> > +	/* Last one doesn't continue. */
> > +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> > +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> 
> I can't get the why we need this here.

If only one desc is used, we will need to clear the
VRING_DESC_F_NEXT flag from the head_flags.
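
A simplified standalone sketch of that logic follows. It only tracks
the NEXT bit and uses a plain flags array instead of the real ring;
fill_chain_flags() is a hypothetical helper, and it tests
"prev == head" instead of the "(head + 1) % num == i" check used in
the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DESC_F_NEXT 1u	/* same value as VRING_DESC_F_NEXT */

/*
 * Set up the flags for a chain of count descriptors starting at head.
 * The head's flags are withheld from the ring and returned instead, as
 * the patch does, so they can be written last.
 */
static uint16_t fill_chain_flags(uint16_t *ring_flags, unsigned int num,
				 unsigned int head, unsigned int count)
{
	uint16_t head_flags = 0;
	unsigned int i = head, prev = head, n;

	assert(num > 0 && head < num && count >= 1 && count <= num);

	for (n = 0; n < count; n++) {
		if (i == head)
			head_flags = DESC_F_NEXT;
		else
			ring_flags[i] = DESC_F_NEXT;
		prev = i;
		i = (i + 1) % num;
	}

	/* Last one doesn't continue. */
	if (prev == head)
		head_flags &= (uint16_t)~DESC_F_NEXT;	/* single desc */
	else
		ring_flags[prev] &= (uint16_t)~DESC_F_NEXT;

	return head_flags;
}
```

With a single descriptor the last descriptor is the head itself, and
its flags only live in head_flags at that point, so that is where the
bit has to be cleared.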

> 
> > +	else
> > +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > +
> > +	if (indirect) {
> > +		/* FIXME: to be implemented */
> > +
> > +		/* Now that the indirect table is filled in, map it. */
> > +		dma_addr_t addr = vring_map_single(
> > +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > +			DMA_TO_DEVICE);
> > +		if (vring_mapping_error(vq, addr))
> > +			goto unmap_release;
> > +
> > +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > +					     VRING_DESC_F_AVAIL(wrap_counter) |
> > +					     VRING_DESC_F_USED(!wrap_counter));
> > +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> > +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > +				total_sg * sizeof(struct vring_packed_desc));
> > +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> > +	}
> > +
> > +	/* We're using some buffers from the free list. */
> > +	vq->vq.num_free -= descs_used;
> > +
> > +	/* Update free pointer */
> > +	if (indirect) {
> > +		n = head + 1;
> > +		if (n >= vq->vring_packed.num) {
> > +			n = 0;
> > +			vq->wrap_counter ^= 1;
> > +		}
> > +		vq->free_head = n;
> 
> detach_buf_packed() does not even touch free_head here, so need to explain
> its meaning for packed ring.

The above code is for indirect support, which isn't really
implemented in this patch yet.

As for your question, free_head stores the index of the
next avail desc. I'll add a comment for it, or move it
into the union and give it a better name, in the next version.

> 
> > +	} else
> > +		vq->free_head = i;
> 
> ID is only valid in the last descriptor in the list, so head + 1 should be
> ok too?

I don't really get your point. The vq->free_head stores
the index of the next avail desc.

> 
> > +
> > +	/* Store token and indirect buffer state. */
> > +	vq->desc_state[head].num = descs_used;
> > +	vq->desc_state[head].data = data;
> > +	if (indirect)
> > +		vq->desc_state[head].indir_desc = desc;
> > +	else
> > +		vq->desc_state[head].indir_desc = ctx;
> > +
> > +	virtio_wmb(vq->weak_barriers);
> 
> Let's add a comment to explain the barrier here.

Okay.

> 
> > +	vq->vring_packed.desc[head].flags = head_flags;
> > +	vq->num_added++;
> > +
> > +	pr_debug("Added buffer head %i to %p\n", head, vq);
> > +	END_USE(vq);
> > +
> > +	return 0;
> > +
> > +unmap_release:
> > +	err_idx = i;
> > +	i = head;
> > +
> > +	for (n = 0; n < total_sg; n++) {
> > +		if (i == err_idx)
> > +			break;
> > +		vring_unmap_one(vq, &desc[i]);
> > +		i++;
> > +		if (!indirect && i >= vq->vring_packed.num)
> > +			i = 0;
> > +	}
> > +
> > +	vq->wrap_counter = wrap_counter;
> > +
> > +	if (indirect)
> > +		kfree(desc);
> > +
> > +	END_USE(vq);
> > +	return -EIO;
> > +}
> > +
> > +static inline int virtqueue_add(struct virtqueue *_vq,
> > +				struct scatterlist *sgs[],
> > +				unsigned int total_sg,
> > +				unsigned int out_sgs,
> > +				unsigned int in_sgs,
> > +				void *data,
> > +				void *ctx,
> > +				gfp_t gfp)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
> > +						 in_sgs, data, ctx, gfp) :
> > +			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
> > +						in_sgs, data, ctx, gfp);
> > +}
> > +
> >   /**
> >    * virtqueue_add_sgs - expose buffers to other end
> >    * @vq: the struct virtqueue we're talking about.
> > @@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> >   	 * event. */
> >   	virtio_mb(vq->weak_barriers);
> > +	if (vq->packed) {
> > +		/* FIXME: to be implemented */
> > +		needs_kick = true;
> > +		goto out;
> > +	}
> > +
> >   	old = vq->avail_idx_shadow - vq->num_added;
> >   	new = vq->avail_idx_shadow;
> >   	vq->num_added = 0;
> > @@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
> >   	} else {
> >   		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
> >   	}
> > +
> > +out:
> >   	END_USE(vq);
> >   	return needs_kick;
> >   }
> > @@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_kick);
> > -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> > -		       void **ctx)
> > +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > +			     void **ctx)
> >   {
> >   	unsigned int i, j;
> >   	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > @@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
> >   	}
> >   }
> > -static inline bool more_used(const struct vring_virtqueue *vq)
> > +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
> > +			      void **ctx)
> > +{
> > +	struct vring_packed_desc *desc;
> > +	unsigned int i, j;
> > +
> > +	/* Clear data ptr. */
> > +	vq->desc_state[head].data = NULL;
> > +
> > +	i = head;
> > +
> > +	for (j = 0; j < vq->desc_state[head].num; j++) {
> > +		desc = &vq->vring_packed.desc[i];
> > +		vring_unmap_one(vq, desc);
> > +		i++;
> > +		if (i >= vq->vring_packed.num)
> > +			i = 0;
> > +	}
> > +
> > +	vq->vq.num_free += vq->desc_state[head].num;
> 
> It looks to me vq->free_head grows always, how can we make sure it does not
> exceeds vq.num here?

vq->free_head stores the index of the next avail
desc. You can see that it wraps together with
vq->wrap_counter in virtqueue_add_packed().
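
As a rough standalone sketch of that behaviour (not the patch code):
struct free_cursor and claim_descs() are hypothetical names, and the
two flag macros are assumed to match the AVAIL/USED bit positions
defined in patch 1/2, which is not quoted here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the packed ring defines from patch 1/2. */
#define DESC_F_AVAIL(b)	((uint16_t)((b) << 7))
#define DESC_F_USED(b)	((uint16_t)((b) << 15))

/* Hypothetical stand-in for vq->free_head and vq->wrap_counter. */
struct free_cursor {
	unsigned int free_head;	/* index of the next avail desc */
	bool wrap_counter;
};

/*
 * Claim count descriptors from a ring of num entries and return the
 * avail/used bits the first of them would carry.
 */
static uint16_t claim_descs(struct free_cursor *c, unsigned int num,
			    unsigned int count)
{
	uint16_t flags;

	assert(num > 0 && c->free_head < num && count <= num);

	flags = DESC_F_AVAIL(c->wrap_counter) | DESC_F_USED(!c->wrap_counter);

	c->free_head += count;
	if (c->free_head >= num) {
		/* The index wraps and the wrap counter flips with it. */
		c->free_head -= num;
		c->wrap_counter = !c->wrap_counter;
	}

	return flags;
}
```

So free_head always stays below the ring size, and the wrap counter
records how the avail/used bits should be set for the current pass
over the ring.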

> 
> > +
> > +	if (vq->indirect) {
> > +		u32 len;
> > +
> > +		desc = vq->desc_state[head].indir_desc;
> > +		/* Free the indirect table, if any, now that it's unmapped. */
> > +		if (!desc)
> > +			return;
> > +
> > +		len = virtio32_to_cpu(vq->vq.vdev,
> > +				      vq->vring_packed.desc[head].len);
> > +
> > +		BUG_ON(!(vq->vring_packed.desc[head].flags &
> > +			 cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_INDIRECT)));
> > +		BUG_ON(len == 0 || len % sizeof(struct vring_packed_desc));
> > +
> > +		for (j = 0; j < len / sizeof(struct vring_packed_desc); j++)
> > +			vring_unmap_one(vq, &desc[j]);
> > +
> > +		kfree(desc);
> > +		vq->desc_state[head].indir_desc = NULL;
> > +	} else if (ctx) {
> > +		*ctx = vq->desc_state[head].indir_desc;
> > +	}
> > +}
> > +
> > +static inline bool more_used_split(const struct vring_virtqueue *vq)
> >   {
> >   	return vq->last_used_idx != virtio16_to_cpu(vq->vq.vdev, vq->vring.used->idx);
> >   }
> > -/**
> > - * virtqueue_get_buf - get the next used buffer
> > - * @vq: the struct virtqueue we're talking about.
> > - * @len: the length written into the buffer
> > - *
> > - * If the device wrote data into the buffer, @len will be set to the
> > - * amount written.  This means you don't need to clear the buffer
> > - * beforehand to ensure there's no data leakage in the case of short
> > - * writes.
> > - *
> > - * Caller must ensure we don't call this with other virtqueue
> > - * operations at the same time (except where noted).
> > - *
> > - * Returns NULL if there are no used buffers, or the "data" token
> > - * handed to virtqueue_add_*().
> > - */
> > -void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> > -			    void **ctx)
> > +static inline bool more_used_packed(const struct vring_virtqueue *vq)
> > +{
> > +	u16 last_used, flags;
> > +	bool avail, used;
> > +
> > +	if (vq->vq.num_free == vq->vring.num)
> > +		return false;
> > +
> > +	last_used = vq->last_used_idx;
> > +	flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used].flags);
> > +	avail = flags & VRING_DESC_F_AVAIL(1);
> > +	used = flags & VRING_DESC_F_USED(1);
> > +
> > +	return avail == used;
> > +}
> > +
> > +static inline bool more_used(const struct vring_virtqueue *vq)
> > +{
> > +	return vq->packed ? more_used_packed(vq) : more_used_split(vq);
> > +}
> > +
> > +void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq, unsigned int *len,
> > +				  void **ctx)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> >   	void *ret;
> > @@ -735,9 +1079,9 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >   		return NULL;
> >   	}
> > -	/* detach_buf clears data, so grab it now. */
> > +	/* detach_buf_split clears data, so grab it now. */
> >   	ret = vq->desc_state[i].data;
> > -	detach_buf(vq, i, ctx);
> > +	detach_buf_split(vq, i, ctx);
> >   	vq->last_used_idx++;
> >   	/* If we expect an interrupt for the next entry, tell host
> >   	 * by writing event index and flush out the write before
> > @@ -754,6 +1098,87 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >   	END_USE(vq);
> >   	return ret;
> >   }
> > +
> > +void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq, unsigned int *len,
> > +				   void **ctx)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	void *ret;
> > +	unsigned int i;
> > +	u16 last_used;
> > +
> > +	START_USE(vq);
> > +
> > +	if (unlikely(vq->broken)) {
> > +		END_USE(vq);
> > +		return NULL;
> > +	}
> > +
> > +	if (!more_used(vq)) {
> > +		pr_debug("No more buffers in queue\n");
> > +		END_USE(vq);
> > +		return NULL;
> > +	}
> > +
> > +	/* Only get used array entries after they have been exposed by host. */
> > +	virtio_rmb(vq->weak_barriers);
> > +
> > +	last_used = vq->last_used_idx;
> > +
> > +	i = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].id);
> > +	*len = virtio32_to_cpu(_vq->vdev, vq->vring_packed.desc[last_used].len);
> > +
> > +	if (unlikely(i >= vq->vring_packed.num)) {
> > +		BAD_RING(vq, "id %u out of range\n", i);
> > +		return NULL;
> > +	}
> > +	if (unlikely(!vq->desc_state[i].data)) {
> > +		BAD_RING(vq, "id %u is not a head!\n", i);
> > +		return NULL;
> > +	}
> > +
> > +	/* detach_buf_packed clears data, so grab it now. */
> > +	ret = vq->desc_state[i].data;
> > +	detach_buf_packed(vq, i, ctx);
> > +
> > +	vq->last_used_idx += vq->desc_state[i].num;
> > +	if (vq->last_used_idx >= vq->vring_packed.num)
> > +		vq->last_used_idx %= vq->vring_packed.num;
> 
> '-=' should be sufficient here?

Good suggestion. I think so.
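
A minimal standalone sketch of why the subtraction is enough, assuming
last_used_idx is always below the ring size and a buffer never uses
more descriptors than the ring holds (advance_used_idx() is a
hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance last_used_idx by count in a ring of num entries. With
 * last_used_idx < num and count <= num the sum is below 2 * num, so a
 * single subtraction gives the same result as the modulo.
 */
static uint16_t advance_used_idx(uint16_t last_used_idx, uint16_t count,
				 uint16_t num)
{
	unsigned int idx = last_used_idx;

	assert(num > 0 && last_used_idx < num && count <= num);

	idx += count;
	if (idx >= num)
		idx -= num;

	return (uint16_t)idx;
}
```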

> 
> > +
> > +	// FIXME: implement the desc event support
> > +
> > +#ifdef DEBUG
> > +	vq->last_add_time_valid = false;
> > +#endif
> > +
> > +	END_USE(vq);
> > +	return ret;
> > +}
> > +
> > +/**
> > + * virtqueue_get_buf - get the next used buffer
> > + * @vq: the struct virtqueue we're talking about.
> > + * @len: the length written into the buffer
> > + *
> > + * If the device wrote data into the buffer, @len will be set to the
> > + * amount written.  This means you don't need to clear the buffer
> > + * beforehand to ensure there's no data leakage in the case of short
> > + * writes.
> > + *
> > + * Caller must ensure we don't call this with other virtqueue
> > + * operations at the same time (except where noted).
> > + *
> > + * Returns NULL if there are no used buffers, or the "data" token
> > + * handed to virtqueue_add_*().
> > + */
> > +void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> > +			    void **ctx)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> > +			    virtqueue_get_buf_ctx_split(_vq, len, ctx);
> > +}
> >   EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
> >   void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> > @@ -761,6 +1186,24 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> >   	return virtqueue_get_buf_ctx(_vq, len, NULL);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> > +
> > +static void virtqueue_disable_cb_split(struct virtqueue *_vq)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > +		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > +		if (!vq->event)
> > +			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev,
> > +							vq->avail_flags_shadow);
> > +	}
> > +}
> > +
> > +static void virtqueue_disable_cb_packed(struct virtqueue *_vq)
> > +{
> > +	// FIXME: to be implemented
> > +}
> > +
> >   /**
> >    * virtqueue_disable_cb - disable callbacks
> >    * @vq: the struct virtqueue we're talking about.
> > @@ -774,12 +1217,10 @@ void virtqueue_disable_cb(struct virtqueue *_vq)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > -	if (!(vq->avail_flags_shadow & VRING_AVAIL_F_NO_INTERRUPT)) {
> > -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > -		if (!vq->event)
> > -			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> > -	}
> > -
> > +	if (vq->packed)
> > +		virtqueue_disable_cb_packed(_vq);
> > +	else
> > +		virtqueue_disable_cb_split(_vq);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_disable_cb);
> > @@ -802,6 +1243,12 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >   	START_USE(vq);
> > +	if (vq->packed) {
> > +		// FIXME: to be implemented
> > +		last_used_idx = vq->last_used_idx;
> > +		goto out;
> > +	}
> > +
> >   	/* We optimistically turn back on interrupts, then check if there was
> >   	 * more to do. */
> >   	/* Depending on the VIRTIO_RING_F_EVENT_IDX feature, we need to
> > @@ -813,6 +1260,7 @@ unsigned virtqueue_enable_cb_prepare(struct virtqueue *_vq)
> >   			vq->vring.avail->flags = cpu_to_virtio16(_vq->vdev, vq->avail_flags_shadow);
> >   	}
> >   	vring_used_event(&vq->vring) = cpu_to_virtio16(_vq->vdev, last_used_idx = vq->last_used_idx);
> > +out:
> >   	END_USE(vq);
> >   	return last_used_idx;
> >   }
> > @@ -832,6 +1280,12 @@ bool virtqueue_poll(struct virtqueue *_vq, unsigned last_used_idx)
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> >   	virtio_mb(vq->weak_barriers);
> > +	if (vq->packed) {
> > +		u16 flags = virtio16_to_cpu(vq->vq.vdev,
> > +				vq->vring_packed.desc[last_used_idx].flags);
> > +		return !(flags & VRING_DESC_F_AVAIL(1)) ==
> > +		       !(flags & VRING_DESC_F_USED(1));
> > +	}
> >   	return (u16)last_used_idx != virtio16_to_cpu(_vq->vdev, vq->vring.used->idx);
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_poll);
> > @@ -874,6 +1328,11 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> >   	START_USE(vq);
> > +	if (vq->packed) {
> > +		// FIXME: to be implemented
> > +		goto out;
> > +	}
> > +
> >   	/* We optimistically turn back on interrupts, then check if there was
> >   	 * more to do. */
> >   	/* Depending on the VIRTIO_RING_F_USED_EVENT_IDX feature, we need to
> > @@ -896,6 +1355,7 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
> >   		return false;
> >   	}
> > +out:
> >   	END_USE(vq);
> >   	return true;
> >   }
> > @@ -922,14 +1382,20 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> >   			continue;
> >   		/* detach_buf clears data, so grab it now. */
> >   		buf = vq->desc_state[i].data;
> > -		detach_buf(vq, i, NULL);
> > -		vq->avail_idx_shadow--;
> > -		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> > +		if (vq->packed)
> > +			detach_buf_packed(vq, i, NULL);
> > +		else {
> > +			detach_buf_split(vq, i, NULL);
> > +			vq->avail_idx_shadow--;
> > +			vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> > +							vq->avail_idx_shadow);
> > +		}
> >   		END_USE(vq);
> >   		return buf;
> >   	}
> >   	/* That should have freed everything. */
> > -	BUG_ON(vq->vq.num_free != vq->vring.num);
> > +	BUG_ON(vq->vq.num_free != (vq->packed ? vq->vring_packed.num :
> > +						vq->vring.num));
> >   	END_USE(vq);
> >   	return NULL;
> > @@ -957,7 +1423,8 @@ irqreturn_t vring_interrupt(int irq, void *_vq)
> >   EXPORT_SYMBOL_GPL(vring_interrupt);
> >   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > -					struct vring vring,
> > +					union vring_union vring,
> > +					bool packed,
> >   					struct virtio_device *vdev,
> >   					bool weak_barriers,
> >   					bool context,
> > @@ -965,19 +1432,20 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >   					void (*callback)(struct virtqueue *),
> >   					const char *name)
> >   {
> > -	unsigned int i;
> > +	unsigned int num, i;
> >   	struct vring_virtqueue *vq;
> > -	vq = kmalloc(sizeof(*vq) + vring.num * sizeof(struct vring_desc_state),
> > +	num = packed ? vring.vring_packed.num : vring.vring_split.num;
> > +
> > +	vq = kmalloc(sizeof(*vq) + num * sizeof(struct vring_desc_state),
> >   		     GFP_KERNEL);
> >   	if (!vq)
> >   		return NULL;
> > -	vq->vring = vring;
> >   	vq->vq.callback = callback;
> >   	vq->vq.vdev = vdev;
> >   	vq->vq.name = name;
> > -	vq->vq.num_free = vring.num;
> > +	vq->vq.num_free = num;
> >   	vq->vq.index = index;
> >   	vq->we_own_ring = false;
> >   	vq->queue_dma_addr = 0;
> > @@ -986,9 +1454,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >   	vq->weak_barriers = weak_barriers;
> >   	vq->broken = false;
> >   	vq->last_used_idx = 0;
> > -	vq->avail_flags_shadow = 0;
> > -	vq->avail_idx_shadow = 0;
> >   	vq->num_added = 0;
> > +	vq->packed = packed;
> >   	list_add_tail(&vq->vq.list, &vdev->vqs);
> >   #ifdef DEBUG
> >   	vq->in_use = false;
> > @@ -999,18 +1466,41 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >   		!context;
> >   	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > +	if (vq->packed) {
> > +		vq->vring_packed = vring.vring_packed;
> > +		vq->free_head = 0;
> > +		vq->wrap_counter = 1;
> > +
> > +#if 0
> > +		vq->chaining = virtio_has_feature(vdev,
> > +						  VIRTIO_RING_F_LIST_DESC);
> > +#else
> > +		vq->chaining = true;
> 
> Looks like in V10 there's no F_LIST_DESC.

Yes. I kept this only because descriptor chaining was
optional in the spec draft that was current when I sent
out this patch set. I'll remove it in the next version.

> 
> > +#endif
> > +	} else {
> > +		vq->vring = vring.vring_split;
> > +		vq->avail_flags_shadow = 0;
> > +		vq->avail_idx_shadow = 0;
> > +
> > +		/* Put everything in free lists. */
> > +		vq->free_head = 0;
> > +		for (i = 0; i < num-1; i++)
> > +			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> > +	}
> > +
> >   	/* No callback?  Tell other side not to bother us. */
> >   	if (!callback) {
> > -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > -		if (!vq->event)
> > -			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
> > +		if (packed) {
> > +			// FIXME: to be implemented
> > +		} else {
> > +			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
> > +			if (!vq->event)
> > +				vq->vring.avail->flags = cpu_to_virtio16(vdev,
> > +						vq->avail_flags_shadow);
> > +		}
> >   	}
> > -	/* Put everything in free lists. */
> > -	vq->free_head = 0;
> > -	for (i = 0; i < vring.num-1; i++)
> > -		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
> > -	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
> > +	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
> >   	return &vq->vq;
> >   }
> > @@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
> >   	}
> >   }
> > +static inline int
> > +__vring_size(unsigned int num, unsigned long align, bool packed)
> > +{
> > +	if (packed)
> > +		return vring_packed_size(num, align);
> > +	return vring_size(num, align);
> > +}
> > +
> >   struct virtqueue *vring_create_virtqueue(
> >   	unsigned int index,
> >   	unsigned int num,
> > @@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
> >   	void *queue = NULL;
> >   	dma_addr_t dma_addr;
> >   	size_t queue_size_in_bytes;
> > -	struct vring vring;
> > +	union vring_union vring;
> > +	bool packed;
> >   	/* We assume num is a power of 2. */
> >   	if (num & (num - 1)) {
> > @@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
> >   		return NULL;
> >   	}
> > +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > +
> >   	/* TODO: allocate each queue chunk individually */
> > -	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
> > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > +	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
> > +			num /= 2) {
> > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > +							     packed),
> >   					  &dma_addr,
> >   					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
> >   		if (queue)
> > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> >   	if (!queue) {
> >   		/* Try to get a single page. You are my only hope! */
> > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > +							     packed),
> >   					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> >   	}
> >   	if (!queue)
> >   		return NULL;
> > -	queue_size_in_bytes = vring_size(num, vring_align);
> > -	vring_init(&vring, num, queue, vring_align);
> > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > +	if (packed)
> > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > +	else
> > +		vring_init(&vring.vring_split, num, queue, vring_align);
> 
> Let's rename vring_init to vring_init_split() like other helpers?

vring_init() is a public API in include/uapi/linux/virtio_ring.h,
so I don't think we can rename it.

> 
> > -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > -				   notify, callback, name);
> > +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > +				   context, notify, callback, name);
> >   	if (!vq) {
> >   		vring_free_queue(vdev, queue_size_in_bytes, queue,
> >   				 dma_addr);
> > @@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
> >   				      void (*callback)(struct virtqueue *vq),
> >   				      const char *name)
> >   {
> > -	struct vring vring;
> > -	vring_init(&vring, num, pages, vring_align);
> > -	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > -				     notify, callback, name);
> > +	union vring_union vring;
> > +	bool packed;
> > +
> > +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
> > +	if (packed)
> > +		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
> > +	else
> > +		vring_init(&vring.vring_split, num, pages, vring_align);
> > +
> > +	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > +				     context, notify, callback, name);
> >   }
> >   EXPORT_SYMBOL_GPL(vring_new_virtqueue);
> > @@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> >   	if (vq->we_own_ring) {
> >   		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
> > -				 vq->vring.desc, vq->queue_dma_addr);
> > +				 vq->packed ? (void *)vq->vring_packed.desc :
> > +					      (void *)vq->vring.desc,
> > +				 vq->queue_dma_addr);
> >   	}
> >   	list_del(&_vq->list);
> >   	kfree(vq);
> > @@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
> >   	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
> >   		switch (i) {
> > +#if 0 // FIXME: to be implemented
> >   		case VIRTIO_RING_F_INDIRECT_DESC:
> >   			break;
> > +#endif
> >   		case VIRTIO_RING_F_EVENT_IDX:
> >   			break;
> >   		case VIRTIO_F_VERSION_1:
> >   			break;
> >   		case VIRTIO_F_IOMMU_PLATFORM:
> >   			break;
> > +		case VIRTIO_F_RING_PACKED:
> > +			break;
> >   		default:
> >   			/* We don't understand this bit. */
> >   			__virtio_clear_bit(vdev, i);
> > @@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > -	return vq->vring.num;
> > +	return vq->packed ? vq->vring_packed.num : vq->vring.num;
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
> > @@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
> > +/* Only available for split ring */
> 
> Interesting, I think we need this to correctly configure PCI, e.g. in
> setup_vq()?

Yes, setup_vq() should be updated. But that requires a
QEMU change, so I kept it as is in this RFC patch.

> 
> >   dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > @@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
> > +/* Only available for split ring */
> >   dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> 
> Maybe it's better to rename this to get_device_addr().

It's a kernel API that is exported with EXPORT_SYMBOL_GPL(), so
I'm not sure whether renaming it is a good idea.

Best regards,
Tiwei Bie

> 
> Thanks
> 
> >   {
> >   	struct vring_virtqueue *vq = to_vvq(_vq);
> > @@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
> >   }
> >   EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
> > +/* Only available for split ring */
> >   const struct vring *virtqueue_get_vring(struct virtqueue *vq)
> >   {
> >   	return &to_vvq(vq)->vring;
> > diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
> > index bbf32524ab27..a0075894ad16 100644
> > --- a/include/linux/virtio_ring.h
> > +++ b/include/linux/virtio_ring.h
> > @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
> >   struct virtio_device;
> >   struct virtqueue;
> > +union vring_union {
> > +	struct vring vring_split;
> > +	struct vring_packed vring_packed;
> > +};
> > +
> >   /*
> >    * Creates a virtqueue and allocates the descriptor ring.  If
> >    * may_reduce_num is set, then this may allocate a smaller ring than
> > @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
> >   /* Creates a virtqueue with a custom layout. */
> >   struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > -					struct vring vring,
> > +					union vring_union vring,
> > +					bool packed,
> >   					struct virtio_device *vdev,
> >   					bool weak_barriers,
> >   					bool ctx,
> 


* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16  6:10       ` Tiwei Bie
  (?)
  (?)
@ 2018-03-16  6:44       ` Jason Wang
  2018-03-16  7:40         ` Tiwei Bie
  2018-03-16  7:40         ` Tiwei Bie
  -1 siblings, 2 replies; 38+ messages in thread
From: Jason Wang @ 2018-03-16  6:44 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann



On 2018-03-16 14:10, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>> On 2018-02-23 19:18, Tiwei Bie wrote:
>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>> ---
>>>    drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>    include/linux/virtio_ring.h  |   8 +-
>>>    2 files changed, 618 insertions(+), 89 deletions(-)
>>>
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index eb30f3e09a47..393778a2f809 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -58,14 +58,14 @@
>>>    struct vring_desc_state {
>>>    	void *data;			/* Data for callback. */
>>> -	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
>>> +	void *indir_desc;		/* Indirect descriptor, if any. */
>>> +	int num;			/* Descriptor list length. */
>>>    };
>>>    struct vring_virtqueue {
>>>    	struct virtqueue vq;
>>> -	/* Actual memory layout for this queue */
>>> -	struct vring vring;
>>> +	bool packed;
>>>    	/* Can we use weak barriers? */
>>>    	bool weak_barriers;
>>> @@ -87,11 +87,28 @@ struct vring_virtqueue {
>>>    	/* Last used index we've seen. */
>>>    	u16 last_used_idx;
>>> -	/* Last written value to avail->flags */
>>> -	u16 avail_flags_shadow;
>>> -
>>> -	/* Last written value to avail->idx in guest byte order */
>>> -	u16 avail_idx_shadow;
>>> +	union {
>>> +		/* Available for split ring */
>>> +		struct {
>>> +			/* Actual memory layout for this queue */
>>> +			struct vring vring;
>>> +
>>> +			/* Last written value to avail->flags */
>>> +			u16 avail_flags_shadow;
>>> +
>>> +			/* Last written value to avail->idx in
>>> +			 * guest byte order */
>>> +			u16 avail_idx_shadow;
>>> +		};
>>> +
>>> +		/* Available for packed ring */
>>> +		struct {
>>> +			/* Actual memory layout for this queue */
>>> +			struct vring_packed vring_packed;
>>> +			u8 wrap_counter : 1;
>>> +			bool chaining;
>>> +		};
>>> +	};
>>>    	/* How to notify other side. FIXME: commonalize hcalls! */
>>>    	bool (*notify)(struct virtqueue *vq);
>>> @@ -201,26 +218,37 @@ static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
>>>    			      cpu_addr, size, direction);
>>>    }
>>> -static void vring_unmap_one(const struct vring_virtqueue *vq,
>>> -			    struct vring_desc *desc)
>>> +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
>>>    {
>> Let's split this helper into packed/split versions like the other helpers?
>> (The caller already knows the type of vq.)
> Okay.
>

[...]

>>> +				desc[i].flags = flags;
>>> +
>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>> If it's part of a chain, I think we only need to do this for the last buffer.
> I'm not sure I've got your point about the "last buffer".
> But, yes, id just needs to be set for the last desc.

Right, I think I meant "last descriptor" :)

>
>>> +			prev = i;
>>> +			i++;
>> It looks to me like prev is always i - 1?
> No. prev will be (vq->vring_packed.num - 1) when i becomes 0.

Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.
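
The expression above can be sketched as a small helper. ring_prev() is
a hypothetical name used only for illustration; the patch tracks prev
inside the loop instead.

```c
/* Hypothetical helper (not in the patch): index of the slot before `i`
 * on a ring with `num` entries, wrapping from 0 back to num - 1. */
static unsigned int ring_prev(unsigned int i, unsigned int num)
{
	return i ? i - 1 : num - 1;
}
```

For a ring of 8 entries, ring_prev(5, 8) gives 4 and ring_prev(0, 8)
gives 7, which is the wrap case mentioned in the quoted reply.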

>
>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>> +				i = 0;
>>> +				vq->wrap_counter ^= 1;
>>> +			}
>>> +		}
>>> +	}
>>> +	for (; n < (out_sgs + in_sgs); n++) {
>>> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>> +			if (vring_mapping_error(vq, addr))
>>> +				goto unmap_release;
>>> +
>>> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>> +					VRING_DESC_F_WRITE |
>>> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>> +					VRING_DESC_F_USED(!vq->wrap_counter));
>>> +			if (!indirect && i == head)
>>> +				head_flags = flags;
>>> +			else
>>> +				desc[i].flags = flags;
>>> +
>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>> +			prev = i;
>>> +			i++;
>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>> +				i = 0;
>>> +				vq->wrap_counter ^= 1;
>>> +			}
>>> +		}
>>> +	}
>>> +	/* Last one doesn't continue. */
>>> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>> I don't get why we need this here.
> If only one desc is used, we will need to clear the
> VRING_DESC_F_NEXT flag from the head_flags.

Yes, I meant: why won't the desc[prev].flags update that follows work for this?
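
One reading of the quoted code, offered as a sketch and not as the
patch author's answer: the head descriptor's flags are held back in
head_flags and stored to the ring only at the very end, so with a
single descriptor (prev == head) clearing the bit in desc[prev].flags
would be overwritten by that final store. The model below is a
simplification; F_NEXT and fill_chain() are hypothetical stand-ins,
and barriers, DMA mapping and the other flag bits are left out.

```c
#include <stdint.h>

#define F_NEXT 0x1u	/* stand-in for VRING_DESC_F_NEXT */

/* Fill `n` descriptors (1 <= n <= num) starting at `head` and return
 * the flags finally stored in the head slot. `fix_head` selects
 * whether the single-descriptor case clears the bit in head_flags. */
static uint16_t fill_chain(uint16_t *flags, unsigned int num,
			   unsigned int head, unsigned int n, int fix_head)
{
	uint16_t head_flags = 0;
	unsigned int i = head, prev = head, k;

	for (k = 0; k < n; k++) {
		if (i == head)
			head_flags = F_NEXT;	/* deferred, not in the ring yet */
		else
			flags[i] = F_NEXT;
		prev = i;
		i = (i + 1 == num) ? 0 : i + 1;
	}

	/* Last one doesn't continue. */
	if (fix_head && (head + 1) % num == i)
		head_flags &= (uint16_t)~F_NEXT;
	else
		flags[prev] &= (uint16_t)~F_NEXT;

	flags[head] = head_flags;	/* head is published last */
	return flags[head];
}
```

In this model, a one-descriptor chain ends up with F_NEXT still set on
the head when only the desc[prev].flags branch runs, and with it
cleared when the head_flags branch is taken.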

>
>>> +	else
>>> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>> +
>>> +	if (indirect) {
>>> +		/* FIXME: to be implemented */
>>> +
>>> +		/* Now that the indirect table is filled in, map it. */
>>> +		dma_addr_t addr = vring_map_single(
>>> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
>>> +			DMA_TO_DEVICE);
>>> +		if (vring_mapping_error(vq, addr))
>>> +			goto unmap_release;
>>> +
>>> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
>>> +					     VRING_DESC_F_AVAIL(wrap_counter) |
>>> +					     VRING_DESC_F_USED(!wrap_counter));
>>> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
>>> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
>>> +				total_sg * sizeof(struct vring_packed_desc));
>>> +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
>>> +	}
>>> +
>>> +	/* We're using some buffers from the free list. */
>>> +	vq->vq.num_free -= descs_used;
>>> +
>>> +	/* Update free pointer */
>>> +	if (indirect) {
>>> +		n = head + 1;
>>> +		if (n >= vq->vring_packed.num) {
>>> +			n = 0;
>>> +			vq->wrap_counter ^= 1;
>>> +		}
>>> +		vq->free_head = n;
>> detach_buf_packed() does not even touch free_head, so its meaning for the
>> packed ring needs to be explained.
> The code above is for indirect support, which isn't really
> implemented in this patch yet.
>
> For your question, free_head stores the index of the
> next avail desc. I'll add a comment for it or move it
> into the union and give it a better name in the next version.

Yes, something like avail_idx might be better.

>
>>> +	} else
>>> +		vq->free_head = i;
>> ID is only valid in the last descriptor in the list, so head + 1 should be
>> ok too?
> I don't really get your point. The vq->free_head stores
> the index of the next avail desc.

I think I get your idea now. free_head has two meanings:

- next avail index
- buffer id

If I'm correct, we'd better add a comment for this.

>
>>> +
>>> +	/* Store token and indirect buffer state. */
>>> +	vq->desc_state[head].num = descs_used;
>>> +	vq->desc_state[head].data = data;
>>> +	if (indirect)
>>> +		vq->desc_state[head].indir_desc = desc;
>>> +	else
>>> +		vq->desc_state[head].indir_desc = ctx;
>>> +
>>> +	virtio_wmb(vq->weak_barriers);
>> Let's add a comment to explain the barrier here.
> Okay.
>
>>> +	vq->vring_packed.desc[head].flags = head_flags;
>>> +	vq->num_added++;
>>> +
>>> +	pr_debug("Added buffer head %i to %p\n", head, vq);
>>> +	END_USE(vq);
>>> +
>>> +	return 0;
>>> +
>>> +unmap_release:
>>> +	err_idx = i;
>>> +	i = head;
>>> +
>>> +	for (n = 0; n < total_sg; n++) {
>>> +		if (i == err_idx)
>>> +			break;
>>> +		vring_unmap_one(vq, &desc[i]);
>>> +		i++;
>>> +		if (!indirect && i >= vq->vring_packed.num)
>>> +			i = 0;
>>> +	}
>>> +
>>> +	vq->wrap_counter = wrap_counter;
>>> +
>>> +	if (indirect)
>>> +		kfree(desc);
>>> +
>>> +	END_USE(vq);
>>> +	return -EIO;
>>> +}
>>> +
>>> +static inline int virtqueue_add(struct virtqueue *_vq,
>>> +				struct scatterlist *sgs[],
>>> +				unsigned int total_sg,
>>> +				unsigned int out_sgs,
>>> +				unsigned int in_sgs,
>>> +				void *data,
>>> +				void *ctx,
>>> +				gfp_t gfp)
>>> +{
>>> +	struct vring_virtqueue *vq = to_vvq(_vq);
>>> +
>>> +	return vq->packed ? virtqueue_add_packed(_vq, sgs, total_sg, out_sgs,
>>> +						 in_sgs, data, ctx, gfp) :
>>> +			    virtqueue_add_split(_vq, sgs, total_sg, out_sgs,
>>> +						in_sgs, data, ctx, gfp);
>>> +}
>>> +
>>>    /**
>>>     * virtqueue_add_sgs - expose buffers to other end
>>>     * @vq: the struct virtqueue we're talking about.
>>> @@ -561,6 +845,12 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>>>    	 * event. */
>>>    	virtio_mb(vq->weak_barriers);
>>> +	if (vq->packed) {
>>> +		/* FIXME: to be implemented */
>>> +		needs_kick = true;
>>> +		goto out;
>>> +	}
>>> +
>>>    	old = vq->avail_idx_shadow - vq->num_added;
>>>    	new = vq->avail_idx_shadow;
>>>    	vq->num_added = 0;
>>> @@ -579,6 +869,8 @@ bool virtqueue_kick_prepare(struct virtqueue *_vq)
>>>    	} else {
>>>    		needs_kick = !(vq->vring.used->flags & cpu_to_virtio16(_vq->vdev, VRING_USED_F_NO_NOTIFY));
>>>    	}
>>> +
>>> +out:
>>>    	END_USE(vq);
>>>    	return needs_kick;
>>>    }
>>> @@ -628,8 +920,8 @@ bool virtqueue_kick(struct virtqueue *vq)
>>>    }
>>>    EXPORT_SYMBOL_GPL(virtqueue_kick);
>>> -static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
>>> -		       void **ctx)
>>> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>>> +			     void **ctx)
>>>    {
>>>    	unsigned int i, j;
>>>    	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
>>> @@ -677,29 +969,81 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
>>>    	}
>>>    }
>>> -static inline bool more_used(const struct vring_virtqueue *vq)
>>> +static void detach_buf_packed(struct vring_virtqueue *vq, unsigned int head,
>>> +			      void **ctx)
>>> +{
>>> +	struct vring_packed_desc *desc;
>>> +	unsigned int i, j;
>>> +
>>> +	/* Clear data ptr. */
>>> +	vq->desc_state[head].data = NULL;
>>> +
>>> +	i = head;
>>> +
>>> +	for (j = 0; j < vq->desc_state[head].num; j++) {
>>> +		desc = &vq->vring_packed.desc[i];
>>> +		vring_unmap_one(vq, desc);
>>> +		i++;
>>> +		if (i >= vq->vring_packed.num)
>>> +			i = 0;
>>> +	}
>>> +
>>> +	vq->vq.num_free += vq->desc_state[head].num;
>> It looks to me like vq->free_head always grows. How can we make sure it does
>> not exceed vq.num here?
> vq->free_head stores the index of the next avail
> desc. You can see that it wraps together with vq->wrap_counter
> in virtqueue_add_packed().
>

I see, thanks.
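
A minimal sketch of the wrapping described in the quoted reply, assuming
the next-avail index and the wrap counter are kept side by side as in
the quoted virtqueue_add_packed() loop. struct avail_cursor and
avail_cursor_advance() are hypothetical names, not part of the patch.

```c
#include <stdbool.h>

/* Hypothetical model of the driver-side cursor on a packed ring. */
struct avail_cursor {
	unsigned int idx;	/* index of the next avail descriptor */
	bool wrap_counter;	/* toggles every time idx wraps to 0 */
};

/* Move to the next slot; on reaching the end of the ring, restart at 0
 * and flip the wrap counter, so idx never reaches `num`. */
static void avail_cursor_advance(struct avail_cursor *c, unsigned int num)
{
	c->idx++;
	if (c->idx >= num) {
		c->idx = 0;
		c->wrap_counter = !c->wrap_counter;
	}
}
```

Starting from index 0 with the counter set, advancing num times returns
the index to 0 with the counter cleared, so the index stays bounded by
the ring size.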


[...]

>>> +
>>> +	/* detach_buf_packed clears data, so grab it now. */
>>> +	ret = vq->desc_state[i].data;
>>> +	detach_buf_packed(vq, i, ctx);
>>> +
>>> +	vq->last_used_idx += vq->desc_state[i].num;
>>> +	if (vq->last_used_idx >= vq->vring_packed.num)
>>> +		vq->last_used_idx %= vq->vring_packed.num;
>> '-=' should be sufficient here?
> Good suggestion. I think so.
>
>>> +
>>> +	// FIXME: implement the desc event support
>>> +
>>> +#ifdef DEBUG
>>> +	vq->last_add_time_valid = false;
>>> +#endif
>>> +
>>> +	END_USE(vq);
>>> +	return ret;
>>> +}
>>> +

[...]

>>> +
>>> +#if 0
>>> +		vq->chaining = virtio_has_feature(vdev,
>>> +						  VIRTIO_RING_F_LIST_DESC);
>>> +#else
>>> +		vq->chaining = true;
>> Looks like in V10 there's no F_LIST_DESC.
> Yes. I kept this only because descriptor chaining was
> optional in the spec draft that was current when I sent
> out this patch set. I'll remove it in the next version.
>
>>> +#endif
>>> +	} else {
>>> +		vq->vring = vring.vring_split;
>>> +		vq->avail_flags_shadow = 0;
>>> +		vq->avail_idx_shadow = 0;
>>> +
>>> +		/* Put everything in free lists. */
>>> +		vq->free_head = 0;
>>> +		for (i = 0; i < num-1; i++)
>>> +			vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
>>> +	}
>>> +
>>>    	/* No callback?  Tell other side not to bother us. */
>>>    	if (!callback) {
>>> -		vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
>>> -		if (!vq->event)
>>> -			vq->vring.avail->flags = cpu_to_virtio16(vdev, vq->avail_flags_shadow);
>>> +		if (packed) {
>>> +			// FIXME: to be implemented
>>> +		} else {
>>> +			vq->avail_flags_shadow |= VRING_AVAIL_F_NO_INTERRUPT;
>>> +			if (!vq->event)
>>> +				vq->vring.avail->flags = cpu_to_virtio16(vdev,
>>> +						vq->avail_flags_shadow);
>>> +		}
>>>    	}
>>> -	/* Put everything in free lists. */
>>> -	vq->free_head = 0;
>>> -	for (i = 0; i < vring.num-1; i++)
>>> -		vq->vring.desc[i].next = cpu_to_virtio16(vdev, i + 1);
>>> -	memset(vq->desc_state, 0, vring.num * sizeof(struct vring_desc_state));
>>> +	memset(vq->desc_state, 0, num * sizeof(struct vring_desc_state));
>>>    	return &vq->vq;
>>>    }
>>> @@ -1058,6 +1548,14 @@ static void vring_free_queue(struct virtio_device *vdev, size_t size,
>>>    	}
>>>    }
>>> +static inline int
>>> +__vring_size(unsigned int num, unsigned long align, bool packed)
>>> +{
>>> +	if (packed)
>>> +		return vring_packed_size(num, align);
>>> +	return vring_size(num, align);
>>> +}
>>> +
>>>    struct virtqueue *vring_create_virtqueue(
>>>    	unsigned int index,
>>>    	unsigned int num,
>>> @@ -1074,7 +1572,8 @@ struct virtqueue *vring_create_virtqueue(
>>>    	void *queue = NULL;
>>>    	dma_addr_t dma_addr;
>>>    	size_t queue_size_in_bytes;
>>> -	struct vring vring;
>>> +	union vring_union vring;
>>> +	bool packed;
>>>    	/* We assume num is a power of 2. */
>>>    	if (num & (num - 1)) {
>>> @@ -1082,9 +1581,13 @@ struct virtqueue *vring_create_virtqueue(
>>>    		return NULL;
>>>    	}
>>> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
>>> +
>>>    	/* TODO: allocate each queue chunk individually */
>>> -	for (; num && vring_size(num, vring_align) > PAGE_SIZE; num /= 2) {
>>> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>> +	for (; num && __vring_size(num, vring_align, packed) > PAGE_SIZE;
>>> +			num /= 2) {
>>> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>> +							     packed),
>>>    					  &dma_addr,
>>>    					  GFP_KERNEL|__GFP_NOWARN|__GFP_ZERO);
>>>    		if (queue)
>>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>    	if (!queue) {
>>>    		/* Try to get a single page. You are my only hope! */
>>> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>> +							     packed),
>>>    					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>    	}
>>>    	if (!queue)
>>>    		return NULL;
>>> -	queue_size_in_bytes = vring_size(num, vring_align);
>>> -	vring_init(&vring, num, queue, vring_align);
>>> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>> +	if (packed)
>>> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>> +	else
>>> +		vring_init(&vring.vring_split, num, queue, vring_align);
>> Let's rename vring_init to vring_init_split() like other helpers?
> vring_init() is a public API in include/uapi/linux/virtio_ring.h,
> so I don't think we can rename it.

I see. Then unifying the API needs more thought.

>
>>> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>> -				   notify, callback, name);
>>> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>> +				   context, notify, callback, name);
>>>    	if (!vq) {
>>>    		vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>    				 dma_addr);
>>> @@ -1132,10 +1639,17 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
>>>    				      void (*callback)(struct virtqueue *vq),
>>>    				      const char *name)
>>>    {
>>> -	struct vring vring;
>>> -	vring_init(&vring, num, pages, vring_align);
>>> -	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>> -				     notify, callback, name);
>>> +	union vring_union vring;
>>> +	bool packed;
>>> +
>>> +	packed = virtio_has_feature(vdev, VIRTIO_F_RING_PACKED);
>>> +	if (packed)
>>> +		vring_packed_init(&vring.vring_packed, num, pages, vring_align);
>>> +	else
>>> +		vring_init(&vring.vring_split, num, pages, vring_align);
>>> +
>>> +	return __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>> +				     context, notify, callback, name);
>>>    }
>>>    EXPORT_SYMBOL_GPL(vring_new_virtqueue);
>>> @@ -1145,7 +1659,9 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>>>    	if (vq->we_own_ring) {
>>>    		vring_free_queue(vq->vq.vdev, vq->queue_size_in_bytes,
>>> -				 vq->vring.desc, vq->queue_dma_addr);
>>> +				 vq->packed ? (void *)vq->vring_packed.desc :
>>> +					      (void *)vq->vring.desc,
>>> +				 vq->queue_dma_addr);
>>>    	}
>>>    	list_del(&_vq->list);
>>>    	kfree(vq);
>>> @@ -1159,14 +1675,18 @@ void vring_transport_features(struct virtio_device *vdev)
>>>    	for (i = VIRTIO_TRANSPORT_F_START; i < VIRTIO_TRANSPORT_F_END; i++) {
>>>    		switch (i) {
>>> +#if 0 // FIXME: to be implemented
>>>    		case VIRTIO_RING_F_INDIRECT_DESC:
>>>    			break;
>>> +#endif
>>>    		case VIRTIO_RING_F_EVENT_IDX:
>>>    			break;
>>>    		case VIRTIO_F_VERSION_1:
>>>    			break;
>>>    		case VIRTIO_F_IOMMU_PLATFORM:
>>>    			break;
>>> +		case VIRTIO_F_RING_PACKED:
>>> +			break;
>>>    		default:
>>>    			/* We don't understand this bit. */
>>>    			__virtio_clear_bit(vdev, i);
>>> @@ -1187,7 +1707,7 @@ unsigned int virtqueue_get_vring_size(struct virtqueue *_vq)
>>>    	struct vring_virtqueue *vq = to_vvq(_vq);
>>> -	return vq->vring.num;
>>> +	return vq->packed ? vq->vring_packed.num : vq->vring.num;
>>>    }
>>>    EXPORT_SYMBOL_GPL(virtqueue_get_vring_size);
>>> @@ -1224,6 +1744,7 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *_vq)
>>>    }
>>>    EXPORT_SYMBOL_GPL(virtqueue_get_desc_addr);
>>> +/* Only available for split ring */
>> Interesting, I think we need this to configure PCI correctly, e.g. in
>> setup_vq()?
> Yes. The setup_vq() should be updated. But it requires
> QEMU change, so I just kept it as is in this RFC patch.

Ok.

>
>>>    dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>>>    {
>>>    	struct vring_virtqueue *vq = to_vvq(_vq);
>>> @@ -1235,6 +1756,7 @@ dma_addr_t virtqueue_get_avail_addr(struct virtqueue *_vq)
>>>    }
>>>    EXPORT_SYMBOL_GPL(virtqueue_get_avail_addr);
>>> +/* Only available for split ring */
>>>    dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>> Maybe it's better to rename this to get_device_addr().
> It's a kernel API which has been exported by EXPORT_SYMBOL_GPL(),
> I'm not sure whether it's a good idea to rename it.

If it's not a part of uapi, I think we can.

Thanks

>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>>    {
>>>    	struct vring_virtqueue *vq = to_vvq(_vq);
>>> @@ -1246,6 +1768,7 @@ dma_addr_t virtqueue_get_used_addr(struct virtqueue *_vq)
>>>    }
>>>    EXPORT_SYMBOL_GPL(virtqueue_get_used_addr);
>>> +/* Only available for split ring */
>>>    const struct vring *virtqueue_get_vring(struct virtqueue *vq)
>>>    {
>>>    	return &to_vvq(vq)->vring;
>>> diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
>>> index bbf32524ab27..a0075894ad16 100644
>>> --- a/include/linux/virtio_ring.h
>>> +++ b/include/linux/virtio_ring.h
>>> @@ -60,6 +60,11 @@ static inline void virtio_store_mb(bool weak_barriers,
>>>    struct virtio_device;
>>>    struct virtqueue;
>>> +union vring_union {
>>> +	struct vring vring_split;
>>> +	struct vring_packed vring_packed;
>>> +};
>>> +
>>>    /*
>>>     * Creates a virtqueue and allocates the descriptor ring.  If
>>>     * may_reduce_num is set, then this may allocate a smaller ring than
>>> @@ -79,7 +84,8 @@ struct virtqueue *vring_create_virtqueue(unsigned int index,
>>>    /* Creates a virtqueue with a custom layout. */
>>>    struct virtqueue *__vring_new_virtqueue(unsigned int index,
>>> -					struct vring vring,
>>> +					union vring_union vring,
>>> +					bool packed,
>>>    					struct virtio_device *vdev,
>>>    					bool weak_barriers,
>>>    					bool ctx,

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16  6:44       ` Jason Wang
  2018-03-16  7:40         ` Tiwei Bie
@ 2018-03-16  7:40         ` Tiwei Bie
  2018-03-16  8:34             ` Jason Wang
  1 sibling, 1 reply; 38+ messages in thread
From: Tiwei Bie @ 2018-03-16  7:40 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> On 2018-03-16 14:10, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > On 2018-02-23 19:18, Tiwei Bie wrote:
> > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > ---
> > > >    drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> > > >    include/linux/virtio_ring.h  |   8 +-
> > > >    2 files changed, 618 insertions(+), 89 deletions(-)
[...]
> > > >    			      cpu_addr, size, direction);
> > > >    }
> > > > -static void vring_unmap_one(const struct vring_virtqueue *vq,
> > > > -			    struct vring_desc *desc)
> > > > +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
> > > >    {
> > > Let's split the helpers to packed/split version like other helpers?
> > > (Consider the caller has already known the type of vq).
> > Okay.
> > 
> 
> [...]
> 
> > > > +				desc[i].flags = flags;
> > > > +
> > > > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > If it's a part of chain, we only need to do this for last buffer I think.
> > I'm not sure I've got your point about the "last buffer".
> > But, yes, id just needs to be set for the last desc.
> 
> Right, I think I meant "last descriptor" :)
> 
> > 
> > > > +			prev = i;
> > > > +			i++;
> > > It looks to me prev is always i - 1?
> > No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
> 
> Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.

Yes, i wraps together with vq->wrap_counter in the following code:

> > > > +			if (!indirect && i >= vq->vring_packed.num) {
> > > > +				i = 0;
> > > > +				vq->wrap_counter ^= 1;
> > > > +			}
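
To illustrate the wrap handling quoted above, here is a minimal
standalone sketch in C. The struct ring_cursor type and both function
names are hypothetical and are not part of the patch; only the wrap
rule (reset the index to 0 and flip the wrap counter) is taken from
the quoted code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side cursor; not the kernel's vring_virtqueue. */
struct ring_cursor {
	uint16_t num;          /* ring size in descriptors */
	uint16_t next;         /* index of the next descriptor to fill */
	bool     wrap_counter; /* flips each time next wraps back to 0 */
};

/* Advance by one slot and return the slot just filled ("prev"). */
static uint16_t ring_cursor_advance(struct ring_cursor *c)
{
	uint16_t prev = c->next;

	if (++c->next >= c->num) {
		c->next = 0;
		c->wrap_counter = !c->wrap_counter;
	}
	return prev;
}

/* Closed form for prev, computed from the already advanced index i. */
static uint16_t ring_prev(uint16_t i, uint16_t num)
{
	return i ? i - 1 : num - 1;
}
```

With num = 4 and next = 3, one call returns 3, leaves next at 0 and
flips the wrap counter, which is the case where prev is num - 1 rather
than i - 1.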


> > > > +		}
> > > > +	}
> > > > +	for (; n < (out_sgs + in_sgs); n++) {
> > > > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > > > +			if (vring_mapping_error(vq, addr))
> > > > +				goto unmap_release;
> > > > +
> > > > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > > > +					VRING_DESC_F_WRITE |
> > > > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > > > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > > > +			if (!indirect && i == head)
> > > > +				head_flags = flags;
> > > > +			else
> > > > +				desc[i].flags = flags;
> > > > +
> > > > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > +			prev = i;
> > > > +			i++;
> > > > +			if (!indirect && i >= vq->vring_packed.num) {
> > > > +				i = 0;
> > > > +				vq->wrap_counter ^= 1;
> > > > +			}
> > > > +		}
> > > > +	}
> > > > +	/* Last one doesn't continue. */
> > > > +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> > > > +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > I don't get why we need this here.
> > If only one desc is used, we will need to clear the
> > VRING_DESC_F_NEXT flag from the head_flags.
> 
> Yes, I meant why following desc[prev].flags won't work for this?

Because the update of desc[head].flags has been delayed (in the
above case, prev == head). The flags are saved in head_flags.
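
A minimal standalone sketch of that ordering in C follows. Everything
in it is hypothetical (struct packed_desc, publish_chain(), the DESC_F_*
macros); the AVAIL/USED bit positions (7 and 15) are assumed from the
published virtio 1.1 packed ring layout, and atomic_thread_fence() only
stands in for virtio_wmb(). It shows why the NEXT bit of a
single-descriptor chain has to be cleared in the local head_flags
instead of in desc[prev].flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_F_NEXT     ((uint16_t)1u)
#define DESC_F_AVAIL(b) ((uint16_t)((b) ? 1u << 7 : 0u))   /* assumed bit 7 */
#define DESC_F_USED(b)  ((uint16_t)((b) ? 1u << 15 : 0u))  /* assumed bit 15 */

/* Hypothetical descriptor layout, for illustration only. */
struct packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/* Fill n chained descriptors starting at head; publish the head last. */
static void publish_chain(struct packed_desc *ring, uint16_t num,
			  uint16_t head, uint16_t n, bool wrap)
{
	uint16_t head_flags = 0, prev = head, i = head, k;

	assert(n >= 1 && n <= num);

	for (k = 0; k < n; k++) {
		uint16_t flags = DESC_F_NEXT | DESC_F_AVAIL(wrap) |
				 DESC_F_USED(!wrap);

		if (k == 0)
			head_flags = flags;    /* held back, not yet in memory */
		else
			ring[i].flags = flags;
		ring[i].id = head;
		prev = i;
		if (++i >= num) {
			i = 0;
			wrap = !wrap;
		}
	}

	/* Last one doesn't continue. */
	if (prev == head)
		head_flags &= (uint16_t)~DESC_F_NEXT;  /* ring[head].flags is stale */
	else
		ring[prev].flags &= (uint16_t)~DESC_F_NEXT;

	/* Order all descriptor writes before the head becomes available. */
	atomic_thread_fence(memory_order_release);
	ring[head].flags = head_flags;
}
```

Until the final store, the device still sees the old flags in
ring[head], so it cannot start consuming a half-written chain.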

> 
> > 
> > > > +	else
> > > > +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > +
> > > > +	if (indirect) {
> > > > +		/* FIXME: to be implemented */
> > > > +
> > > > +		/* Now that the indirect table is filled in, map it. */
> > > > +		dma_addr_t addr = vring_map_single(
> > > > +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > > > +			DMA_TO_DEVICE);
> > > > +		if (vring_mapping_error(vq, addr))
> > > > +			goto unmap_release;
> > > > +
> > > > +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > > > +					     VRING_DESC_F_AVAIL(wrap_counter) |
> > > > +					     VRING_DESC_F_USED(!wrap_counter));
> > > > +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > > > +				total_sg * sizeof(struct vring_packed_desc));
> > > > +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> > > > +	}
> > > > +
> > > > +	/* We're using some buffers from the free list. */
> > > > +	vq->vq.num_free -= descs_used;
> > > > +
> > > > +	/* Update free pointer */
> > > > +	if (indirect) {
> > > > +		n = head + 1;
> > > > +		if (n >= vq->vring_packed.num) {
> > > > +			n = 0;
> > > > +			vq->wrap_counter ^= 1;
> > > > +		}
> > > > +		vq->free_head = n;
> > > detach_buf_packed() does not even touch free_head here, so need to explain
> > > its meaning for packed ring.
> > Above code is for indirect support which isn't really
> > implemented in this patch yet.
> > 
> > For your question, free_head stores the index of the
> > next avail desc. I'll add a comment for it or move it
> > to union and give it a better name in next version.
> 
> Yes, something like avail_idx might be better.
> 
> > 
> > > > +	} else
> > > > +		vq->free_head = i;
> > > ID is only valid in the last descriptor in the list, so head + 1 should be
> > > ok too?
> > I don't really get your point. The vq->free_head stores
> > the index of the next avail desc.
> 
> I think I get your idea now, free_head has two meanings:
> 
> - next avail index
> - buffer id

In my design, free_head is just the index of the next
avail desc.

The driver can set the buffer ID to anything. In my design,
I save the desc index in the buffer ID.

I'll add comments for them.
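
A rough standalone sketch of the two roles in C, under the assumption
that per-buffer state is looked up by buffer ID when the device hands a
buffer back. The names (struct drv_ring, ring_add(), ring_complete(),
RING_SIZE) are hypothetical and do not appear in the patch, and
free-descriptor accounting is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8

/* Per-buffer bookkeeping, indexed by buffer ID. */
struct buf_state {
	void    *data; /* caller's token */
	uint16_t num;  /* descriptors used by this buffer */
};

struct drv_ring {
	uint16_t next_avail; /* what the RFC calls free_head */
	uint16_t last_used;  /* next descriptor expected back */
	struct buf_state state[RING_SIZE];
};

/* Expose a buffer of ndesc descriptors; the buffer ID is the head index. */
static uint16_t ring_add(struct drv_ring *r, void *data, uint16_t ndesc)
{
	uint16_t id = r->next_avail;

	r->state[id].data = data;
	r->state[id].num = ndesc;
	r->next_avail = (uint16_t)((r->next_avail + ndesc) % RING_SIZE);
	return id;
}

/* The device returned buffer ID id; recover the token and advance. */
static void *ring_complete(struct drv_ring *r, uint16_t id)
{
	void *data = r->state[id].data;

	r->state[id].data = NULL;
	r->last_used += r->state[id].num;
	if (r->last_used >= RING_SIZE)
		r->last_used -= RING_SIZE;
	return data;
}
```

Here next_avail only ever answers "where does the next buffer start",
while the ID is an opaque key that happens to reuse the head index.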

> 
> If I'm correct, let's better add a comment for this.
> 
> > 
> > > > +
> > > > +	/* Store token and indirect buffer state. */
> > > > +	vq->desc_state[head].num = descs_used;
> > > > +	vq->desc_state[head].data = data;
> > > > +	if (indirect)
> > > > +		vq->desc_state[head].indir_desc = desc;
> > > > +	else
> > > > +		vq->desc_state[head].indir_desc = ctx;
> > > > +
> > > > +	virtio_wmb(vq->weak_barriers);
> > > Let's add a comment to explain the barrier here.
> > Okay.
> > 
> > > > +	vq->vring_packed.desc[head].flags = head_flags;
> > > > +	vq->num_added++;
> > > > +
> > > > +	pr_debug("Added buffer head %i to %p\n", head, vq);
> > > > +	END_USE(vq);
> > > > +
> > > > +	return 0;
> > > > +
> > > > +unmap_release:
> > > > +	err_idx = i;
> > > > +	i = head;
> > > > +
> > > > +	for (n = 0; n < total_sg; n++) {
> > > > +		if (i == err_idx)
> > > > +			break;
> > > > +		vring_unmap_one(vq, &desc[i]);
> > > > +		i++;
> > > > +		if (!indirect && i >= vq->vring_packed.num)
> > > > +			i = 0;
> > > > +	}
> > > > +
> > > > +	vq->wrap_counter = wrap_counter;
> > > > +
> > > > +	if (indirect)
> > > > +		kfree(desc);
> > > > +
> > > > +	END_USE(vq);
> > > > +	return -EIO;
> > > > +}
[...]
> > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > >    	if (!queue) {
> > > >    		/* Try to get a single page. You are my only hope! */
> > > > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > +							     packed),
> > > >    					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > >    	}
> > > >    	if (!queue)
> > > >    		return NULL;
> > > > -	queue_size_in_bytes = vring_size(num, vring_align);
> > > > -	vring_init(&vring, num, queue, vring_align);
> > > > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > +	if (packed)
> > > > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > +	else
> > > > +		vring_init(&vring.vring_split, num, queue, vring_align);
> > > Let's rename vring_init to vring_init_split() like other helpers?
> > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > I don't think we can rename it.
> 
> I see, then this need more thoughts to unify the API.

My thought is to keep the old API as is, and introduce
new types and helpers for packed ring.

More details can be found in this patch:
https://lkml.org/lkml/2018/2/23/243
(PS. The type which has bit fields is just for reference,
 and will be changed in next version.)
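
Roughly, the shape I have in mind looks like the standalone sketch
below. All model_* names are hypothetical stand-ins and the structs
are reduced to two fields; they are not the real uapi types. The old
split helper keeps its signature, the packed ring gets its own type
and init helper, and the caller picks one based on a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the existing split ring type (left untouched). */
struct model_vring_split {
	unsigned int num;
	void *desc;
};

/* New type introduced only for the packed ring. */
struct model_vring_packed {
	unsigned int num;
	void *desc;
};

union model_vring {
	struct model_vring_split split;
	struct model_vring_packed packed;
};

/* Old helper: name and signature stay as they are. */
static void model_vring_init(struct model_vring_split *vr,
			     unsigned int num, void *p)
{
	vr->num = num;
	vr->desc = p;
}

/* New helper for the packed layout. */
static void model_vring_packed_init(struct model_vring_packed *vr,
				    unsigned int num, void *p)
{
	vr->num = num;
	vr->desc = p;
}

/* Caller-side dispatch on the ring type. */
static void model_ring_setup(union model_vring *vr, bool packed,
			     unsigned int num, void *p)
{
	assert(vr != NULL);
	if (packed)
		model_vring_packed_init(&vr->packed, num, p);
	else
		model_vring_init(&vr->split, num, p);
}
```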

Do you have any other suggestions?

Best regards,
Tiwei Bie

> 
> > 
> > > > -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > > -				   notify, callback, name);
> > > > +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > > +				   context, notify, callback, name);
> > > >    	if (!vq) {
> > > >    		vring_free_queue(vdev, queue_size_in_bytes, queue,
> > > >    				 dma_addr);
[...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16  7:40         ` Tiwei Bie
@ 2018-03-16  8:34             ` Jason Wang
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Wang @ 2018-03-16  8:34 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann



On 2018-03-16 15:40, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
>> On 2018-03-16 14:10, Tiwei Bie wrote:
>>> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>>>> On 2018-02-23 19:18, Tiwei Bie wrote:
>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>> ---
>>>>>     drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>>>     include/linux/virtio_ring.h  |   8 +-
>>>>>     2 files changed, 618 insertions(+), 89 deletions(-)
> [...]
>>>>>     			      cpu_addr, size, direction);
>>>>>     }
>>>>> -static void vring_unmap_one(const struct vring_virtqueue *vq,
>>>>> -			    struct vring_desc *desc)
>>>>> +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
>>>>>     {
>>>> Let's split the helpers to packed/split version like other helpers?
>>>> (Consider the caller has already known the type of vq).
>>> Okay.
>>>
>> [...]
>>
>>>>> +				desc[i].flags = flags;
>>>>> +
>>>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>> If it's a part of chain, we only need to do this for last buffer I think.
>>> I'm not sure I've got your point about the "last buffer".
>>> But, yes, id just needs to be set for the last desc.
>> Right, I think I meant "last descriptor" :)
>>
>>>>> +			prev = i;
>>>>> +			i++;
>>>> It looks to me prev is always i - 1?
>>> No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
>> Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.
> Yes, i wraps together with vq->wrap_counter in following code:
>
>>>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>>>> +				i = 0;
>>>>> +				vq->wrap_counter ^= 1;
>>>>> +			}
>
>>>>> +		}
>>>>> +	}
>>>>> +	for (; n < (out_sgs + in_sgs); n++) {
>>>>> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>>>> +			if (vring_mapping_error(vq, addr))
>>>>> +				goto unmap_release;
>>>>> +
>>>>> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>>>> +					VRING_DESC_F_WRITE |
>>>>> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>>>> +					VRING_DESC_F_USED(!vq->wrap_counter));
>>>>> +			if (!indirect && i == head)
>>>>> +				head_flags = flags;
>>>>> +			else
>>>>> +				desc[i].flags = flags;
>>>>> +
>>>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>>> +			prev = i;
>>>>> +			i++;
>>>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>>>> +				i = 0;
>>>>> +				vq->wrap_counter ^= 1;
>>>>> +			}
>>>>> +		}
>>>>> +	}
>>>>> +	/* Last one doesn't continue. */
>>>>> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>>>> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>> I can't get the why we need this here.
>>> If only one desc is used, we will need to clear the
>>> VRING_DESC_F_NEXT flag from the head_flags.
>> Yes, I meant why following desc[prev].flags won't work for this?
> Because the update of desc[head].flags (in above case,
> prev == head) has been delayed. The flags is saved in
> head_flags.

Ok, but let's try to avoid the modulo here, e.g. by tracking the number
of sgs in a counter.

And I see lots of duplication in the above two loops. I believe we can
unify them into a single loop; the only differences are the DMA
direction and the write flag.
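
Something along these lines, as a rough standalone sketch rather than
a drop-in replacement. The names model_desc, model_fill() and the
MODEL_F_* values are made up, and DMA mapping, the wrap counter and
the avail/used bits are left out. A running counter decides both the
write flag and whether a descriptor is the last one, so no modulo is
needed, and the head flags are still returned for a deferred write.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_F_NEXT  1u
#define MODEL_F_WRITE 2u

struct model_desc {
	uint32_t len;
	uint16_t flags;
};

/*
 * Fill `total` descriptors starting at `head` in a ring of `num`
 * entries. The first `out_count` are device-readable, the rest are
 * device-writable. Returns the head flags, which the caller writes
 * last; *next receives the index following the chain.
 */
static uint16_t model_fill(struct model_desc *ring, unsigned int num,
			   unsigned int head, const uint32_t *lens,
			   unsigned int out_count, unsigned int total,
			   unsigned int *next)
{
	unsigned int i = head, c;
	uint16_t head_flags = 0;

	assert(total > 0 && total <= num && head < num);

	for (c = 0; c < total; c++) {
		uint16_t flags = 0;

		if (c >= out_count)	/* only difference between out and in */
			flags |= MODEL_F_WRITE;
		if (c + 1 < total)	/* counter replaces the modulo test */
			flags |= MODEL_F_NEXT;

		ring[i].len = lens[c];
		if (c == 0)
			head_flags = flags;	/* head flags are deferred */
		else
			ring[i].flags = flags;

		if (++i >= num)
			i = 0;
	}

	*next = i;
	return head_flags;
}
```

With a single descriptor the head flags come back without the next
flag set, so the special case after the loops goes away.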

>
>>>>> +	else
>>>>> +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>>> +
>>>>> +	if (indirect) {
>>>>> +		/* FIXME: to be implemented */
>>>>> +
>>>>> +		/* Now that the indirect table is filled in, map it. */
>>>>> +		dma_addr_t addr = vring_map_single(
>>>>> +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
>>>>> +			DMA_TO_DEVICE);
>>>>> +		if (vring_mapping_error(vq, addr))
>>>>> +			goto unmap_release;
>>>>> +
>>>>> +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
>>>>> +					     VRING_DESC_F_AVAIL(wrap_counter) |
>>>>> +					     VRING_DESC_F_USED(!wrap_counter));
>>>>> +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>> +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
>>>>> +				total_sg * sizeof(struct vring_packed_desc));
>>>>> +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
>>>>> +	}
>>>>> +
>>>>> +	/* We're using some buffers from the free list. */
>>>>> +	vq->vq.num_free -= descs_used;
>>>>> +
>>>>> +	/* Update free pointer */
>>>>> +	if (indirect) {
>>>>> +		n = head + 1;
>>>>> +		if (n >= vq->vring_packed.num) {
>>>>> +			n = 0;
>>>>> +			vq->wrap_counter ^= 1;
>>>>> +		}
>>>>> +		vq->free_head = n;
>>>> detach_buf_packed() does not even touch free_head here, so need to explain
>>>> its meaning for packed ring.
>>> Above code is for indirect support which isn't really
>>> implemented in this patch yet.
>>>
>>> For your question, free_head stores the index of the
>>> next avail desc. I'll add a comment for it or move it
>>> to union and give it a better name in next version.
>> Yes, something like avail_idx might be better.
>>
>>>>> +	} else
>>>>> +		vq->free_head = i;
>>>> ID is only valid in the last descriptor in the list, so head + 1 should be
>>>> ok too?
>>> I don't really get your point. The vq->free_head stores
>>> the index of the next avail desc.
>> I think I get your idea now, free_head has two meanings:
>>
>> - next avail index
>> - buffer id
> In my design, free_head is just the index of the next
> avail desc.
>
> Driver can set anything to buffer ID.

Then you need another method to map the ID to its context, e.g. hashing.

>   And in my design,
> I save desc index in buffer ID.
>
> I'll add comments for them.
>
>> If I'm correct, let's better add a comment for this.
>>
>>>>> +
>>>>> +	/* Store token and indirect buffer state. */
>>>>> +	vq->desc_state[head].num = descs_used;
>>>>> +	vq->desc_state[head].data = data;
>>>>> +	if (indirect)
>>>>> +		vq->desc_state[head].indir_desc = desc;
>>>>> +	else
>>>>> +		vq->desc_state[head].indir_desc = ctx;
>>>>> +
>>>>> +	virtio_wmb(vq->weak_barriers);
>>>> Let's add a comment to explain the barrier here.
>>> Okay.
>>>
>>>>> +	vq->vring_packed.desc[head].flags = head_flags;
>>>>> +	vq->num_added++;
>>>>> +
>>>>> +	pr_debug("Added buffer head %i to %p\n", head, vq);
>>>>> +	END_USE(vq);
>>>>> +
>>>>> +	return 0;
>>>>> +
>>>>> +unmap_release:
>>>>> +	err_idx = i;
>>>>> +	i = head;
>>>>> +
>>>>> +	for (n = 0; n < total_sg; n++) {
>>>>> +		if (i == err_idx)
>>>>> +			break;
>>>>> +		vring_unmap_one(vq, &desc[i]);
>>>>> +		i++;
>>>>> +		if (!indirect && i >= vq->vring_packed.num)
>>>>> +			i = 0;
>>>>> +	}
>>>>> +
>>>>> +	vq->wrap_counter = wrap_counter;
>>>>> +
>>>>> +	if (indirect)
>>>>> +		kfree(desc);
>>>>> +
>>>>> +	END_USE(vq);
>>>>> +	return -EIO;
>>>>> +}
> [...]
>>>>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>>>     	if (!queue) {
>>>>>     		/* Try to get a single page. You are my only hope! */
>>>>> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>>>> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>>>> +							     packed),
>>>>>     					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>>>     	}
>>>>>     	if (!queue)
>>>>>     		return NULL;
>>>>> -	queue_size_in_bytes = vring_size(num, vring_align);
>>>>> -	vring_init(&vring, num, queue, vring_align);
>>>>> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>>>> +	if (packed)
>>>>> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>>>> +	else
>>>>> +		vring_init(&vring.vring_split, num, queue, vring_align);
>>>> Let's rename vring_init to vring_init_split() like other helpers?
>>> The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
>>> I don't think we can rename it.
>> I see, then this need more thoughts to unify the API.
> My thought is to keep the old API as is, and introduce
> new types and helpers for packed ring.

I admit it's not a fault of this patch. But we'd better think about this
in the future, considering we may have new kinds of rings.

>
> More details can be found in this patch:
> https://lkml.org/lkml/2018/2/23/243
> (PS. The type which has bit fields is just for reference,
>   and will be changed in next version.)
>
> Do you have any other suggestions?

No.

Thanks

>
> Best regards,
> Tiwei Bie
>
>>>>> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>>>> -				   notify, callback, name);
>>>>> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>>>> +				   context, notify, callback, name);
>>>>>     	if (!vq) {
>>>>>     		vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>>>     				 dma_addr);
> [...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16  8:34             ` Jason Wang
  (?)
@ 2018-03-16 10:04             ` Tiwei Bie
  2018-03-16 11:36                 ` Jason Wang
  -1 siblings, 1 reply; 38+ messages in thread
From: Tiwei Bie @ 2018-03-16 10:04 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
> On 2018-03-16 15:40, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> > > On 2018-03-16 14:10, Tiwei Bie wrote:
> > > > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > > > On 2018-02-23 19:18, Tiwei Bie wrote:
> > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > ---
> > > > > >     drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> > > > > >     include/linux/virtio_ring.h  |   8 +-
> > > > > >     2 files changed, 618 insertions(+), 89 deletions(-)
> > [...]
> > > > > >     			      cpu_addr, size, direction);
> > > > > >     }
> > > > > > -static void vring_unmap_one(const struct vring_virtqueue *vq,
> > > > > > -			    struct vring_desc *desc)
> > > > > > +static void vring_unmap_one(const struct vring_virtqueue *vq, void *_desc)
> > > > > >     {
> > > > > Let's split the helpers to packed/split version like other helpers?
> > > > > (Consider the caller has already known the type of vq).
> > > > Okay.
> > > > 
> > > [...]
> > > 
> > > > > > +				desc[i].flags = flags;
> > > > > > +
> > > > > > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > > > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > If it's a part of chain, we only need to do this for last buffer I think.
> > > > I'm not sure I've got your point about the "last buffer".
> > > > But, yes, id just needs to be set for the last desc.
> > > Right, I think I meant "last descriptor" :)
> > > 
> > > > > > +			prev = i;
> > > > > > +			i++;
> > > > > It looks to me prev is always i - 1?
> > > > No. prev will be (vq->vring_packed.num - 1) when i becomes 0.
> > > Right, so prev = i ? i - 1 : vq->vring_packed.num - 1.
> > Yes, i wraps together with vq->wrap_counter in following code:
> > 
> > > > > > +			if (!indirect && i >= vq->vring_packed.num) {
> > > > > > +				i = 0;
> > > > > > +				vq->wrap_counter ^= 1;
> > > > > > +			}
> > 
> > > > > > +		}
> > > > > > +	}
> > > > > > +	for (; n < (out_sgs + in_sgs); n++) {
> > > > > > +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > > > +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> > > > > > +			if (vring_mapping_error(vq, addr))
> > > > > > +				goto unmap_release;
> > > > > > +
> > > > > > +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
> > > > > > +					VRING_DESC_F_WRITE |
> > > > > > +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
> > > > > > +					VRING_DESC_F_USED(!vq->wrap_counter));
> > > > > > +			if (!indirect && i == head)
> > > > > > +				head_flags = flags;
> > > > > > +			else
> > > > > > +				desc[i].flags = flags;
> > > > > > +
> > > > > > +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> > > > > > +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > > +			prev = i;
> > > > > > +			i++;
> > > > > > +			if (!indirect && i >= vq->vring_packed.num) {
> > > > > > +				i = 0;
> > > > > > +				vq->wrap_counter ^= 1;
> > > > > > +			}
> > > > > > +		}
> > > > > > +	}
> > > > > > +	/* Last one doesn't continue. */
> > > > > > +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
> > > > > > +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > > I can't get the why we need this here.
> > > > If only one desc is used, we will need to clear the
> > > > VRING_DESC_F_NEXT flag from the head_flags.
> > > Yes, I meant why following desc[prev].flags won't work for this?
> > Because the update of desc[head].flags (in above case,
> > prev == head) has been delayed. The flags is saved in
> > head_flags.
> 
> Ok, but let's try to avoid the modulo here, e.g. by tracking the number of sgs in a
> counter.
> 
> And I see lots of duplication in the above two loops. I believe we can unify
> them into a single loop; the only differences are the DMA direction and the write
> flag.

The above implementation for packed ring is basically
a mirror of the existing implementation in split ring,
as I want to keep the coding style consistent. Below
is the corresponding code in split ring:

static inline int virtqueue_add(struct virtqueue *_vq,
				struct scatterlist *sgs[],
				unsigned int total_sg,
				unsigned int out_sgs,
				unsigned int in_sgs,
				void *data,
				void *ctx,
				gfp_t gfp)
{
	......

	for (n = 0; n < out_sgs; n++) {
		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
			if (vring_mapping_error(vq, addr))
				goto unmap_release;

			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
			prev = i;
			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
		}
	}
	for (; n < (out_sgs + in_sgs); n++) {
		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
			if (vring_mapping_error(vq, addr))
				goto unmap_release;

			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
			prev = i;
			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
		}
	}

	......
}

> 
> > 
> > > > > > +	else
> > > > > > +		desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > > > +
> > > > > > +	if (indirect) {
> > > > > > +		/* FIXME: to be implemented */
> > > > > > +
> > > > > > +		/* Now that the indirect table is filled in, map it. */
> > > > > > +		dma_addr_t addr = vring_map_single(
> > > > > > +			vq, desc, total_sg * sizeof(struct vring_packed_desc),
> > > > > > +			DMA_TO_DEVICE);
> > > > > > +		if (vring_mapping_error(vq, addr))
> > > > > > +			goto unmap_release;
> > > > > > +
> > > > > > +		head_flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT |
> > > > > > +					     VRING_DESC_F_AVAIL(wrap_counter) |
> > > > > > +					     VRING_DESC_F_USED(!wrap_counter));
> > > > > > +		vq->vring_packed.desc[head].addr = cpu_to_virtio64(_vq->vdev, addr);
> > > > > > +		vq->vring_packed.desc[head].len = cpu_to_virtio32(_vq->vdev,
> > > > > > +				total_sg * sizeof(struct vring_packed_desc));
> > > > > > +		vq->vring_packed.desc[head].id = cpu_to_virtio32(_vq->vdev, head);
> > > > > > +	}
> > > > > > +
> > > > > > +	/* We're using some buffers from the free list. */
> > > > > > +	vq->vq.num_free -= descs_used;
> > > > > > +
> > > > > > +	/* Update free pointer */
> > > > > > +	if (indirect) {
> > > > > > +		n = head + 1;
> > > > > > +		if (n >= vq->vring_packed.num) {
> > > > > > +			n = 0;
> > > > > > +			vq->wrap_counter ^= 1;
> > > > > > +		}
> > > > > > +		vq->free_head = n;
> > > > > detach_buf_packed() does not even touch free_head here, so need to explain
> > > > > its meaning for packed ring.
> > > > Above code is for indirect support which isn't really
> > > > implemented in this patch yet.
> > > > 
> > > > For your question, free_head stores the index of the
> > > > next avail desc. I'll add a comment for it or move it
> > > > to union and give it a better name in next version.
> > > Yes, something like avail_idx might be better.
> > > 
> > > > > > +	} else
> > > > > > +		vq->free_head = i;
> > > > > ID is only valid in the last descriptor in the list, so head + 1 should be
> > > > > ok too?
> > > > I don't really get your point. The vq->free_head stores
> > > > the index of the next avail desc.
> > > I think I get your idea now, free_head has two meanings:
> > > 
> > > - next avail index
> > > - buffer id
> > In my design, free_head is just the index of the next
> > avail desc.
> > 
> > Driver can set anything to buffer ID.
> 
> Then you need another method to track id to context e.g hashing.

I keep the context in desc_state[desc_idx]. So there is
no extra method needed to track the context.

> 
> >   And in my design,
> > I save desc index in buffer ID.
> > 
> > I'll add comments for them.
> > 
> > > If I'm correct, let's better add a comment for this.
> > > 
> > > > > > +
> > > > > > +	/* Store token and indirect buffer state. */
> > > > > > +	vq->desc_state[head].num = descs_used;
> > > > > > +	vq->desc_state[head].data = data;
> > > > > > +	if (indirect)
> > > > > > +		vq->desc_state[head].indir_desc = desc;
> > > > > > +	else
> > > > > > +		vq->desc_state[head].indir_desc = ctx;
> > > > > > +
> > > > > > +	virtio_wmb(vq->weak_barriers);
> > > > > Let's add a comment to explain the barrier here.
> > > > Okay.
> > > > 
> > > > > > +	vq->vring_packed.desc[head].flags = head_flags;
> > > > > > +	vq->num_added++;
> > > > > > +
> > > > > > +	pr_debug("Added buffer head %i to %p\n", head, vq);
> > > > > > +	END_USE(vq);
> > > > > > +
> > > > > > +	return 0;
> > > > > > +
> > > > > > +unmap_release:
> > > > > > +	err_idx = i;
> > > > > > +	i = head;
> > > > > > +
> > > > > > +	for (n = 0; n < total_sg; n++) {
> > > > > > +		if (i == err_idx)
> > > > > > +			break;
> > > > > > +		vring_unmap_one(vq, &desc[i]);
> > > > > > +		i++;
> > > > > > +		if (!indirect && i >= vq->vring_packed.num)
> > > > > > +			i = 0;
> > > > > > +	}
> > > > > > +
> > > > > > +	vq->wrap_counter = wrap_counter;
> > > > > > +
> > > > > > +	if (indirect)
> > > > > > +		kfree(desc);
> > > > > > +
> > > > > > +	END_USE(vq);
> > > > > > +	return -EIO;
> > > > > > +}
> > [...]
> > > > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > >     	if (!queue) {
> > > > > >     		/* Try to get a single page. You are my only hope! */
> > > > > > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > +							     packed),
> > > > > >     					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > >     	}
> > > > > >     	if (!queue)
> > > > > >     		return NULL;
> > > > > > -	queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > -	vring_init(&vring, num, queue, vring_align);
> > > > > > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > +	if (packed)
> > > > > > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > +	else
> > > > > > +		vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > I don't think we can rename it.
> > > I see, then this need more thoughts to unify the API.
> > My thought is to keep the old API as is, and introduce
> > new types and helpers for packed ring.
> 
> I admit it's not a fault of this patch. But we'd better think of this in the
> future, consider we may have new kinds of ring.
> 
> > 
> > More details can be found in this patch:
> > https://lkml.org/lkml/2018/2/23/243
> > (PS. The type which has bit fields is just for reference,
> >   and will be changed in next version.)
> > 
> > Do you have any other suggestions?
> 
> No.

Hmm.. Sorry, I didn't describe my question well.
I mean, do you have any suggestions about the API
design for packed ring in the uapi header? Currently
I have introduced the two new helpers below:

static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
				     void *p, unsigned long align);
static inline unsigned vring_packed_size(unsigned int num, unsigned long align);

When new rings are introduced in the future, the above
helpers can't be reused. Maybe we should make the
helpers able to determine the ring type?
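
As a rough illustration of what "helpers that determine the ring type" could
look like, here is a userspace sketch. Everything named model_* is
hypothetical and not part of the uapi header. The split branch follows the
arithmetic of the existing vring_size(); the packed branch counts only the
descriptor array, which is an assumption, since the body of
vring_packed_size() is not quoted in this thread and the device/driver areas
are listed as missing from the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the ring layouts (hypothetical names). */
struct model_split_desc  { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct model_used_elem   { uint32_t id; uint32_t len; };
struct model_packed_desc { uint64_t addr; uint32_t len; uint16_t id; uint16_t flags; };

static_assert(sizeof(struct model_packed_desc) == 16, "packed descriptor is 16 bytes");

enum model_ring_type { MODEL_RING_SPLIT, MODEL_RING_PACKED };

/* One size helper that dispatches on the ring type instead of one helper per ring. */
static size_t model_vring_size(enum model_ring_type type, unsigned int num, size_t align)
{
	switch (type) {
	case MODEL_RING_SPLIT:
		/* desc table + avail ring, padded to align, then the used ring */
		return ((sizeof(struct model_split_desc) * num +
			 sizeof(uint16_t) * (3 + num) + align - 1) & ~(align - 1)) +
		       sizeof(uint16_t) * 3 + sizeof(struct model_used_elem) * num;
	case MODEL_RING_PACKED:
		/* assumption: descriptor array only, no event suppression areas */
		return sizeof(struct model_packed_desc) * num;
	}
	return 0;
}
```

A future ring layout would then add an enum value and a case rather than a
new pair of public helpers; whether that is acceptable for a uapi header is
exactly the open question above.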

Best regards,
Tiwei Bie

> 
> Thanks
> 
> > 
> > Best regards,
> > Tiwei Bie
> > 
> > > > > > -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
> > > > > > -				   notify, callback, name);
> > > > > > +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
> > > > > > +				   context, notify, callback, name);
> > > > > >     	if (!vq) {
> > > > > >     		vring_free_queue(vdev, queue_size_in_bytes, queue,
> > > > > >     				 dma_addr);
> > [...]
> 

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16 10:04             ` Tiwei Bie
@ 2018-03-16 11:36                 ` Jason Wang
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Wang @ 2018-03-16 11:36 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann



On 2018-03-16 18:04, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
>> On 2018-03-16 15:40, Tiwei Bie wrote:
>>> On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
>>>> On 2018-03-16 14:10, Tiwei Bie wrote:
>>>>> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>>>>>> On 2018-02-23 19:18, Tiwei Bie wrote:
>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>>      drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>>>>>      include/linux/virtio_ring.h  |   8 +-
>>>>>>>      2 files changed, 618 insertions(+), 89 deletions(-)
>>>

[...]

>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	for (; n < (out_sgs + in_sgs); n++) {
>>>>>>> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>>>> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>>>>>> +			if (vring_mapping_error(vq, addr))
>>>>>>> +				goto unmap_release;
>>>>>>> +
>>>>>>> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>>>>>> +					VRING_DESC_F_WRITE |
>>>>>>> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>>>>>> +					VRING_DESC_F_USED(!vq->wrap_counter));
>>>>>>> +			if (!indirect && i == head)
>>>>>>> +				head_flags = flags;
>>>>>>> +			else
>>>>>>> +				desc[i].flags = flags;
>>>>>>> +
>>>>>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>>>>> +			prev = i;
>>>>>>> +			i++;
>>>>>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>>>>>> +				i = 0;
>>>>>>> +				vq->wrap_counter ^= 1;
>>>>>>> +			}
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	/* Last one doesn't continue. */
>>>>>>> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>>>>>> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>>>> I can't get the why we need this here.
>>>>> If only one desc is used, we will need to clear the
>>>>> VRING_DESC_F_NEXT flag from the head_flags.
>>>> Yes, I meant why following desc[prev].flags won't work for this?
>>> Because the update of desc[head].flags (in above case,
>>> prev == head) has been delayed. The flags is saved in
>>> head_flags.
>> Ok, but let's try to avoid the modulo here, e.g. by tracking the number of sgs in a
>> counter.
>>
>> And I see lots of duplication in the above two loops. I believe we can unify
>> them into a single loop; the only differences are the DMA direction and the write
>> flag.
> The above implementation for packed ring is basically
> a mirror of the existing implementation in split ring,
> as I want to keep the coding style consistent. Below
> is the corresponding code in split ring:
>
> static inline int virtqueue_add(struct virtqueue *_vq,
> 				struct scatterlist *sgs[],
> 				unsigned int total_sg,
> 				unsigned int out_sgs,
> 				unsigned int in_sgs,
> 				void *data,
> 				void *ctx,
> 				gfp_t gfp)
> {
> 	......
>
> 	for (n = 0; n < out_sgs; n++) {
> 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> 			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> 			if (vring_mapping_error(vq, addr))
> 				goto unmap_release;
>
> 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
> 			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> 			prev = i;
> 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> 		}
> 	}
> 	for (; n < (out_sgs + in_sgs); n++) {
> 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> 			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> 			if (vring_mapping_error(vq, addr))
> 				goto unmap_release;
>
> 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
> 			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> 			prev = i;
> 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> 		}
> 	}
>
> 	......
> }

There's no need for such consistency, especially considering it's a new
kind of ring. Anyway, you can stick with it.
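
For reference, the single-loop idea can be sketched in a userspace model as
below. This is not the patch's code: the m_* names are hypothetical, DMA
mapping and endianness conversion are left out, and the AVAIL/USED bit
positions (7 and 15) are taken from the packed ring spec rather than from
this RFC's header. The counter n replaces the modulo test for "only one
descriptor", and the head flags are still written last.

```c
#include <assert.h>
#include <stdint.h>

#define M_DESC_F_NEXT      (1u << 0)
#define M_DESC_F_WRITE     (1u << 1)
#define M_DESC_F_AVAIL(b)  ((uint16_t)((b) << 7))
#define M_DESC_F_USED(b)   ((uint16_t)((b) << 15))

struct m_desc { uint64_t addr; uint32_t len; uint16_t id; uint16_t flags; };
struct m_buf  { uint64_t addr; uint32_t len; };

struct m_vq {
	struct m_desc *desc;
	unsigned int num;          /* ring size */
	unsigned int next_avail;   /* what the RFC calls free_head */
	unsigned int wrap_counter; /* 0 or 1 */
};

/* Add out_n device-readable then in_n device-writable buffers; returns head. */
static unsigned int m_add(struct m_vq *vq, const struct m_buf *bufs,
			  unsigned int out_n, unsigned int in_n)
{
	unsigned int total = out_n + in_n;
	unsigned int head = vq->next_avail;
	unsigned int i = head, n;
	uint16_t head_flags = 0;

	assert(total >= 1 && total <= vq->num);

	for (n = 0; n < total; n++) {
		uint16_t flags = M_DESC_F_AVAIL(vq->wrap_counter) |
				 M_DESC_F_USED(!vq->wrap_counter);

		if (n >= out_n)           /* the only per-direction difference */
			flags |= M_DESC_F_WRITE;
		if (n + 1 < total)        /* last descriptor doesn't continue */
			flags |= M_DESC_F_NEXT;

		vq->desc[i].addr = bufs[n].addr;
		vq->desc[i].len = bufs[n].len;
		vq->desc[i].id = (uint16_t)head;

		if (n == 0)
			head_flags = flags;   /* deferred until the chain is complete */
		else
			vq->desc[i].flags = flags;

		if (++i >= vq->num) {     /* index and wrap counter wrap together */
			i = 0;
			vq->wrap_counter ^= 1;
		}
	}

	vq->next_avail = i;
	/* In the kernel, virtio_wmb() would order the writes above before this. */
	vq->desc[head].flags = head_flags;
	return head;
}
```

With a 4-entry ring, next_avail = 3 and wrap_counter = 1, adding one out and
one in buffer puts the head at index 3 and the second descriptor at index 0
with the flipped wrap counter.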

[...]

>>>>>>> +	} else
>>>>>>> +		vq->free_head = i;
>>>>>> ID is only valid in the last descriptor in the list, so head + 1 should be
>>>>>> ok too?
>>>>> I don't really get your point. The vq->free_head stores
>>>>> the index of the next avail desc.
>>>> I think I get your idea now, free_head has two meanings:
>>>>
>>>> - next avail index
>>>> - buffer id
>>> In my design, free_head is just the index of the next
>>> avail desc.
>>>
>>> Driver can set anything to buffer ID.
>> Then you need another method to track id to context e.g hashing.
> I keep the context in desc_state[desc_idx]. So there is
> no extra method needed to track the context.

Well, it works for this patch, but my reply was about "set anything to
buffer ID". The size of desc_state is limited, so in fact you can't use
a value greater than or equal to vq.num.
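
To make the constraint concrete, here is a small userspace sketch of a
desc_state-style table indexed by buffer ID. The m_state_* names and the
table layout are hypothetical; the only point is that an ID is usable as a
direct index only while it is below the ring size, and anything else would
need a separate mapping such as a hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct m_desc_state { void *data; unsigned int num; };

struct m_state_table {
	struct m_desc_state *state;
	unsigned int num;   /* number of slots, same as the ring size */
};

/* Record the caller's token under the buffer ID; -1 if the ID has no slot. */
static int m_state_store(struct m_state_table *t, uint16_t id,
			 void *data, unsigned int descs_used)
{
	assert(t && t->state);
	if (id >= t->num)
		return -1;
	t->state[id].data = data;
	t->state[id].num = descs_used;
	return 0;
}

/* Fetch and clear the token for an ID reported back in a used descriptor. */
static void *m_state_take(struct m_state_table *t, uint16_t id)
{
	void *data;

	assert(t && t->state);
	if (id >= t->num)
		return NULL;
	data = t->state[id].data;
	t->state[id].data = NULL;
	return data;
}
```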


>

[...]

>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>>>>>      	if (!queue) {
>>>>>>>      		/* Try to get a single page. You are my only hope! */
>>>>>>> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>>>>>> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>>>>>> +							     packed),
>>>>>>>      					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>>>>>      	}
>>>>>>>      	if (!queue)
>>>>>>>      		return NULL;
>>>>>>> -	queue_size_in_bytes = vring_size(num, vring_align);
>>>>>>> -	vring_init(&vring, num, queue, vring_align);
>>>>>>> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>>>>>> +	if (packed)
>>>>>>> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>>>>>> +	else
>>>>>>> +		vring_init(&vring.vring_split, num, queue, vring_align);
>>>>>> Let's rename vring_init to vring_init_split() like other helpers?
>>>>> The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
>>>>> I don't think we can rename it.
>>>> I see, then this need more thoughts to unify the API.
>>> My thought is to keep the old API as is, and introduce
>>> new types and helpers for packed ring.
>> I admit it's not a fault of this patch. But we'd better think of this in the
>> future, consider we may have new kinds of ring.
>>
>>> More details can be found in this patch:
>>> https://lkml.org/lkml/2018/2/23/243
>>> (PS. The type which has bit fields is just for reference,
>>>    and will be changed in next version.)
>>>
>>> Do you have any other suggestions?
>> No.
> Hmm.. Sorry, I didn't describe my question well.
> I mean do you have any suggestions about the API
> design for packed ring in uapi header? Currently
> I introduced below two new helpers:
>
> static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> 				     void *p, unsigned long align);
> static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
>
> When new rings are introduced in the future, above
> helpers can't be reused. Maybe we should make the
> helpers be able to determine the ring type?

Let's wait for Michael's comment here. Generally, I fail to understand
why vring_init() became part of the uapi. Git grep shows the only users
are virtio_test/vringh_test.

Thanks

>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>> Best regards,
>>> Tiwei Bie
>>>
>>>>>>> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>>>>>> -				   notify, callback, name);
>>>>>>> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>>>>>> +				   context, notify, callback, name);
>>>>>>>      	if (!vq) {
>>>>>>>      		vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>>>>>      				 dma_addr);
>>> [...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
@ 2018-03-16 11:36                 ` Jason Wang
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Wang @ 2018-03-16 11:36 UTC (permalink / raw)
  To: Tiwei Bie; +Cc: mst, netdev, linux-kernel, virtualization, wexu



On 2018年03月16日 18:04, Tiwei Bie wrote:
> On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
>> On 2018年03月16日 15:40, Tiwei Bie wrote:
>>> On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
>>>> On 2018年03月16日 14:10, Tiwei Bie wrote:
>>>>> On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
>>>>>> On 2018年02月23日 19:18, Tiwei Bie wrote:
>>>>>>> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
>>>>>>> ---
>>>>>>>      drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
>>>>>>>      include/linux/virtio_ring.h  |   8 +-
>>>>>>>      2 files changed, 618 insertions(+), 89 deletions(-)
>>>

[...]

>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	for (; n < (out_sgs + in_sgs); n++) {
>>>>>>> +		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>>>>>>> +			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
>>>>>>> +			if (vring_mapping_error(vq, addr))
>>>>>>> +				goto unmap_release;
>>>>>>> +
>>>>>>> +			flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT |
>>>>>>> +					VRING_DESC_F_WRITE |
>>>>>>> +					VRING_DESC_F_AVAIL(vq->wrap_counter) |
>>>>>>> +					VRING_DESC_F_USED(!vq->wrap_counter));
>>>>>>> +			if (!indirect && i == head)
>>>>>>> +				head_flags = flags;
>>>>>>> +			else
>>>>>>> +				desc[i].flags = flags;
>>>>>>> +
>>>>>>> +			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
>>>>>>> +			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
>>>>>>> +			desc[i].id = cpu_to_virtio32(_vq->vdev, head);
>>>>>>> +			prev = i;
>>>>>>> +			i++;
>>>>>>> +			if (!indirect && i >= vq->vring_packed.num) {
>>>>>>> +				i = 0;
>>>>>>> +				vq->wrap_counter ^= 1;
>>>>>>> +			}
>>>>>>> +		}
>>>>>>> +	}
>>>>>>> +	/* Last one doesn't continue. */
>>>>>>> +	if (!indirect && (head + 1) % vq->vring_packed.num == i)
>>>>>>> +		head_flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
>>>>>> I don't get why we need this here.
>>>>> If only one desc is used, we will need to clear the
>>>>> VRING_DESC_F_NEXT flag from the head_flags.
>>>> Yes, I meant: why won't the following desc[prev].flags update work for this?
>>> Because the update of desc[head].flags (in the above case,
>>> prev == head) has been delayed. The flags are saved in
>>> head_flags.
>> Ok, but let's try to avoid the modulo here, e.g. by tracking the number of
>> sgs in a counter.
>>
>> And I see lots of duplication in the above two loops. I believe we can unify
>> them into a single loop; the only differences are the DMA direction and the
>> write flag.
> The above implementation for packed ring is basically
> a mirror of the existing implementation in split ring,
> as I want to keep the coding style consistent. Below
> is the corresponding code in split ring:
>
> static inline int virtqueue_add(struct virtqueue *_vq,
> 				struct scatterlist *sgs[],
> 				unsigned int total_sg,
> 				unsigned int out_sgs,
> 				unsigned int in_sgs,
> 				void *data,
> 				void *ctx,
> 				gfp_t gfp)
> {
> 	......
>
> 	for (n = 0; n < out_sgs; n++) {
> 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> 			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> 			if (vring_mapping_error(vq, addr))
> 				goto unmap_release;
>
> 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT);
> 			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> 			prev = i;
> 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> 		}
> 	}
> 	for (; n < (out_sgs + in_sgs); n++) {
> 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> 			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> 			if (vring_mapping_error(vq, addr))
> 				goto unmap_release;
>
> 			desc[i].flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_NEXT | VRING_DESC_F_WRITE);
> 			desc[i].addr = cpu_to_virtio64(_vq->vdev, addr);
> 			desc[i].len = cpu_to_virtio32(_vq->vdev, sg->length);
> 			prev = i;
> 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> 		}
> 	}
>
> 	......
> }

There's no need for such consistency, especially considering it's a new
kind of ring. Anyway, you can stick to it.
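
For reference, a rough userspace sketch of the single-loop idea mentioned
above. This is not the kernel code: the struct layouts, the flag bit
positions and the name fill_descs() are assumptions made up for the example,
DMA mapping is reduced to a comment, and endianness conversion is left out.
It only shows that one loop can cover both directions, and that a counter
(rather than a modulo on the index) can decide where the chain ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones come from the packed ring defines. */
#define DESC_F_NEXT   (1u << 0)
#define DESC_F_WRITE  (1u << 1)
#define DESC_F_AVAIL  (1u << 7)
#define DESC_F_USED   (1u << 15)

struct sketch_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

struct sketch_ring {
	struct sketch_desc *desc;
	unsigned int num;       /* ring size */
	unsigned int next;      /* next free slot */
	bool wrap_counter;
};

/* One buffer segment; stands in for a scatterlist entry. */
struct sketch_seg {
	uint64_t addr;
	uint32_t len;
};

/*
 * Fill `total` descriptors, the first `out` of which are device-readable.
 * Returns the head flags, which the caller is expected to publish last.
 */
static uint16_t fill_descs(struct sketch_ring *r, const struct sketch_seg *segs,
			   unsigned int out, unsigned int total)
{
	unsigned int head = r->next, i = head, n;
	uint16_t head_flags = 0;

	assert(total > 0 && total <= r->num);

	for (n = 0; n < total; n++) {
		/*
		 * The only direction-dependent part: the write flag (and,
		 * in real code, the DMA direction used for the mapping).
		 */
		uint16_t flags = (n < out) ? 0 : DESC_F_WRITE;

		/* The counter, not a modulo on the index, ends the chain. */
		if (n + 1 < total)
			flags |= DESC_F_NEXT;
		/* AVAIL follows the wrap counter, USED is its inverse. */
		flags |= r->wrap_counter ? DESC_F_AVAIL : DESC_F_USED;

		r->desc[i].addr = segs[n].addr;
		r->desc[i].len = segs[n].len;
		r->desc[i].id = (uint16_t)head;
		if (n == 0)
			head_flags = flags;     /* delayed, see return value */
		else
			r->desc[i].flags = flags;

		if (++i >= r->num) {
			i = 0;
			r->wrap_counter = !r->wrap_counter;
		}
	}
	r->next = i;
	return head_flags;
}
```

With total == 1 the head flags simply never get the NEXT bit, so no
fix-up after the loop is needed.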

[...]

>>>>>>> +	} else
>>>>>>> +		vq->free_head = i;
>>>>>> ID is only valid in the last descriptor in the list, so head + 1 should be
>>>>>> ok too?
>>>>> I don't really get your point. The vq->free_head stores
>>>>> the index of the next avail desc.
>>>> I think I get your idea now, free_head has two meanings:
>>>>
>>>> - next avail index
>>>> - buffer id
>>> In my design, free_head is just the index of the next
>>> avail desc.
>>>
>>> Driver can set anything to buffer ID.
>> Then you need another method to map the ID to its context, e.g. hashing.
> I keep the context in desc_state[desc_idx]. So there is
> no extra method needed to track the context.

Well, it works for this patch, but my reply was about "set anything to
buffer ID". The size of desc_state is limited, so in fact you can't use
a value of vq.num or greater.
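
To make the constraint concrete, a small sketch of context tracking through
an array indexed by buffer ID. The names (state_table, state_store,
state_take) are hypothetical and only stand in for the desc_state[] usage
discussed above; the point is that an ID outside [0, num) has no slot, so
it would need some other lookup scheme.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct state_table {
	void **data;            /* one context pointer per buffer ID */
	unsigned int num;       /* number of slots, equal to the ring size */
};

/* Remember `ctx` under buffer ID `id`; fails if the ID has no slot. */
static int state_store(struct state_table *t, uint16_t id, void *ctx)
{
	assert(t && t->data);
	if (id >= t->num)
		return -1;
	t->data[id] = ctx;
	return 0;
}

/* Return and clear the context stored under `id`, or NULL if out of range. */
static void *state_take(struct state_table *t, uint16_t id)
{
	void *ctx;

	assert(t && t->data);
	if (id >= t->num)
		return NULL;
	ctx = t->data[id];
	t->data[id] = NULL;
	return ctx;
}
```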


>

[...]

>> @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
>>>>>>>      	if (!queue) {
>>>>>>>      		/* Try to get a single page. You are my only hope! */
>>>>>>> -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
>>>>>>> +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
>>>>>>> +							     packed),
>>>>>>>      					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
>>>>>>>      	}
>>>>>>>      	if (!queue)
>>>>>>>      		return NULL;
>>>>>>> -	queue_size_in_bytes = vring_size(num, vring_align);
>>>>>>> -	vring_init(&vring, num, queue, vring_align);
>>>>>>> +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
>>>>>>> +	if (packed)
>>>>>>> +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
>>>>>>> +	else
>>>>>>> +		vring_init(&vring.vring_split, num, queue, vring_align);
>>>>>> Let's rename vring_init to vring_init_split() like other helpers?
>>>>> The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
>>>>> I don't think we can rename it.
>>>> I see, then this need more thoughts to unify the API.
>>> My thought is to keep the old API as is, and introduce
>>> new types and helpers for packed ring.
>> I admit it's not a fault of this patch. But we'd better think of this in the
>> future, consider we may have new kinds of ring.
>>
>>> More details can be found in this patch:
>>> https://lkml.org/lkml/2018/2/23/243
>>> (PS. The type which has bit fields is just for reference,
>>>    and will be changed in next version.)
>>>
>>> Do you have any other suggestions?
>> No.
> Hmm.. Sorry, I didn't describe my question well.
> I mean do you have any suggestions about the API
> design for packed ring in uapi header? Currently
> I introduced below two new helpers:
>
> static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> 				     void *p, unsigned long align);
> static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
>
> When new rings are introduced in the future, above
> helpers can't be reused. Maybe we should make the
> helpers be able to determine the ring type?

Let's wait for Michael's comment here. Generally, I fail to understand 
why vring_init() became a part of uapi. Git grep shows the only use 
cases are virtio_test/vringh_test.
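
For the "helpers that determine the ring type" question, one possible shape
is sketched below, purely as an illustration. The enum, the constants and
ring_size() are hypothetical names. The split branch follows the arithmetic
of the existing vring_size(); the packed branch assumes a bare array of
16-byte descriptors, which is an assumption for the example and not taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>

enum ring_type { RING_SPLIT, RING_PACKED };    /* hypothetical */

#define SPLIT_DESC_SIZE   16u   /* sizeof(struct vring_desc) */
#define SPLIT_USED_ELEM   8u    /* sizeof(struct vring_used_elem) */
#define PACKED_DESC_SIZE  16u   /* assumption: descriptor array only */

/* Size in bytes of a ring with `num` entries; `align` is a power of two. */
static size_t ring_size(enum ring_type type, unsigned int num, size_t align)
{
	size_t avail_end;

	assert(align && (align & (align - 1)) == 0);

	switch (type) {
	case RING_SPLIT:
		/* desc table + avail ring (flags, idx, ring[num], used_event) */
		avail_end = (size_t)SPLIT_DESC_SIZE * num + 2u * (3u + num);
		avail_end = (avail_end + align - 1) & ~(align - 1);
		/* used ring: flags, idx, avail_event + the used elements */
		return avail_end + 2u * 3u + (size_t)SPLIT_USED_ELEM * num;
	case RING_PACKED:
		return (size_t)PACKED_DESC_SIZE * num;
	}
	return 0;
}
```

A caller would then pass the ring type once instead of picking between
per-type helpers at every call site.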

Thanks

>
> Best regards,
> Tiwei Bie
>
>> Thanks
>>
>>> Best regards,
>>> Tiwei Bie
>>>
>>>>>>> -	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
>>>>>>> -				   notify, callback, name);
>>>>>>> +	vq = __vring_new_virtqueue(index, vring, packed, vdev, weak_barriers,
>>>>>>> +				   context, notify, callback, name);
>>>>>>>      	if (!vq) {
>>>>>>>      		vring_free_queue(vdev, queue_size_in_bytes, queue,
>>>>>>>      				 dma_addr);
>>> [...]

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16 11:36                 ` Jason Wang
@ 2018-03-16 14:30                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 38+ messages in thread
From: Michael S. Tsirkin @ 2018-03-16 14:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: Tiwei Bie, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > >      	if (!queue) {
> > > > > > > >      		/* Try to get a single page. You are my only hope! */
> > > > > > > > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > > > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > > > +							     packed),
> > > > > > > >      					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > > >      	}
> > > > > > > >      	if (!queue)
> > > > > > > >      		return NULL;
> > > > > > > > -	queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > -	vring_init(&vring, num, queue, vring_align);
> > > > > > > > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > > > +	if (packed)
> > > > > > > > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > > > +	else
> > > > > > > > +		vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > > > I don't think we can rename it.
> > > > > I see, then this need more thoughts to unify the API.
> > > > My thought is to keep the old API as is, and introduce
> > > > new types and helpers for packed ring.
> > > I admit it's not a fault of this patch. But we'd better think of this in the
> > > future, consider we may have new kinds of ring.
> > > 
> > > > More details can be found in this patch:
> > > > https://lkml.org/lkml/2018/2/23/243
> > > > (PS. The type which has bit fields is just for reference,
> > > >    and will be changed in next version.)
> > > > 
> > > > Do you have any other suggestions?
> > > No.
> > Hmm.. Sorry, I didn't describe my question well.
> > I mean do you have any suggestions about the API
> > design for packed ring in uapi header? Currently
> > I introduced below two new helpers:
> > 
> > static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> > 				     void *p, unsigned long align);
> > static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
> > 
> > When new rings are introduced in the future, above
> > helpers can't be reused. Maybe we should make the
> > helpers be able to determine the ring type?
> 
> Let's wait for Michael's comment here. Generally, I fail to understand why
> vring_init() become a part of uapi. Git grep shows the only use cases are
> virtio_test/vringh_test.
> 
> Thanks

For init - I think it's a mistake that stems from lguest which sometimes
made it less than obvious which code is where.  I don't see a reason to
add to it.

-- 
MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16 11:36                 ` Jason Wang
@ 2018-03-21  7:30                   ` Tiwei Bie
  -1 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-03-21  7:30 UTC (permalink / raw)
  To: Jason Wang; +Cc: mst, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> On 2018-03-16 18:04, Tiwei Bie wrote:
> > On Fri, Mar 16, 2018 at 04:34:28PM +0800, Jason Wang wrote:
> > > On 2018-03-16 15:40, Tiwei Bie wrote:
> > > > On Fri, Mar 16, 2018 at 02:44:12PM +0800, Jason Wang wrote:
> > > > > On 2018-03-16 14:10, Tiwei Bie wrote:
> > > > > > On Fri, Mar 16, 2018 at 12:03:25PM +0800, Jason Wang wrote:
> > > > > > > On 2018-02-23 19:18, Tiwei Bie wrote:
> > > > > > > > Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> > > > > > > > ---
> > > > > > > >      drivers/virtio/virtio_ring.c | 699 +++++++++++++++++++++++++++++++++++++------
> > > > > > > >      include/linux/virtio_ring.h  |   8 +-
> > > > > > > >      2 files changed, 618 insertions(+), 89 deletions(-)
[...]
> > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > >      	if (!queue) {
> > > > > > > >      		/* Try to get a single page. You are my only hope! */
> > > > > > > > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > > > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > > > +							     packed),
> > > > > > > >      					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > > >      	}
> > > > > > > >      	if (!queue)
> > > > > > > >      		return NULL;
> > > > > > > > -	queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > -	vring_init(&vring, num, queue, vring_align);
> > > > > > > > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > > > +	if (packed)
> > > > > > > > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > > > +	else
> > > > > > > > +		vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > > > I don't think we can rename it.
> > > > > I see, then this need more thoughts to unify the API.
> > > > My thought is to keep the old API as is, and introduce
> > > > new types and helpers for packed ring.
> > > I admit it's not a fault of this patch. But we'd better think of this in the
> > > future, consider we may have new kinds of ring.
> > > 
> > > > More details can be found in this patch:
> > > > https://lkml.org/lkml/2018/2/23/243
> > > > (PS. The type which has bit fields is just for reference,
> > > >    and will be changed in next version.)
> > > > 
> > > > Do you have any other suggestions?
> > > No.
> > Hmm.. Sorry, I didn't describe my question well.
> > I mean do you have any suggestions about the API
> > design for packed ring in uapi header? Currently
> > I introduced below two new helpers:
> > 
> > static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> > 				     void *p, unsigned long align);
> > static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
> > 
> > When new rings are introduced in the future, above
> > helpers can't be reused. Maybe we should make the
> > helpers be able to determine the ring type?
> 
> Let's wait for Michael's comment here. Generally, I fail to understand why
> vring_init() become a part of uapi. Git grep shows the only use cases are
> virtio_test/vringh_test.

Thank you very much for the review on this patch!
I'll send out a new version ASAP to address these
comments. :)

Best regards,
Tiwei Bie

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC 2/2] virtio_ring: support packed ring
  2018-03-16 14:30                   ` Michael S. Tsirkin
@ 2018-03-21  7:35                     ` Tiwei Bie
  -1 siblings, 0 replies; 38+ messages in thread
From: Tiwei Bie @ 2018-03-21  7:35 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, linux-kernel, netdev, wexu, jfreimann

On Fri, Mar 16, 2018 at 04:30:02PM +0200, Michael S. Tsirkin wrote:
> On Fri, Mar 16, 2018 at 07:36:47PM +0800, Jason Wang wrote:
> > > > @@ -1096,17 +1599,21 @@ struct virtqueue *vring_create_virtqueue(
> > > > > > > > >      	if (!queue) {
> > > > > > > > >      		/* Try to get a single page. You are my only hope! */
> > > > > > > > > -		queue = vring_alloc_queue(vdev, vring_size(num, vring_align),
> > > > > > > > > +		queue = vring_alloc_queue(vdev, __vring_size(num, vring_align,
> > > > > > > > > +							     packed),
> > > > > > > > >      					  &dma_addr, GFP_KERNEL|__GFP_ZERO);
> > > > > > > > >      	}
> > > > > > > > >      	if (!queue)
> > > > > > > > >      		return NULL;
> > > > > > > > > -	queue_size_in_bytes = vring_size(num, vring_align);
> > > > > > > > > -	vring_init(&vring, num, queue, vring_align);
> > > > > > > > > +	queue_size_in_bytes = __vring_size(num, vring_align, packed);
> > > > > > > > > +	if (packed)
> > > > > > > > > +		vring_packed_init(&vring.vring_packed, num, queue, vring_align);
> > > > > > > > > +	else
> > > > > > > > > +		vring_init(&vring.vring_split, num, queue, vring_align);
> > > > > > > > Let's rename vring_init to vring_init_split() like other helpers?
> > > > > > > The vring_init() is a public API in include/uapi/linux/virtio_ring.h.
> > > > > > > I don't think we can rename it.
> > > > > > I see, then this need more thoughts to unify the API.
> > > > > My thought is to keep the old API as is, and introduce
> > > > > new types and helpers for packed ring.
> > > > I admit it's not a fault of this patch. But we'd better think of this in the
> > > > future, consider we may have new kinds of ring.
> > > > 
> > > > > More details can be found in this patch:
> > > > > https://lkml.org/lkml/2018/2/23/243
> > > > > (PS. The type which has bit fields is just for reference,
> > > > >    and will be changed in next version.)
> > > > > 
> > > > > Do you have any other suggestions?
> > > > No.
> > > Hmm.. Sorry, I didn't describe my question well.
> > > I mean do you have any suggestions about the API
> > > design for packed ring in uapi header? Currently
> > > I introduced below two new helpers:
> > > 
> > > static inline void vring_packed_init(struct vring_packed *vr, unsigned int num,
> > > 				     void *p, unsigned long align);
> > > static inline unsigned vring_packed_size(unsigned int num, unsigned long align);
> > > 
> > > When new rings are introduced in the future, above
> > > helpers can't be reused. Maybe we should make the
> > > helpers be able to determine the ring type?
> > 
> > Let's wait for Michael's comment here. Generally, I fail to understand why
> > vring_init() become a part of uapi. Git grep shows the only use cases are
> > virtio_test/vringh_test.
> > 
> > Thanks
> 
> For init - I think it's a mistake that stems from lguest which sometimes
> made it less than obvious which code is where.  I don't see a reason to
> add to it.

Got it! I'll move vring_packed_init() out of uapi. Many thanks! :)

Best regards,
Tiwei Bie

> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-03-21  7:37 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-23 11:17 [PATCH RFC 0/2] Packed ring for virtio Tiwei Bie
2018-02-23 11:17 ` Tiwei Bie
2018-02-23 11:18 ` [PATCH RFC 1/2] virtio: introduce packed ring defines Tiwei Bie
2018-02-23 11:18   ` Tiwei Bie
2018-02-27  8:54   ` Jens Freimann
2018-02-27  8:54   ` Jens Freimann
2018-02-27  9:18     ` Jens Freimann
2018-02-27  9:18     ` Jens Freimann
2018-02-27 12:01     ` Tiwei Bie
2018-02-27 12:01     ` Tiwei Bie
2018-02-27 20:28     ` Michael S. Tsirkin
2018-02-27 20:28     ` Michael S. Tsirkin
2018-02-27  9:26   ` David Laight
2018-02-27  9:26   ` David Laight
2018-02-27 11:31     ` Tiwei Bie
2018-02-27 11:31       ` Tiwei Bie
2018-02-23 11:18 ` [PATCH RFC 2/2] virtio_ring: support packed ring Tiwei Bie
2018-02-23 11:18   ` Tiwei Bie
2018-03-16  4:03   ` Jason Wang
2018-03-16  4:03     ` Jason Wang
2018-03-16  6:10     ` Tiwei Bie
2018-03-16  6:10       ` Tiwei Bie
2018-03-16  6:44       ` Jason Wang
2018-03-16  6:44       ` Jason Wang
2018-03-16  7:40         ` Tiwei Bie
2018-03-16  7:40         ` Tiwei Bie
2018-03-16  8:34           ` Jason Wang
2018-03-16  8:34             ` Jason Wang
2018-03-16 10:04             ` Tiwei Bie
2018-03-16 11:36               ` Jason Wang
2018-03-16 11:36                 ` Jason Wang
2018-03-16 14:30                 ` Michael S. Tsirkin
2018-03-16 14:30                   ` Michael S. Tsirkin
2018-03-21  7:35                   ` Tiwei Bie
2018-03-21  7:35                     ` Tiwei Bie
2018-03-21  7:30                 ` Tiwei Bie
2018-03-21  7:30                   ` Tiwei Bie
2018-03-16 10:04             ` Tiwei Bie
