All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] virtio support cache indirect desc
@ 2021-10-27  6:19 ` Xuan Zhuo
  0 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski

If the VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds, and the number
of sgs used for sending packets is greater than 1. We must constantly
call __kmalloc/kfree to allocate/release desc.

In the case of extremely fast package delivery, the overhead cannot be
ignored:

  27.46%  [kernel]  [k] virtqueue_add
  16.66%  [kernel]  [k] detach_buf_split
  16.51%  [kernel]  [k] virtnet_xsk_xmit
  14.04%  [kernel]  [k] virtqueue_add_outbuf
   5.18%  [kernel]  [k] __kmalloc
   4.08%  [kernel]  [k] kfree
   2.80%  [kernel]  [k] virtqueue_get_buf_ctx
   2.22%  [kernel]  [k] xsk_tx_peek_desc
   2.08%  [kernel]  [k] memset_erms
   0.83%  [kernel]  [k] virtqueue_kick_prepare
   0.76%  [kernel]  [k] virtnet_xsk_run
   0.62%  [kernel]  [k] __free_old_xmit_ptr
   0.60%  [kernel]  [k] vring_map_one_sg
   0.53%  [kernel]  [k] native_apic_mem_write
   0.46%  [kernel]  [k] sg_next
   0.43%  [kernel]  [k] sg_init_table
   0.41%  [kernel]  [k] kmalloc_slab

This patch adds a cache function to virtio to cache these allocated indirect
desc instead of constantly allocating and releasing desc.


Xuan Zhuo (3):
  virtio: cache indirect desc for split
  virtio: cache indirect desc for packed
  virtio-net: enable virtio indirect cache

 drivers/net/virtio_net.c     |   4 ++
 drivers/virtio/virtio.c      |   6 ++
 drivers/virtio/virtio_ring.c | 120 ++++++++++++++++++++++++++++++-----
 include/linux/virtio.h       |  10 +++
 4 files changed, 123 insertions(+), 17 deletions(-)

--
2.31.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 0/3] virtio support cache indirect desc
@ 2021-10-27  6:19 ` Xuan Zhuo
  0 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Jakub Kicinski, David S. Miller, Michael S. Tsirkin

If the VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds, and the number
of sgs used for sending packets is greater than 1. We must constantly
call __kmalloc/kfree to allocate/release desc.

In the case of extremely fast package delivery, the overhead cannot be
ignored:

  27.46%  [kernel]  [k] virtqueue_add
  16.66%  [kernel]  [k] detach_buf_split
  16.51%  [kernel]  [k] virtnet_xsk_xmit
  14.04%  [kernel]  [k] virtqueue_add_outbuf
   5.18%  [kernel]  [k] __kmalloc
   4.08%  [kernel]  [k] kfree
   2.80%  [kernel]  [k] virtqueue_get_buf_ctx
   2.22%  [kernel]  [k] xsk_tx_peek_desc
   2.08%  [kernel]  [k] memset_erms
   0.83%  [kernel]  [k] virtqueue_kick_prepare
   0.76%  [kernel]  [k] virtnet_xsk_run
   0.62%  [kernel]  [k] __free_old_xmit_ptr
   0.60%  [kernel]  [k] vring_map_one_sg
   0.53%  [kernel]  [k] native_apic_mem_write
   0.46%  [kernel]  [k] sg_next
   0.43%  [kernel]  [k] sg_init_table
   0.41%  [kernel]  [k] kmalloc_slab

This patch adds a cache function to virtio to cache these allocated indirect
desc instead of constantly allocating and releasing desc.


Xuan Zhuo (3):
  virtio: cache indirect desc for split
  virtio: cache indirect desc for packed
  virtio-net: enable virtio indirect cache

 drivers/net/virtio_net.c     |   4 ++
 drivers/virtio/virtio.c      |   6 ++
 drivers/virtio/virtio_ring.c | 120 ++++++++++++++++++++++++++++++-----
 include/linux/virtio.h       |  10 +++
 4 files changed, 123 insertions(+), 17 deletions(-)

--
2.31.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  6:19 ` Xuan Zhuo
@ 2021-10-27  6:19   ` Xuan Zhuo
  -1 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski

In the case of using indirect, indirect desc must be allocated and
released each time, which increases a lot of cpu overhead.

Here, a cache is added for indirect. If the number of indirect desc to be
applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio.c      |  6 ++++
 drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
 include/linux/virtio.h       | 10 ++++++
 3 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..04bcb74e5b9a 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(is_virtio_device);
 
+void virtio_use_desc_cache(struct virtio_device *dev, bool val)
+{
+	dev->desc_cache = val;
+}
+EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
+
 void unregister_virtio_device(struct virtio_device *dev)
 {
 	int index = dev->index; /* save for after device release */
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..0b9a8544b0e8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -117,6 +117,10 @@ struct vring_virtqueue {
 	/* Hint for event idx: already triggered no need to disable. */
 	bool event_triggered;
 
+	/* Is indirect cache used? */
+	bool use_desc_cache;
+	void *desc_cache_chain;
+
 	union {
 		/* Available for split ring */
 		struct {
@@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 	return extra[i].next;
 }
 
-static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+#define VIRT_QUEUE_CACHE_DESC_NUM 4
+
+static void desc_cache_chain_free_split(void *chain)
+{
+	struct vring_desc *desc;
+
+	while (chain) {
+		desc = chain;
+		chain = (void *)desc->addr;
+		kfree(desc);
+	}
+}
+
+static void desc_cache_put_split(struct vring_virtqueue *vq,
+				 struct vring_desc *desc, int n)
+{
+	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		desc->addr = (u64)vq->desc_cache_chain;
+		vq->desc_cache_chain = desc;
+	} else {
+		kfree(desc);
+	}
+}
+
+static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
 					       unsigned int total_sg,
 					       gfp_t gfp)
 {
 	struct vring_desc *desc;
-	unsigned int i;
+	unsigned int i, n;
+
+	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		if (vq->desc_cache_chain) {
+			desc = vq->desc_cache_chain;
+			vq->desc_cache_chain = (void *)desc->addr;
+			goto got;
+		}
+		n = VIRT_QUEUE_CACHE_DESC_NUM;
+	} else {
+		n = total_sg;
+	}
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
+	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
 	if (!desc)
 		return NULL;
 
+got:
 	for (i = 0; i < total_sg; i++)
-		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
+		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
 	return desc;
 }
 
@@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	head = vq->free_head;
 
 	if (virtqueue_use_indirect(_vq, total_sg))
-		desc = alloc_indirect_split(_vq, total_sg, gfp);
+		desc = alloc_indirect_split(vq, total_sg, gfp);
 	else {
 		desc = NULL;
 		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
@@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 
 	if (indirect)
-		kfree(desc);
+		desc_cache_put_split(vq, desc, total_sg);
 
 	END_USE(vq);
 	return -ENOMEM;
@@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	if (vq->indirect) {
 		struct vring_desc *indir_desc =
 				vq->split.desc_state[head].indir_desc;
-		u32 len;
+		u32 len, n;
 
 		/* Free the indirect table, if any, now that it's unmapped. */
 		if (!indir_desc)
@@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		for (j = 0; j < len / sizeof(struct vring_desc); j++)
+		n = len / sizeof(struct vring_desc);
+
+		for (j = 0; j < n; j++)
 			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
 
-		kfree(indir_desc);
+		desc_cache_put_split(vq, indir_desc, n);
 		vq->split.desc_state[head].indir_desc = NULL;
 	} else if (ctx) {
 		*ctx = vq->split.desc_state[head].indir_desc;
@@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->desc_cache_chain = NULL;
+	vq->use_desc_cache = vdev->desc_cache;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
@@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 	if (!vq->packed_ring) {
 		kfree(vq->split.desc_state);
 		kfree(vq->split.desc_extra);
+		desc_cache_chain_free_split(vq->desc_cache_chain);
 	}
 	kfree(vq);
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41edbc01ffa4..d84b7b8f4070 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,6 +109,7 @@ struct virtio_device {
 	bool failed;
 	bool config_enabled;
 	bool config_change_pending;
+	bool desc_cache;
 	spinlock_t config_lock;
 	spinlock_t vqs_list_lock; /* Protects VQs list access */
 	struct device dev;
@@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 bool is_virtio_device(struct device *dev);
 
+/**
+ * virtio_use_desc_cache - virtio ring use desc cache
+ *
+ * virtio will cache the allocated indirect desc.
+ *
+ * This function must be called before find_vqs.
+ */
+void virtio_use_desc_cache(struct virtio_device *dev, bool val);
+
 void virtio_break_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27  6:19   ` Xuan Zhuo
  0 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Jakub Kicinski, David S. Miller, Michael S. Tsirkin

In the case of using indirect, indirect desc must be allocated and
released each time, which increases a lot of cpu overhead.

Here, a cache is added for indirect. If the number of indirect desc to be
applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio.c      |  6 ++++
 drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
 include/linux/virtio.h       | 10 ++++++
 3 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 0a5b54034d4b..04bcb74e5b9a 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(is_virtio_device);
 
+void virtio_use_desc_cache(struct virtio_device *dev, bool val)
+{
+	dev->desc_cache = val;
+}
+EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
+
 void unregister_virtio_device(struct virtio_device *dev)
 {
 	int index = dev->index; /* save for after device release */
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dd95dfd85e98..0b9a8544b0e8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -117,6 +117,10 @@ struct vring_virtqueue {
 	/* Hint for event idx: already triggered no need to disable. */
 	bool event_triggered;
 
+	/* Is indirect cache used? */
+	bool use_desc_cache;
+	void *desc_cache_chain;
+
 	union {
 		/* Available for split ring */
 		struct {
@@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 	return extra[i].next;
 }
 
-static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
+#define VIRT_QUEUE_CACHE_DESC_NUM 4
+
+static void desc_cache_chain_free_split(void *chain)
+{
+	struct vring_desc *desc;
+
+	while (chain) {
+		desc = chain;
+		chain = (void *)desc->addr;
+		kfree(desc);
+	}
+}
+
+static void desc_cache_put_split(struct vring_virtqueue *vq,
+				 struct vring_desc *desc, int n)
+{
+	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		desc->addr = (u64)vq->desc_cache_chain;
+		vq->desc_cache_chain = desc;
+	} else {
+		kfree(desc);
+	}
+}
+
+static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
 					       unsigned int total_sg,
 					       gfp_t gfp)
 {
 	struct vring_desc *desc;
-	unsigned int i;
+	unsigned int i, n;
+
+	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		if (vq->desc_cache_chain) {
+			desc = vq->desc_cache_chain;
+			vq->desc_cache_chain = (void *)desc->addr;
+			goto got;
+		}
+		n = VIRT_QUEUE_CACHE_DESC_NUM;
+	} else {
+		n = total_sg;
+	}
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
+	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
 	if (!desc)
 		return NULL;
 
+got:
 	for (i = 0; i < total_sg; i++)
-		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
+		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
 	return desc;
 }
 
@@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	head = vq->free_head;
 
 	if (virtqueue_use_indirect(_vq, total_sg))
-		desc = alloc_indirect_split(_vq, total_sg, gfp);
+		desc = alloc_indirect_split(vq, total_sg, gfp);
 	else {
 		desc = NULL;
 		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
@@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 
 	if (indirect)
-		kfree(desc);
+		desc_cache_put_split(vq, desc, total_sg);
 
 	END_USE(vq);
 	return -ENOMEM;
@@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	if (vq->indirect) {
 		struct vring_desc *indir_desc =
 				vq->split.desc_state[head].indir_desc;
-		u32 len;
+		u32 len, n;
 
 		/* Free the indirect table, if any, now that it's unmapped. */
 		if (!indir_desc)
@@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		for (j = 0; j < len / sizeof(struct vring_desc); j++)
+		n = len / sizeof(struct vring_desc);
+
+		for (j = 0; j < n; j++)
 			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
 
-		kfree(indir_desc);
+		desc_cache_put_split(vq, indir_desc, n);
 		vq->split.desc_state[head].indir_desc = NULL;
 	} else if (ctx) {
 		*ctx = vq->split.desc_state[head].indir_desc;
@@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->desc_cache_chain = NULL;
+	vq->use_desc_cache = vdev->desc_cache;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
@@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 	if (!vq->packed_ring) {
 		kfree(vq->split.desc_state);
 		kfree(vq->split.desc_extra);
+		desc_cache_chain_free_split(vq->desc_cache_chain);
 	}
 	kfree(vq);
 }
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 41edbc01ffa4..d84b7b8f4070 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -109,6 +109,7 @@ struct virtio_device {
 	bool failed;
 	bool config_enabled;
 	bool config_change_pending;
+	bool desc_cache;
 	spinlock_t config_lock;
 	spinlock_t vqs_list_lock; /* Protects VQs list access */
 	struct device dev;
@@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 bool is_virtio_device(struct device *dev);
 
+/**
+ * virtio_use_desc_cache - virtio ring use desc cache
+ *
+ * virtio will cache the allocated indirect desc.
+ *
+ * This function must be called before find_vqs.
+ */
+void virtio_use_desc_cache(struct virtio_device *dev, bool val);
+
 void virtio_break_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
-- 
2.31.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/3] virtio: cache indirect desc for packed
  2021-10-27  6:19 ` Xuan Zhuo
@ 2021-10-27  6:19   ` Xuan Zhuo
  -1 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski

In the case of using indirect, indirect desc must be allocated and
released each time, which increases a lot of cpu overhead.

Here, a cache is added for indirect. If the number of indirect desc to be
applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 57 +++++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0b9a8544b0e8..4fd7bd5bcd70 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1074,10 +1074,45 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 	}
 }
 
-static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
+static void desc_cache_chain_free_packed(void *chain)
+{
+	struct vring_packed_desc *desc;
+
+	while (chain) {
+		desc = chain;
+		chain = (void *)desc->addr;
+		kfree(desc);
+	}
+}
+
+static void desc_cache_put_packed(struct vring_virtqueue *vq,
+				  struct vring_packed_desc *desc, int n)
+{
+	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		desc->addr = (u64)vq->desc_cache_chain;
+		vq->desc_cache_chain = desc;
+	} else {
+		kfree(desc);
+	}
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct vring_virtqueue *vq,
+						       unsigned int total_sg,
 						       gfp_t gfp)
 {
 	struct vring_packed_desc *desc;
+	unsigned int n;
+
+	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		if (vq->desc_cache_chain) {
+			desc = vq->desc_cache_chain;
+			vq->desc_cache_chain = (void *)desc->addr;
+			return desc;
+		}
+		n = VIRT_QUEUE_CACHE_DESC_NUM;
+	} else {
+		n = total_sg;
+	}
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -1086,7 +1121,7 @@ static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_packed_desc), gfp);
+	desc = kmalloc_array(n, sizeof(struct vring_packed_desc), gfp);
 
 	return desc;
 }
@@ -1106,7 +1141,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	dma_addr_t addr;
 
 	head = vq->packed.next_avail_idx;
-	desc = alloc_indirect_packed(total_sg, gfp);
+	desc = alloc_indirect_packed(vq, total_sg, gfp);
 
 	if (unlikely(vq->vq.num_free < 1)) {
 		pr_debug("Can't add buf len 1 - avail = 0\n");
@@ -1197,7 +1232,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	for (i = 0; i < err_idx; i++)
 		vring_unmap_desc_packed(vq, &desc[i]);
 
-	kfree(desc);
+	desc_cache_put_packed(vq, desc, total_sg);
 
 	END_USE(vq);
 	return -ENOMEM;
@@ -1422,20 +1457,22 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	}
 
 	if (vq->indirect) {
-		u32 len;
+		u32 len, n;
 
 		/* Free the indirect table, if any, now that it's unmapped. */
 		desc = state->indir_desc;
 		if (!desc)
 			return;
 
+		n = len / sizeof(struct vring_packed_desc);
+
 		if (vq->use_dma_api) {
 			len = vq->packed.desc_extra[id].len;
-			for (i = 0; i < len / sizeof(struct vring_packed_desc);
-					i++)
+			for (i = 0; i < n; i++)
 				vring_unmap_desc_packed(vq, &desc[i]);
 		}
-		kfree(desc);
+
+		desc_cache_put_packed(vq, desc, n);
 		state->indir_desc = NULL;
 	} else if (ctx) {
 		*ctx = state->indir_desc;
@@ -1753,6 +1790,8 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->desc_cache_chain = NULL;
+	vq->use_desc_cache = vdev->desc_cache;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
@@ -2374,6 +2413,8 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 		kfree(vq->split.desc_state);
 		kfree(vq->split.desc_extra);
 		desc_cache_chain_free_split(vq->desc_cache_chain);
+	} else {
+		desc_cache_chain_free_packed(vq->desc_cache_chain);
 	}
 	kfree(vq);
 }
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-27  6:19   ` Xuan Zhuo
  0 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Jakub Kicinski, David S. Miller, Michael S. Tsirkin

In the case of using indirect, indirect desc must be allocated and
released each time, which increases a lot of cpu overhead.

Here, a cache is added for indirect. If the number of indirect desc to be
applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 57 +++++++++++++++++++++++++++++++-----
 1 file changed, 49 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0b9a8544b0e8..4fd7bd5bcd70 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1074,10 +1074,45 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 	}
 }
 
-static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
+static void desc_cache_chain_free_packed(void *chain)
+{
+	struct vring_packed_desc *desc;
+
+	while (chain) {
+		desc = chain;
+		chain = (void *)desc->addr;
+		kfree(desc);
+	}
+}
+
+static void desc_cache_put_packed(struct vring_virtqueue *vq,
+				  struct vring_packed_desc *desc, int n)
+{
+	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		desc->addr = (u64)vq->desc_cache_chain;
+		vq->desc_cache_chain = desc;
+	} else {
+		kfree(desc);
+	}
+}
+
+static struct vring_packed_desc *alloc_indirect_packed(struct vring_virtqueue *vq,
+						       unsigned int total_sg,
 						       gfp_t gfp)
 {
 	struct vring_packed_desc *desc;
+	unsigned int n;
+
+	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
+		if (vq->desc_cache_chain) {
+			desc = vq->desc_cache_chain;
+			vq->desc_cache_chain = (void *)desc->addr;
+			return desc;
+		}
+		n = VIRT_QUEUE_CACHE_DESC_NUM;
+	} else {
+		n = total_sg;
+	}
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -1086,7 +1121,7 @@ static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_packed_desc), gfp);
+	desc = kmalloc_array(n, sizeof(struct vring_packed_desc), gfp);
 
 	return desc;
 }
@@ -1106,7 +1141,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	dma_addr_t addr;
 
 	head = vq->packed.next_avail_idx;
-	desc = alloc_indirect_packed(total_sg, gfp);
+	desc = alloc_indirect_packed(vq, total_sg, gfp);
 
 	if (unlikely(vq->vq.num_free < 1)) {
 		pr_debug("Can't add buf len 1 - avail = 0\n");
@@ -1197,7 +1232,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	for (i = 0; i < err_idx; i++)
 		vring_unmap_desc_packed(vq, &desc[i]);
 
-	kfree(desc);
+	desc_cache_put_packed(vq, desc, total_sg);
 
 	END_USE(vq);
 	return -ENOMEM;
@@ -1422,20 +1457,22 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	}
 
 	if (vq->indirect) {
-		u32 len;
+		u32 len, n;
 
 		/* Free the indirect table, if any, now that it's unmapped. */
 		desc = state->indir_desc;
 		if (!desc)
 			return;
 
+		n = len / sizeof(struct vring_packed_desc);
+
 		if (vq->use_dma_api) {
 			len = vq->packed.desc_extra[id].len;
-			for (i = 0; i < len / sizeof(struct vring_packed_desc);
-					i++)
+			for (i = 0; i < n; i++)
 				vring_unmap_desc_packed(vq, &desc[i]);
 		}
-		kfree(desc);
+
+		desc_cache_put_packed(vq, desc, n);
 		state->indir_desc = NULL;
 	} else if (ctx) {
 		*ctx = state->indir_desc;
@@ -1753,6 +1790,8 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
+	vq->desc_cache_chain = NULL;
+	vq->use_desc_cache = vdev->desc_cache;
 
 	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
 		vq->weak_barriers = false;
@@ -2374,6 +2413,8 @@ void vring_del_virtqueue(struct virtqueue *_vq)
 		kfree(vq->split.desc_state);
 		kfree(vq->split.desc_extra);
 		desc_cache_chain_free_split(vq->desc_cache_chain);
+	} else {
+		desc_cache_chain_free_packed(vq->desc_cache_chain);
 	}
 	kfree(vq);
 }
-- 
2.31.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/3] virtio-net: enable virtio indirect cache
  2021-10-27  6:19 ` Xuan Zhuo
@ 2021-10-27  6:19   ` Xuan Zhuo
  -1 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Michael S. Tsirkin, Jason Wang, David S. Miller, Jakub Kicinski

If the VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds, and the number
of sgs used for sending packets is greater than 1. We must constantly
call __kmalloc/kfree to allocate/release desc.

In the case of extremely fast package delivery, the overhead cannot be
ignored:

  27.46%  [kernel]  [k] virtqueue_add
  16.66%  [kernel]  [k] detach_buf_split
  16.51%  [kernel]  [k] virtnet_xsk_xmit
  14.04%  [kernel]  [k] virtqueue_add_outbuf
   5.18%  [kernel]  [k] __kmalloc
   4.08%  [kernel]  [k] kfree
   2.80%  [kernel]  [k] virtqueue_get_buf_ctx
   2.22%  [kernel]  [k] xsk_tx_peek_desc
   2.08%  [kernel]  [k] memset_erms
   0.83%  [kernel]  [k] virtqueue_kick_prepare
   0.76%  [kernel]  [k] virtnet_xsk_run
   0.62%  [kernel]  [k] __free_old_xmit_ptr
   0.60%  [kernel]  [k] vring_map_one_sg
   0.53%  [kernel]  [k] native_apic_mem_write
   0.46%  [kernel]  [k] sg_next
   0.43%  [kernel]  [k] sg_init_table
   0.41%  [kernel]  [k] kmalloc_slab

Compared to not using virtio indirect cache, virtio-net can get a 16%
performance improvement when using indirect desc cache.

In the test case, the CPU where the package is sent has reached 100%.
The following are the PPS in two cases:

    indirect desc cache  | no cache
    3074658              | 2685132
    3111866              | 2666118
    3152527              | 2653632
    3125867              | 2669820
    3027147              | 2644464
    3069211              | 2669777
    3038522              | 2675645
    3034507              | 2671302
    3102257              | 2685504
    3083712              | 2692800
    3051771              | 2676928
    3080684              | 2695040
    3147816              | 2720876
    3123887              | 2705492
    3180963              | 2699520
    3191579              | 2676480
    3161670              | 2686272
    3189768              | 2692588
    3174272              | 2686692
    3143434              | 2682416

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4ad25a8b0870..0ec29cf90d0a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -27,9 +27,11 @@ static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
 static bool csum = true, gso = true, napi_tx = true;
+static bool virtio_desc_cache = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(virtio_desc_cache, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -3214,6 +3216,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->curr_queue_pairs = num_online_cpus();
 	vi->max_queue_pairs = max_queue_pairs;
 
+	virtio_use_desc_cache(vdev, virtio_desc_cache);
+
 	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
 	err = init_vqs(vi);
 	if (err)
-- 
2.31.0


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/3] virtio-net: enable virtio indirect cache
@ 2021-10-27  6:19   ` Xuan Zhuo
  0 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  6:19 UTC (permalink / raw)
  To: virtualization, netdev
  Cc: Jakub Kicinski, David S. Miller, Michael S. Tsirkin

If the VIRTIO_RING_F_INDIRECT_DESC negotiation succeeds, and the number
of sgs used for sending packets is greater than 1. We must constantly
call __kmalloc/kfree to allocate/release desc.

In the case of extremely fast package delivery, the overhead cannot be
ignored:

  27.46%  [kernel]  [k] virtqueue_add
  16.66%  [kernel]  [k] detach_buf_split
  16.51%  [kernel]  [k] virtnet_xsk_xmit
  14.04%  [kernel]  [k] virtqueue_add_outbuf
   5.18%  [kernel]  [k] __kmalloc
   4.08%  [kernel]  [k] kfree
   2.80%  [kernel]  [k] virtqueue_get_buf_ctx
   2.22%  [kernel]  [k] xsk_tx_peek_desc
   2.08%  [kernel]  [k] memset_erms
   0.83%  [kernel]  [k] virtqueue_kick_prepare
   0.76%  [kernel]  [k] virtnet_xsk_run
   0.62%  [kernel]  [k] __free_old_xmit_ptr
   0.60%  [kernel]  [k] vring_map_one_sg
   0.53%  [kernel]  [k] native_apic_mem_write
   0.46%  [kernel]  [k] sg_next
   0.43%  [kernel]  [k] sg_init_table
   0.41%  [kernel]  [k] kmalloc_slab

Compared to not using virtio indirect cache, virtio-net can get a 16%
performance improvement when using indirect desc cache.

In the test case, the CPU where the package is sent has reached 100%.
The following are the PPS in two cases:

    indirect desc cache  | no cache
    3074658              | 2685132
    3111866              | 2666118
    3152527              | 2653632
    3125867              | 2669820
    3027147              | 2644464
    3069211              | 2669777
    3038522              | 2675645
    3034507              | 2671302
    3102257              | 2685504
    3083712              | 2692800
    3051771              | 2676928
    3080684              | 2695040
    3147816              | 2720876
    3123887              | 2705492
    3180963              | 2699520
    3191579              | 2676480
    3161670              | 2686272
    3189768              | 2692588
    3174272              | 2686692
    3143434              | 2682416

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4ad25a8b0870..0ec29cf90d0a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -27,9 +27,11 @@ static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
 static bool csum = true, gso = true, napi_tx = true;
+static bool virtio_desc_cache = true;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(virtio_desc_cache, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -3214,6 +3216,8 @@ static int virtnet_probe(struct virtio_device *vdev)
 		vi->curr_queue_pairs = num_online_cpus();
 	vi->max_queue_pairs = max_queue_pairs;
 
+	virtio_use_desc_cache(vdev, virtio_desc_cache);
+
 	/* Allocate/initialize the rx/tx queues, and invoke find_vqs */
 	err = init_vqs(vi);
 	if (err)
-- 
2.31.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  6:19   ` Xuan Zhuo
@ 2021-10-27  8:55     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-27  8:55 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, netdev, Jason Wang, David S. Miller, Jakub Kicinski

On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio.c      |  6 ++++
>  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
>  include/linux/virtio.h       | 10 ++++++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> +	dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>  	int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>  	/* Hint for event idx: already triggered no need to disable. */
>  	bool event_triggered;
>  
> +	/* Is indirect cache used? */
> +	bool use_desc_cache;
> +	void *desc_cache_chain;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {

Let's use llist_head and friends please (I am guessing we want
single-linked to avoid writing into indirect buffer after use,
invalidating the cache, but please document why in a comment).  Do not
open-code it.

Also hide all casts in inline wrappers.


> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  	return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4

Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM.

> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> +	struct vring_desc *desc;
> +
> +	while (chain) {
> +		desc = chain;
> +		chain = (void *)desc->addr;
> +		kfree(desc);
> +	}
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +				 struct vring_desc *desc, int n)
> +{
> +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		desc->addr = (u64)vq->desc_cache_chain;
> +		vq->desc_cache_chain = desc;
> +	} else {
> +		kfree(desc);
> +	}
> +}


> +
> +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>  					       unsigned int total_sg,
>  					       gfp_t gfp)
>  {
>  	struct vring_desc *desc;
> -	unsigned int i;
> +	unsigned int i, n;
> +
> +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		if (vq->desc_cache_chain) {
> +			desc = vq->desc_cache_chain;
> +			vq->desc_cache_chain = (void *)desc->addr;
> +			goto got;
> +		}
> +		n = VIRT_QUEUE_CACHE_DESC_NUM;

Hmm. This will allocate more entries than actually used. Why do it?

> +	} else {
> +		n = total_sg;
> +	}
>  
>  	/*
>  	 * We require lowmem mappings for the descriptors because
> @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
>  	 */
>  	gfp &= ~__GFP_HIGHMEM;
>  
> -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
>  	if (!desc)
>  		return NULL;
>  
> +got:
>  	for (i = 0; i < total_sg; i++)
> -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
>  	return desc;
>  }
>  
> @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	head = vq->free_head;
>  
>  	if (virtqueue_use_indirect(_vq, total_sg))
> -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> +		desc = alloc_indirect_split(vq, total_sg, gfp);
>  	else {
>  		desc = NULL;
>  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	}
>  
>  	if (indirect)
> -		kfree(desc);
> +		desc_cache_put_split(vq, desc, total_sg);
>  
>  	END_USE(vq);
>  	return -ENOMEM;
> @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  	if (vq->indirect) {
>  		struct vring_desc *indir_desc =
>  				vq->split.desc_state[head].indir_desc;
> -		u32 len;
> +		u32 len, n;
>  
>  		/* Free the indirect table, if any, now that it's unmapped. */
>  		if (!indir_desc)
> @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  				VRING_DESC_F_INDIRECT));
>  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>  
> -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +		n = len / sizeof(struct vring_desc);
> +
> +		for (j = 0; j < n; j++)
>  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>  
> -		kfree(indir_desc);
> +		desc_cache_put_split(vq, indir_desc, n);
>  		vq->split.desc_state[head].indir_desc = NULL;
>  	} else if (ctx) {
>  		*ctx = vq->split.desc_state[head].indir_desc;
> @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>  		!context;
>  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> +	vq->desc_cache_chain = NULL;
> +	vq->use_desc_cache = vdev->desc_cache;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
>  		vq->weak_barriers = false;
> @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>  	if (!vq->packed_ring) {
>  		kfree(vq->split.desc_state);
>  		kfree(vq->split.desc_extra);
> +		desc_cache_chain_free_split(vq->desc_cache_chain);
>  	}
>  	kfree(vq);
>  }
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 41edbc01ffa4..d84b7b8f4070 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -109,6 +109,7 @@ struct virtio_device {
>  	bool failed;
>  	bool config_enabled;
>  	bool config_change_pending;
> +	bool desc_cache;
>  	spinlock_t config_lock;
>  	spinlock_t vqs_list_lock; /* Protects VQs list access */
>  	struct device dev;
> @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
>  bool is_virtio_device(struct device *dev);
>  
> +/**
> + * virtio_use_desc_cache - virtio ring use desc cache
> + *
> + * virtio will cache the allocated indirect desc.
> + *
> + * This function must be called before find_vqs.
> + */
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> +
>  void virtio_break_device(struct virtio_device *dev);
>  
>  void virtio_config_changed(struct virtio_device *dev);
> -- 
> 2.31.0


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27  8:55     ` Michael S. Tsirkin
  0 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-27  8:55 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: netdev, David S. Miller, Jakub Kicinski, virtualization

On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio.c      |  6 ++++
>  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
>  include/linux/virtio.h       | 10 ++++++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> +	dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>  	int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>  	/* Hint for event idx: already triggered no need to disable. */
>  	bool event_triggered;
>  
> +	/* Is indirect cache used? */
> +	bool use_desc_cache;
> +	void *desc_cache_chain;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {

Let's use llist_head and friends please (I am guessing we want
single-linked to avoid writing into indirect buffer after use,
invalidating the cache, but please document why in a comment).  Do not
open-code it.

Also hide all casts in inline wrappers.


> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  	return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4

Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM.

> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> +	struct vring_desc *desc;
> +
> +	while (chain) {
> +		desc = chain;
> +		chain = (void *)desc->addr;
> +		kfree(desc);
> +	}
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +				 struct vring_desc *desc, int n)
> +{
> +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		desc->addr = (u64)vq->desc_cache_chain;
> +		vq->desc_cache_chain = desc;
> +	} else {
> +		kfree(desc);
> +	}
> +}


> +
> +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>  					       unsigned int total_sg,
>  					       gfp_t gfp)
>  {
>  	struct vring_desc *desc;
> -	unsigned int i;
> +	unsigned int i, n;
> +
> +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		if (vq->desc_cache_chain) {
> +			desc = vq->desc_cache_chain;
> +			vq->desc_cache_chain = (void *)desc->addr;
> +			goto got;
> +		}
> +		n = VIRT_QUEUE_CACHE_DESC_NUM;

Hmm. This will allocate more entries than actually used. Why do it?

> +	} else {
> +		n = total_sg;
> +	}
>  
>  	/*
>  	 * We require lowmem mappings for the descriptors because
> @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
>  	 */
>  	gfp &= ~__GFP_HIGHMEM;
>  
> -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
>  	if (!desc)
>  		return NULL;
>  
> +got:
>  	for (i = 0; i < total_sg; i++)
> -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
>  	return desc;
>  }
>  
> @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	head = vq->free_head;
>  
>  	if (virtqueue_use_indirect(_vq, total_sg))
> -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> +		desc = alloc_indirect_split(vq, total_sg, gfp);
>  	else {
>  		desc = NULL;
>  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	}
>  
>  	if (indirect)
> -		kfree(desc);
> +		desc_cache_put_split(vq, desc, total_sg);
>  
>  	END_USE(vq);
>  	return -ENOMEM;
> @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  	if (vq->indirect) {
>  		struct vring_desc *indir_desc =
>  				vq->split.desc_state[head].indir_desc;
> -		u32 len;
> +		u32 len, n;
>  
>  		/* Free the indirect table, if any, now that it's unmapped. */
>  		if (!indir_desc)
> @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  				VRING_DESC_F_INDIRECT));
>  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>  
> -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +		n = len / sizeof(struct vring_desc);
> +
> +		for (j = 0; j < n; j++)
>  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>  
> -		kfree(indir_desc);
> +		desc_cache_put_split(vq, indir_desc, n);
>  		vq->split.desc_state[head].indir_desc = NULL;
>  	} else if (ctx) {
>  		*ctx = vq->split.desc_state[head].indir_desc;
> @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>  		!context;
>  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> +	vq->desc_cache_chain = NULL;
> +	vq->use_desc_cache = vdev->desc_cache;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
>  		vq->weak_barriers = false;
> @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>  	if (!vq->packed_ring) {
>  		kfree(vq->split.desc_state);
>  		kfree(vq->split.desc_extra);
> +		desc_cache_chain_free_split(vq->desc_cache_chain);
>  	}
>  	kfree(vq);
>  }
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 41edbc01ffa4..d84b7b8f4070 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -109,6 +109,7 @@ struct virtio_device {
>  	bool failed;
>  	bool config_enabled;
>  	bool config_change_pending;
> +	bool desc_cache;
>  	spinlock_t config_lock;
>  	spinlock_t vqs_list_lock; /* Protects VQs list access */
>  	struct device dev;
> @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
>  bool is_virtio_device(struct device *dev);
>  
> +/**
> + * virtio_use_desc_cache - virtio ring use desc cache
> + *
> + * virtio will cache the allocated indirect desc.
> + *
> + * This function must be called before find_vqs.
> + */
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> +
>  void virtio_break_device(struct virtio_device *dev);
>  
>  void virtio_config_changed(struct virtio_device *dev);
> -- 
> 2.31.0

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  8:55     ` Michael S. Tsirkin
  (?)
@ 2021-10-27  9:03     ` Xuan Zhuo
  -1 siblings, 0 replies; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-27  9:03 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, David S. Miller, Jakub Kicinski, virtualization

On Wed, 27 Oct 2021 04:55:16 -0400, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> >
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio.c      |  6 ++++
> >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> >  include/linux/virtio.h       | 10 ++++++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +	dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> >  	int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> >  	/* Hint for event idx: already triggered no need to disable. */
> >  	bool event_triggered;
> >
> > +	/* Is indirect cache used? */
> > +	bool use_desc_cache;
> > +	void *desc_cache_chain;
> > +
> >  	union {
> >  		/* Available for split ring */
> >  		struct {
>
> Let's use llist_head and friends please (I am guessing we want
> single-linked to avoid writing into indirect buffer after use,
> invalidating the cache, but please document why in a comment).  Do not
> open-code it.
>
> Also hide all casts in inline wrappers.
>
>
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >  	return extra[i].next;
> >  }
> >
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
>
> Add an inline wrapper for checking sg versus VIRT_QUEUE_CACHE_DESC_NUM.
>
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +	struct vring_desc *desc;
> > +
> > +	while (chain) {
> > +		desc = chain;
> > +		chain = (void *)desc->addr;
> > +		kfree(desc);
> > +	}
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +				 struct vring_desc *desc, int n)
> > +{
> > +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +		desc->addr = (u64)vq->desc_cache_chain;
> > +		vq->desc_cache_chain = desc;
> > +	} else {
> > +		kfree(desc);
> > +	}
> > +}
>
>
> > +
> > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
> >  					       unsigned int total_sg,
> >  					       gfp_t gfp)
> >  {
> >  	struct vring_desc *desc;
> > -	unsigned int i;
> > +	unsigned int i, n;
> > +
> > +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +		if (vq->desc_cache_chain) {
> > +			desc = vq->desc_cache_chain;
> > +			vq->desc_cache_chain = (void *)desc->addr;
> > +			goto got;
> > +		}
> > +		n = VIRT_QUEUE_CACHE_DESC_NUM;
>
> Hmm. This will allocate more entries than actually used. Why do it?

If total_sg here is 2, we only allocate two desc here, but if total_sg is 3
later, it will be difficult to use the desc allocated this time.

So if total_sg is less than or equal to 4, a desc array of size 4 is allocated,
so that subsequent desc arrays can be reused when total_sg is less than or
equal to 4. If total_sg is greater than 4, use kmalloc_array to allocate
directly without using the cache.

Thanks.

>
> > +	} else {
> > +		n = total_sg;
> > +	}
> >
> >  	/*
> >  	 * We require lowmem mappings for the descriptors because
> > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> >  	 */
> >  	gfp &= ~__GFP_HIGHMEM;
> >
> > -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> > +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> >  	if (!desc)
> >  		return NULL;
> >
> > +got:
> >  	for (i = 0; i < total_sg; i++)
> > -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> > +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
> >  	return desc;
> >  }
> >
> > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	head = vq->free_head;
> >
> >  	if (virtqueue_use_indirect(_vq, total_sg))
> > -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> > +		desc = alloc_indirect_split(vq, total_sg, gfp);
> >  	else {
> >  		desc = NULL;
> >  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	}
> >
> >  	if (indirect)
> > -		kfree(desc);
> > +		desc_cache_put_split(vq, desc, total_sg);
> >
> >  	END_USE(vq);
> >  	return -ENOMEM;
> > @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  	if (vq->indirect) {
> >  		struct vring_desc *indir_desc =
> >  				vq->split.desc_state[head].indir_desc;
> > -		u32 len;
> > +		u32 len, n;
> >
> >  		/* Free the indirect table, if any, now that it's unmapped. */
> >  		if (!indir_desc)
> > @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  				VRING_DESC_F_INDIRECT));
> >  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > +		n = len / sizeof(struct vring_desc);
> > +
> > +		for (j = 0; j < n; j++)
> >  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> >
> > -		kfree(indir_desc);
> > +		desc_cache_put_split(vq, indir_desc, n);
> >  		vq->split.desc_state[head].indir_desc = NULL;
> >  	} else if (ctx) {
> >  		*ctx = vq->split.desc_state[head].indir_desc;
> > @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >  		!context;
> >  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > +	vq->desc_cache_chain = NULL;
> > +	vq->use_desc_cache = vdev->desc_cache;
> >
> >  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> >  		vq->weak_barriers = false;
> > @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> >  	if (!vq->packed_ring) {
> >  		kfree(vq->split.desc_state);
> >  		kfree(vq->split.desc_extra);
> > +		desc_cache_chain_free_split(vq->desc_cache_chain);
> >  	}
> >  	kfree(vq);
> >  }
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 41edbc01ffa4..d84b7b8f4070 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -109,6 +109,7 @@ struct virtio_device {
> >  	bool failed;
> >  	bool config_enabled;
> >  	bool config_change_pending;
> > +	bool desc_cache;
> >  	spinlock_t config_lock;
> >  	spinlock_t vqs_list_lock; /* Protects VQs list access */
> >  	struct device dev;
> > @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
> >  void unregister_virtio_device(struct virtio_device *dev);
> >  bool is_virtio_device(struct device *dev);
> >
> > +/**
> > + * virtio_use_desc_cache - virtio ring use desc cache
> > + *
> > + * virtio will cache the allocated indirect desc.
> > + *
> > + * This function must be called before find_vqs.
> > + */
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> > +
> >  void virtio_break_device(struct virtio_device *dev);
> >
> >  void virtio_config_changed(struct virtio_device *dev);
> > --
> > 2.31.0
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] virtio-net: enable virtio indirect cache
  2021-10-27  6:19   ` Xuan Zhuo
  (?)
@ 2021-10-27 15:55   ` Jakub Kicinski
  2021-10-28  1:57     ` Xuan Zhuo
  -1 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2021-10-27 15:55 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, netdev, Michael S. Tsirkin, Jason Wang, David S. Miller

On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote:
> +static bool virtio_desc_cache = true;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
> +module_param(virtio_desc_cache, bool, 0644);

Can this be an ethtool priv flag? module params are discouraged because
they can't be controlled per-netdev.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  6:19   ` Xuan Zhuo
@ 2021-10-27 16:33     ` Dongli Zhang
  -1 siblings, 0 replies; 41+ messages in thread
From: Dongli Zhang @ 2021-10-27 16:33 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jakub Kicinski, David S. Miller, Michael S. Tsirkin,
	virtualization, netdev



On 10/26/21 11:19 PM, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio.c      |  6 ++++
>  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
>  include/linux/virtio.h       | 10 ++++++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> +	dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>  	int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>  	/* Hint for event idx: already triggered no need to disable. */
>  	bool event_triggered;
>  
> +	/* Is indirect cache used? */
> +	bool use_desc_cache;
> +	void *desc_cache_chain;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {
> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  	return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> +	struct vring_desc *desc;
> +
> +	while (chain) {
> +		desc = chain;
> +		chain = (void *)desc->addr;
> +		kfree(desc);
> +	}
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +				 struct vring_desc *desc, int n)
> +{
> +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		desc->addr = (u64)vq->desc_cache_chain;
> +		vq->desc_cache_chain = desc;
> +	} else {
> +		kfree(desc);
> +	}
> +}
> +
> +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>  					       unsigned int total_sg,
>  					       gfp_t gfp)
>  {
>  	struct vring_desc *desc;
> -	unsigned int i;
> +	unsigned int i, n;
> +
> +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		if (vq->desc_cache_chain) {
> +			desc = vq->desc_cache_chain;
> +			vq->desc_cache_chain = (void *)desc->addr;
> +			goto got;
> +		}
> +		n = VIRT_QUEUE_CACHE_DESC_NUM;

How about to make the VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during
driver probing) unless there is a reason that the default value is 4.

Thank you very much!

Dongli Zhang



> +	} else {
> +		n = total_sg;
> +	}
>  
>  	/*
>  	 * We require lowmem mappings for the descriptors because
> @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
>  	 */
>  	gfp &= ~__GFP_HIGHMEM;
>  
> -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
>  	if (!desc)
>  		return NULL;
>  
> +got:
>  	for (i = 0; i < total_sg; i++)
> -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
>  	return desc;
>  }
>  
> @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	head = vq->free_head;
>  
>  	if (virtqueue_use_indirect(_vq, total_sg))
> -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> +		desc = alloc_indirect_split(vq, total_sg, gfp);
>  	else {
>  		desc = NULL;
>  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	}
>  
>  	if (indirect)
> -		kfree(desc);
> +		desc_cache_put_split(vq, desc, total_sg);
>  
>  	END_USE(vq);
>  	return -ENOMEM;
> @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  	if (vq->indirect) {
>  		struct vring_desc *indir_desc =
>  				vq->split.desc_state[head].indir_desc;
> -		u32 len;
> +		u32 len, n;
>  
>  		/* Free the indirect table, if any, now that it's unmapped. */
>  		if (!indir_desc)
> @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  				VRING_DESC_F_INDIRECT));
>  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>  
> -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +		n = len / sizeof(struct vring_desc);
> +
> +		for (j = 0; j < n; j++)
>  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>  
> -		kfree(indir_desc);
> +		desc_cache_put_split(vq, indir_desc, n);
>  		vq->split.desc_state[head].indir_desc = NULL;
>  	} else if (ctx) {
>  		*ctx = vq->split.desc_state[head].indir_desc;
> @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>  		!context;
>  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> +	vq->desc_cache_chain = NULL;
> +	vq->use_desc_cache = vdev->desc_cache;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
>  		vq->weak_barriers = false;
> @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>  	if (!vq->packed_ring) {
>  		kfree(vq->split.desc_state);
>  		kfree(vq->split.desc_extra);
> +		desc_cache_chain_free_split(vq->desc_cache_chain);
>  	}
>  	kfree(vq);
>  }
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 41edbc01ffa4..d84b7b8f4070 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -109,6 +109,7 @@ struct virtio_device {
>  	bool failed;
>  	bool config_enabled;
>  	bool config_change_pending;
> +	bool desc_cache;
>  	spinlock_t config_lock;
>  	spinlock_t vqs_list_lock; /* Protects VQs list access */
>  	struct device dev;
> @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
>  bool is_virtio_device(struct device *dev);
>  
> +/**
> + * virtio_use_desc_cache - virtio ring use desc cache
> + *
> + * virtio will cache the allocated indirect desc.
> + *
> + * This function must be called before find_vqs.
> + */
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> +
>  void virtio_break_device(struct virtio_device *dev);
>  
>  void virtio_config_changed(struct virtio_device *dev);
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27 16:33     ` Dongli Zhang
  0 siblings, 0 replies; 41+ messages in thread
From: Dongli Zhang @ 2021-10-27 16:33 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jakub Kicinski, netdev, virtualization, David S. Miller,
	Michael S. Tsirkin



On 10/26/21 11:19 PM, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio.c      |  6 ++++
>  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
>  include/linux/virtio.h       | 10 ++++++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> +	dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>  	int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>  	/* Hint for event idx: already triggered no need to disable. */
>  	bool event_triggered;
>  
> +	/* Is indirect cache used? */
> +	bool use_desc_cache;
> +	void *desc_cache_chain;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {
> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  	return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> +	struct vring_desc *desc;
> +
> +	while (chain) {
> +		desc = chain;
> +		chain = (void *)desc->addr;
> +		kfree(desc);
> +	}
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +				 struct vring_desc *desc, int n)
> +{
> +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		desc->addr = (u64)vq->desc_cache_chain;
> +		vq->desc_cache_chain = desc;
> +	} else {
> +		kfree(desc);
> +	}
> +}
> +
> +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>  					       unsigned int total_sg,
>  					       gfp_t gfp)
>  {
>  	struct vring_desc *desc;
> -	unsigned int i;
> +	unsigned int i, n;
> +
> +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		if (vq->desc_cache_chain) {
> +			desc = vq->desc_cache_chain;
> +			vq->desc_cache_chain = (void *)desc->addr;
> +			goto got;
> +		}
> +		n = VIRT_QUEUE_CACHE_DESC_NUM;

How about to make the VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during
driver probing) unless there is a reason that the default value is 4.

Thank you very much!

Dongli Zhang



> +	} else {
> +		n = total_sg;
> +	}
>  
>  	/*
>  	 * We require lowmem mappings for the descriptors because
> @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
>  	 */
>  	gfp &= ~__GFP_HIGHMEM;
>  
> -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
>  	if (!desc)
>  		return NULL;
>  
> +got:
>  	for (i = 0; i < total_sg; i++)
> -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
>  	return desc;
>  }
>  
> @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	head = vq->free_head;
>  
>  	if (virtqueue_use_indirect(_vq, total_sg))
> -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> +		desc = alloc_indirect_split(vq, total_sg, gfp);
>  	else {
>  		desc = NULL;
>  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  	}
>  
>  	if (indirect)
> -		kfree(desc);
> +		desc_cache_put_split(vq, desc, total_sg);
>  
>  	END_USE(vq);
>  	return -ENOMEM;
> @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  	if (vq->indirect) {
>  		struct vring_desc *indir_desc =
>  				vq->split.desc_state[head].indir_desc;
> -		u32 len;
> +		u32 len, n;
>  
>  		/* Free the indirect table, if any, now that it's unmapped. */
>  		if (!indir_desc)
> @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  				VRING_DESC_F_INDIRECT));
>  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>  
> -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> +		n = len / sizeof(struct vring_desc);
> +
> +		for (j = 0; j < n; j++)
>  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>  
> -		kfree(indir_desc);
> +		desc_cache_put_split(vq, indir_desc, n);
>  		vq->split.desc_state[head].indir_desc = NULL;
>  	} else if (ctx) {
>  		*ctx = vq->split.desc_state[head].indir_desc;
> @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>  		!context;
>  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> +	vq->desc_cache_chain = NULL;
> +	vq->use_desc_cache = vdev->desc_cache;
>  
>  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
>  		vq->weak_barriers = false;
> @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
>  	if (!vq->packed_ring) {
>  		kfree(vq->split.desc_state);
>  		kfree(vq->split.desc_extra);
> +		desc_cache_chain_free_split(vq->desc_cache_chain);
>  	}
>  	kfree(vq);
>  }
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 41edbc01ffa4..d84b7b8f4070 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -109,6 +109,7 @@ struct virtio_device {
>  	bool failed;
>  	bool config_enabled;
>  	bool config_change_pending;
> +	bool desc_cache;
>  	spinlock_t config_lock;
>  	spinlock_t vqs_list_lock; /* Protects VQs list access */
>  	struct device dev;
> @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
>  bool is_virtio_device(struct device *dev);
>  
> +/**
> + * virtio_use_desc_cache - virtio ring use desc cache
> + *
> + * virtio will cache the allocated indirect desc.
> + *
> + * This function must be called before find_vqs.
> + */
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> +
>  void virtio_break_device(struct virtio_device *dev);
>  
>  void virtio_config_changed(struct virtio_device *dev);
> 
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  6:19   ` Xuan Zhuo
@ 2021-10-27 17:07     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-27 17:07 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, netdev, Jason Wang, David S. Miller, Jakub Kicinski

On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio.c      |  6 ++++
>  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
>  include/linux/virtio.h       | 10 ++++++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> +	dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>  	int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>  	/* Hint for event idx: already triggered no need to disable. */
>  	bool event_triggered;
>  
> +	/* Is indirect cache used? */
> +	bool use_desc_cache;
> +	void *desc_cache_chain;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {
> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  	return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> +	struct vring_desc *desc;
> +
> +	while (chain) {
> +		desc = chain;
> +		chain = (void *)desc->addr;
> +		kfree(desc);
> +	}
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +				 struct vring_desc *desc, int n)
> +{
> +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		desc->addr = (u64)vq->desc_cache_chain;
> +		vq->desc_cache_chain = desc;
> +	} else {
> +		kfree(desc);
> +	}
> +}
> +


So I have a question here. What happens if we just do:

if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
	return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
} else {
	return kmalloc_arrat(n, sizeof desc, gfp)
}

A small change and won't we reap most performance benefits?

-- 
MST


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27 17:07     ` Michael S. Tsirkin
  0 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-27 17:07 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: netdev, David S. Miller, Jakub Kicinski, virtualization

On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> In the case of using indirect, indirect desc must be allocated and
> released each time, which increases a lot of cpu overhead.
> 
> Here, a cache is added for indirect. If the number of indirect desc to be
> applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio.c      |  6 ++++
>  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
>  include/linux/virtio.h       | 10 ++++++
>  3 files changed, 70 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 0a5b54034d4b..04bcb74e5b9a 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(is_virtio_device);
>  
> +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> +{
> +	dev->desc_cache = val;
> +}
> +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> +
>  void unregister_virtio_device(struct virtio_device *dev)
>  {
>  	int index = dev->index; /* save for after device release */
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dd95dfd85e98..0b9a8544b0e8 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -117,6 +117,10 @@ struct vring_virtqueue {
>  	/* Hint for event idx: already triggered no need to disable. */
>  	bool event_triggered;
>  
> +	/* Is indirect cache used? */
> +	bool use_desc_cache;
> +	void *desc_cache_chain;
> +
>  	union {
>  		/* Available for split ring */
>  		struct {
> @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  	return extra[i].next;
>  }
>  
> -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> +
> +static void desc_cache_chain_free_split(void *chain)
> +{
> +	struct vring_desc *desc;
> +
> +	while (chain) {
> +		desc = chain;
> +		chain = (void *)desc->addr;
> +		kfree(desc);
> +	}
> +}
> +
> +static void desc_cache_put_split(struct vring_virtqueue *vq,
> +				 struct vring_desc *desc, int n)
> +{
> +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +		desc->addr = (u64)vq->desc_cache_chain;
> +		vq->desc_cache_chain = desc;
> +	} else {
> +		kfree(desc);
> +	}
> +}
> +


So I have a question here. What happens if we just do:

if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
	return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
} else {
	return kmalloc_arrat(n, sizeof desc, gfp)
}

A small change and won't we reap most performance benefits?

-- 
MST

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27 16:33     ` Dongli Zhang
@ 2021-10-27 19:36       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-27 19:36 UTC (permalink / raw)
  To: Dongli Zhang
  Cc: Xuan Zhuo, Jakub Kicinski, David S. Miller, virtualization, netdev

On Wed, Oct 27, 2021 at 09:33:46AM -0700, Dongli Zhang wrote:
> 
> 
> On 10/26/21 11:19 PM, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> > 
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > 
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio.c      |  6 ++++
> >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> >  include/linux/virtio.h       | 10 ++++++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >  
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +	dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> >  	int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> >  	/* Hint for event idx: already triggered no need to disable. */
> >  	bool event_triggered;
> >  
> > +	/* Is indirect cache used? */
> > +	bool use_desc_cache;
> > +	void *desc_cache_chain;
> > +
> >  	union {
> >  		/* Available for split ring */
> >  		struct {
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >  	return extra[i].next;
> >  }
> >  
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +	struct vring_desc *desc;
> > +
> > +	while (chain) {
> > +		desc = chain;
> > +		chain = (void *)desc->addr;
> > +		kfree(desc);
> > +	}
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +				 struct vring_desc *desc, int n)
> > +{
> > +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +		desc->addr = (u64)vq->desc_cache_chain;
> > +		vq->desc_cache_chain = desc;
> > +	} else {
> > +		kfree(desc);
> > +	}
> > +}
> > +
> > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
> >  					       unsigned int total_sg,
> >  					       gfp_t gfp)
> >  {
> >  	struct vring_desc *desc;
> > -	unsigned int i;
> > +	unsigned int i, n;
> > +
> > +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +		if (vq->desc_cache_chain) {
> > +			desc = vq->desc_cache_chain;
> > +			vq->desc_cache_chain = (void *)desc->addr;
> > +			goto got;
> > +		}
> > +		n = VIRT_QUEUE_CACHE_DESC_NUM;
> 
> How about to make the VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during
> driver probing) unless there is a reason that the default value is 4.
> 
> Thank you very much!
> 
> Dongli Zhang


I would start with some experimentation showing that it actually makes a
difference in performance.

> 
> 
> > +	} else {
> > +		n = total_sg;
> > +	}
> >  
> >  	/*
> >  	 * We require lowmem mappings for the descriptors because
> > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> >  	 */
> >  	gfp &= ~__GFP_HIGHMEM;
> >  
> > -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> > +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> >  	if (!desc)
> >  		return NULL;
> >  
> > +got:
> >  	for (i = 0; i < total_sg; i++)
> > -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> > +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
> >  	return desc;
> >  }
> >  
> > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	head = vq->free_head;
> >  
> >  	if (virtqueue_use_indirect(_vq, total_sg))
> > -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> > +		desc = alloc_indirect_split(vq, total_sg, gfp);
> >  	else {
> >  		desc = NULL;
> >  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	}
> >  
> >  	if (indirect)
> > -		kfree(desc);
> > +		desc_cache_put_split(vq, desc, total_sg);
> >  
> >  	END_USE(vq);
> >  	return -ENOMEM;
> > @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  	if (vq->indirect) {
> >  		struct vring_desc *indir_desc =
> >  				vq->split.desc_state[head].indir_desc;
> > -		u32 len;
> > +		u32 len, n;
> >  
> >  		/* Free the indirect table, if any, now that it's unmapped. */
> >  		if (!indir_desc)
> > @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  				VRING_DESC_F_INDIRECT));
> >  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >  
> > -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > +		n = len / sizeof(struct vring_desc);
> > +
> > +		for (j = 0; j < n; j++)
> >  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> >  
> > -		kfree(indir_desc);
> > +		desc_cache_put_split(vq, indir_desc, n);
> >  		vq->split.desc_state[head].indir_desc = NULL;
> >  	} else if (ctx) {
> >  		*ctx = vq->split.desc_state[head].indir_desc;
> > @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >  		!context;
> >  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > +	vq->desc_cache_chain = NULL;
> > +	vq->use_desc_cache = vdev->desc_cache;
> >  
> >  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> >  		vq->weak_barriers = false;
> > @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> >  	if (!vq->packed_ring) {
> >  		kfree(vq->split.desc_state);
> >  		kfree(vq->split.desc_extra);
> > +		desc_cache_chain_free_split(vq->desc_cache_chain);
> >  	}
> >  	kfree(vq);
> >  }
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 41edbc01ffa4..d84b7b8f4070 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -109,6 +109,7 @@ struct virtio_device {
> >  	bool failed;
> >  	bool config_enabled;
> >  	bool config_change_pending;
> > +	bool desc_cache;
> >  	spinlock_t config_lock;
> >  	spinlock_t vqs_list_lock; /* Protects VQs list access */
> >  	struct device dev;
> > @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
> >  void unregister_virtio_device(struct virtio_device *dev);
> >  bool is_virtio_device(struct device *dev);
> >  
> > +/**
> > + * virtio_use_desc_cache - virtio ring use desc cache
> > + *
> > + * virtio will cache the allocated indirect desc.
> > + *
> > + * This function must be called before find_vqs.
> > + */
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> > +
> >  void virtio_break_device(struct virtio_device *dev);
> >  
> >  void virtio_config_changed(struct virtio_device *dev);
> > 


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27 19:36       ` Michael S. Tsirkin
  0 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-27 19:36 UTC (permalink / raw)
  To: Dongli Zhang; +Cc: netdev, virtualization, David S. Miller, Jakub Kicinski

On Wed, Oct 27, 2021 at 09:33:46AM -0700, Dongli Zhang wrote:
> 
> 
> On 10/26/21 11:19 PM, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> > 
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > 
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio.c      |  6 ++++
> >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> >  include/linux/virtio.h       | 10 ++++++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >  
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +	dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> >  	int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> >  	/* Hint for event idx: already triggered no need to disable. */
> >  	bool event_triggered;
> >  
> > +	/* Is indirect cache used? */
> > +	bool use_desc_cache;
> > +	void *desc_cache_chain;
> > +
> >  	union {
> >  		/* Available for split ring */
> >  		struct {
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >  	return extra[i].next;
> >  }
> >  
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +	struct vring_desc *desc;
> > +
> > +	while (chain) {
> > +		desc = chain;
> > +		chain = (void *)desc->addr;
> > +		kfree(desc);
> > +	}
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +				 struct vring_desc *desc, int n)
> > +{
> > +	if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +		desc->addr = (u64)vq->desc_cache_chain;
> > +		vq->desc_cache_chain = desc;
> > +	} else {
> > +		kfree(desc);
> > +	}
> > +}
> > +
> > +static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
> >  					       unsigned int total_sg,
> >  					       gfp_t gfp)
> >  {
> >  	struct vring_desc *desc;
> > -	unsigned int i;
> > +	unsigned int i, n;
> > +
> > +	if (vq->use_desc_cache && total_sg <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +		if (vq->desc_cache_chain) {
> > +			desc = vq->desc_cache_chain;
> > +			vq->desc_cache_chain = (void *)desc->addr;
> > +			goto got;
> > +		}
> > +		n = VIRT_QUEUE_CACHE_DESC_NUM;
> 
> How about to make the VIRT_QUEUE_CACHE_DESC_NUM configurable (at least during
> driver probing) unless there is a reason that the default value is 4.
> 
> Thank you very much!
> 
> Dongli Zhang


I would start with some experimentation showing that it actually makes a
difference in performance.

> 
> 
> > +	} else {
> > +		n = total_sg;
> > +	}
> >  
> >  	/*
> >  	 * We require lowmem mappings for the descriptors because
> > @@ -437,12 +476,13 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> >  	 */
> >  	gfp &= ~__GFP_HIGHMEM;
> >  
> > -	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> > +	desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> >  	if (!desc)
> >  		return NULL;
> >  
> > +got:
> >  	for (i = 0; i < total_sg; i++)
> > -		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
> > +		desc[i].next = cpu_to_virtio16(vq->vq.vdev, i + 1);
> >  	return desc;
> >  }
> >  
> > @@ -508,7 +548,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	head = vq->free_head;
> >  
> >  	if (virtqueue_use_indirect(_vq, total_sg))
> > -		desc = alloc_indirect_split(_vq, total_sg, gfp);
> > +		desc = alloc_indirect_split(vq, total_sg, gfp);
> >  	else {
> >  		desc = NULL;
> >  		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
> > @@ -652,7 +692,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >  	}
> >  
> >  	if (indirect)
> > -		kfree(desc);
> > +		desc_cache_put_split(vq, desc, total_sg);
> >  
> >  	END_USE(vq);
> >  	return -ENOMEM;
> > @@ -717,7 +757,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  	if (vq->indirect) {
> >  		struct vring_desc *indir_desc =
> >  				vq->split.desc_state[head].indir_desc;
> > -		u32 len;
> > +		u32 len, n;
> >  
> >  		/* Free the indirect table, if any, now that it's unmapped. */
> >  		if (!indir_desc)
> > @@ -729,10 +769,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >  				VRING_DESC_F_INDIRECT));
> >  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >  
> > -		for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > +		n = len / sizeof(struct vring_desc);
> > +
> > +		for (j = 0; j < n; j++)
> >  			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> >  
> > -		kfree(indir_desc);
> > +		desc_cache_put_split(vq, indir_desc, n);
> >  		vq->split.desc_state[head].indir_desc = NULL;
> >  	} else if (ctx) {
> >  		*ctx = vq->split.desc_state[head].indir_desc;
> > @@ -2199,6 +2241,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >  	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >  		!context;
> >  	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
> > +	vq->desc_cache_chain = NULL;
> > +	vq->use_desc_cache = vdev->desc_cache;
> >  
> >  	if (virtio_has_feature(vdev, VIRTIO_F_ORDER_PLATFORM))
> >  		vq->weak_barriers = false;
> > @@ -2329,6 +2373,7 @@ void vring_del_virtqueue(struct virtqueue *_vq)
> >  	if (!vq->packed_ring) {
> >  		kfree(vq->split.desc_state);
> >  		kfree(vq->split.desc_extra);
> > +		desc_cache_chain_free_split(vq->desc_cache_chain);
> >  	}
> >  	kfree(vq);
> >  }
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 41edbc01ffa4..d84b7b8f4070 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -109,6 +109,7 @@ struct virtio_device {
> >  	bool failed;
> >  	bool config_enabled;
> >  	bool config_change_pending;
> > +	bool desc_cache;
> >  	spinlock_t config_lock;
> >  	spinlock_t vqs_list_lock; /* Protects VQs list access */
> >  	struct device dev;
> > @@ -130,6 +131,15 @@ int register_virtio_device(struct virtio_device *dev);
> >  void unregister_virtio_device(struct virtio_device *dev);
> >  bool is_virtio_device(struct device *dev);
> >  
> > +/**
> > + * virtio_use_desc_cache - virtio ring use desc cache
> > + *
> > + * virtio will cache the allocated indirect desc.
> > + *
> > + * This function must be called before find_vqs.
> > + */
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val);
> > +
> >  void virtio_break_device(struct virtio_device *dev);
> >  
> >  void virtio_config_changed(struct virtio_device *dev);
> > 

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  6:19   ` Xuan Zhuo
  (?)
@ 2021-10-27 23:02     ` kernel test robot
  -1 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-27 23:02 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: kbuild-all, Michael S. Tsirkin, Jason Wang, Jakub Kicinski

[-- Attachment #1: Type: text/plain, Size: 3130 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/virtio/virtio_ring.c: In function 'desc_cache_chain_free_split':
>> drivers/virtio/virtio_ring.c:438:25: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     438 |                 chain = (void *)desc->addr;
         |                         ^
   drivers/virtio/virtio_ring.c: In function 'desc_cache_put_split':
>> drivers/virtio/virtio_ring.c:447:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     447 |                 desc->addr = (u64)vq->desc_cache_chain;
         |                              ^
   drivers/virtio/virtio_ring.c: In function 'alloc_indirect_split':
   drivers/virtio/virtio_ring.c:464:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     464 |                         vq->desc_cache_chain = (void *)desc->addr;
         |                                                ^


vim +438 drivers/virtio/virtio_ring.c

   431	
   432	static void desc_cache_chain_free_split(void *chain)
   433	{
   434		struct vring_desc *desc;
   435	
   436		while (chain) {
   437			desc = chain;
 > 438			chain = (void *)desc->addr;
   439			kfree(desc);
   440		}
   441	}
   442	
   443	static void desc_cache_put_split(struct vring_virtqueue *vq,
   444					 struct vring_desc *desc, int n)
   445	{
   446		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447			desc->addr = (u64)vq->desc_cache_chain;
   448			vq->desc_cache_chain = desc;
   449		} else {
   450			kfree(desc);
   451		}
   452	}
   453	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 55358 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27 23:02     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-27 23:02 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: Jakub Kicinski, kbuild-all, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 3130 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/virtio/virtio_ring.c: In function 'desc_cache_chain_free_split':
>> drivers/virtio/virtio_ring.c:438:25: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     438 |                 chain = (void *)desc->addr;
         |                         ^
   drivers/virtio/virtio_ring.c: In function 'desc_cache_put_split':
>> drivers/virtio/virtio_ring.c:447:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     447 |                 desc->addr = (u64)vq->desc_cache_chain;
         |                              ^
   drivers/virtio/virtio_ring.c: In function 'alloc_indirect_split':
   drivers/virtio/virtio_ring.c:464:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     464 |                         vq->desc_cache_chain = (void *)desc->addr;
         |                                                ^


vim +438 drivers/virtio/virtio_ring.c

   431	
   432	static void desc_cache_chain_free_split(void *chain)
   433	{
   434		struct vring_desc *desc;
   435	
   436		while (chain) {
   437			desc = chain;
 > 438			chain = (void *)desc->addr;
   439			kfree(desc);
   440		}
   441	}
   442	
   443	static void desc_cache_put_split(struct vring_virtqueue *vq,
   444					 struct vring_desc *desc, int n)
   445	{
   446		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447			desc->addr = (u64)vq->desc_cache_chain;
   448			vq->desc_cache_chain = desc;
   449		} else {
   450			kfree(desc);
   451		}
   452	}
   453	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 55358 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-27 23:02     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-27 23:02 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3205 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/virtio/virtio_ring.c: In function 'desc_cache_chain_free_split':
>> drivers/virtio/virtio_ring.c:438:25: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     438 |                 chain = (void *)desc->addr;
         |                         ^
   drivers/virtio/virtio_ring.c: In function 'desc_cache_put_split':
>> drivers/virtio/virtio_ring.c:447:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
     447 |                 desc->addr = (u64)vq->desc_cache_chain;
         |                              ^
   drivers/virtio/virtio_ring.c: In function 'alloc_indirect_split':
   drivers/virtio/virtio_ring.c:464:48: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
     464 |                         vq->desc_cache_chain = (void *)desc->addr;
         |                                                ^


vim +438 drivers/virtio/virtio_ring.c

   431	
   432	static void desc_cache_chain_free_split(void *chain)
   433	{
   434		struct vring_desc *desc;
   435	
   436		while (chain) {
   437			desc = chain;
 > 438			chain = (void *)desc->addr;
   439			kfree(desc);
   440		}
   441	}
   442	
   443	static void desc_cache_put_split(struct vring_virtqueue *vq,
   444					 struct vring_desc *desc, int n)
   445	{
   446		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447			desc->addr = (u64)vq->desc_cache_chain;
   448			vq->desc_cache_chain = desc;
   449		} else {
   450			kfree(desc);
   451		}
   452	}
   453	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 55358 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
  2021-10-27  6:19   ` Xuan Zhuo
  (?)
@ 2021-10-28  0:28     ` kernel test robot
  -1 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  0:28 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: llvm, kbuild-all, Michael S. Tsirkin, Jason Wang, Jakub Kicinski

[-- Attachment #1: Type: text/plain, Size: 3542 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: hexagon-randconfig-r045-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: warning: variable 'len' is uninitialized when used here [-Wuninitialized]
                   n = len / sizeof(struct vring_packed_desc);
                       ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning
                   u32 len, n;
                          ^
                           = 0
   1 warning generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433	
  1434	static void detach_buf_packed(struct vring_virtqueue *vq,
  1435				      unsigned int id, void **ctx)
  1436	{
  1437		struct vring_desc_state_packed *state = NULL;
  1438		struct vring_packed_desc *desc;
  1439		unsigned int i, curr;
  1440	
  1441		state = &vq->packed.desc_state[id];
  1442	
  1443		/* Clear data ptr. */
  1444		state->data = NULL;
  1445	
  1446		vq->packed.desc_extra[state->last].next = vq->free_head;
  1447		vq->free_head = id;
  1448		vq->vq.num_free += state->num;
  1449	
  1450		if (unlikely(vq->use_dma_api)) {
  1451			curr = id;
  1452			for (i = 0; i < state->num; i++) {
  1453				vring_unmap_state_packed(vq,
  1454					&vq->packed.desc_extra[curr]);
  1455				curr = vq->packed.desc_extra[curr].next;
  1456			}
  1457		}
  1458	
  1459		if (vq->indirect) {
  1460			u32 len, n;
  1461	
  1462			/* Free the indirect table, if any, now that it's unmapped. */
  1463			desc = state->indir_desc;
  1464			if (!desc)
  1465				return;
  1466	
> 1467			n = len / sizeof(struct vring_packed_desc);
  1468	
  1469			if (vq->use_dma_api) {
  1470				len = vq->packed.desc_extra[id].len;
  1471				for (i = 0; i < n; i++)
  1472					vring_unmap_desc_packed(vq, &desc[i]);
  1473			}
  1474	
  1475			desc_cache_put_packed(vq, desc, n);
  1476			state->indir_desc = NULL;
  1477		} else if (ctx) {
  1478			*ctx = state->indir_desc;
  1479		}
  1480	}
  1481	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27476 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-28  0:28     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  0:28 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: Jakub Kicinski, llvm, kbuild-all, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 3542 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: hexagon-randconfig-r045-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: warning: variable 'len' is uninitialized when used here [-Wuninitialized]
                   n = len / sizeof(struct vring_packed_desc);
                       ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning
                   u32 len, n;
                          ^
                           = 0
   1 warning generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433	
  1434	static void detach_buf_packed(struct vring_virtqueue *vq,
  1435				      unsigned int id, void **ctx)
  1436	{
  1437		struct vring_desc_state_packed *state = NULL;
  1438		struct vring_packed_desc *desc;
  1439		unsigned int i, curr;
  1440	
  1441		state = &vq->packed.desc_state[id];
  1442	
  1443		/* Clear data ptr. */
  1444		state->data = NULL;
  1445	
  1446		vq->packed.desc_extra[state->last].next = vq->free_head;
  1447		vq->free_head = id;
  1448		vq->vq.num_free += state->num;
  1449	
  1450		if (unlikely(vq->use_dma_api)) {
  1451			curr = id;
  1452			for (i = 0; i < state->num; i++) {
  1453				vring_unmap_state_packed(vq,
  1454					&vq->packed.desc_extra[curr]);
  1455				curr = vq->packed.desc_extra[curr].next;
  1456			}
  1457		}
  1458	
  1459		if (vq->indirect) {
  1460			u32 len, n;
  1461	
  1462			/* Free the indirect table, if any, now that it's unmapped. */
  1463			desc = state->indir_desc;
  1464			if (!desc)
  1465				return;
  1466	
> 1467			n = len / sizeof(struct vring_packed_desc);
  1468	
  1469			if (vq->use_dma_api) {
  1470				len = vq->packed.desc_extra[id].len;
  1471				for (i = 0; i < n; i++)
  1472					vring_unmap_desc_packed(vq, &desc[i]);
  1473			}
  1474	
  1475			desc_cache_put_packed(vq, desc, n);
  1476			state->indir_desc = NULL;
  1477		} else if (ctx) {
  1478			*ctx = state->indir_desc;
  1479		}
  1480	}
  1481	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27476 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-28  0:28     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  0:28 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3639 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: hexagon-randconfig-r045-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=hexagon 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: warning: variable 'len' is uninitialized when used here [-Wuninitialized]
                   n = len / sizeof(struct vring_packed_desc);
                       ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning
                   u32 len, n;
                          ^
                           = 0
   1 warning generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433	
  1434	static void detach_buf_packed(struct vring_virtqueue *vq,
  1435				      unsigned int id, void **ctx)
  1436	{
  1437		struct vring_desc_state_packed *state = NULL;
  1438		struct vring_packed_desc *desc;
  1439		unsigned int i, curr;
  1440	
  1441		state = &vq->packed.desc_state[id];
  1442	
  1443		/* Clear data ptr. */
  1444		state->data = NULL;
  1445	
  1446		vq->packed.desc_extra[state->last].next = vq->free_head;
  1447		vq->free_head = id;
  1448		vq->vq.num_free += state->num;
  1449	
  1450		if (unlikely(vq->use_dma_api)) {
  1451			curr = id;
  1452			for (i = 0; i < state->num; i++) {
  1453				vring_unmap_state_packed(vq,
  1454					&vq->packed.desc_extra[curr]);
  1455				curr = vq->packed.desc_extra[curr].next;
  1456			}
  1457		}
  1458	
  1459		if (vq->indirect) {
  1460			u32 len, n;
  1461	
  1462			/* Free the indirect table, if any, now that it's unmapped. */
  1463			desc = state->indir_desc;
  1464			if (!desc)
  1465				return;
  1466	
> 1467			n = len / sizeof(struct vring_packed_desc);
  1468	
  1469			if (vq->use_dma_api) {
  1470				len = vq->packed.desc_extra[id].len;
  1471				for (i = 0; i < n; i++)
  1472					vring_unmap_desc_packed(vq, &desc[i]);
  1473			}
  1474	
  1475			desc_cache_put_packed(vq, desc, n);
  1476			state->indir_desc = NULL;
  1477		} else if (ctx) {
  1478			*ctx = state->indir_desc;
  1479		}
  1480	}
  1481	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 27476 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27  6:19   ` Xuan Zhuo
  (?)
@ 2021-10-28  0:57     ` kernel test robot
  -1 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  0:57 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: kbuild-all, Michael S. Tsirkin, Jason Wang, Jakub Kicinski

[-- Attachment #1: Type: text/plain, Size: 2744 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64
>> drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __virtio64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse:     expected restricted __virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64

vim +438 drivers/virtio/virtio_ring.c

   431	
   432	static void desc_cache_chain_free_split(void *chain)
   433	{
   434		struct vring_desc *desc;
   435	
   436		while (chain) {
   437			desc = chain;
 > 438			chain = (void *)desc->addr;
   439			kfree(desc);
   440		}
   441	}
   442	
   443	static void desc_cache_put_split(struct vring_virtqueue *vq,
   444					 struct vring_desc *desc, int n)
   445	{
   446		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447			desc->addr = (u64)vq->desc_cache_chain;
   448			vq->desc_cache_chain = desc;
   449		} else {
   450			kfree(desc);
   451		}
   452	}
   453	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37885 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-28  0:57     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  0:57 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: Jakub Kicinski, kbuild-all, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 2744 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64
>> drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __virtio64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse:     expected restricted __virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64

vim +438 drivers/virtio/virtio_ring.c

   431	
   432	static void desc_cache_chain_free_split(void *chain)
   433	{
   434		struct vring_desc *desc;
   435	
   436		while (chain) {
   437			desc = chain;
 > 438			chain = (void *)desc->addr;
   439			kfree(desc);
   440		}
   441	}
   442	
   443	static void desc_cache_put_split(struct vring_virtqueue *vq,
   444					 struct vring_desc *desc, int n)
   445	{
   446		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447			desc->addr = (u64)vq->desc_cache_chain;
   448			vq->desc_cache_chain = desc;
   449		} else {
   450			kfree(desc);
   451		}
   452	}
   453	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37885 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-28  0:57     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  0:57 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 2811 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7 next-20211027]
[cannot apply to mst-vhost/linux-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/8935e116c155fb7d484bad35a42b2ca98f75e384
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout 8935e116c155fb7d484bad35a42b2ca98f75e384
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
>> drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64
>> drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __virtio64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse:     expected restricted __virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64

vim +438 drivers/virtio/virtio_ring.c

   431	
   432	static void desc_cache_chain_free_split(void *chain)
   433	{
   434		struct vring_desc *desc;
   435	
   436		while (chain) {
   437			desc = chain;
 > 438			chain = (void *)desc->addr;
   439			kfree(desc);
   440		}
   441	}
   442	
   443	static void desc_cache_put_split(struct vring_virtqueue *vq,
   444					 struct vring_desc *desc, int n)
   445	{
   446		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
 > 447			desc->addr = (u64)vq->desc_cache_chain;
   448			vq->desc_cache_chain = desc;
   449		} else {
   450			kfree(desc);
   451		}
   452	}
   453	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37885 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] virtio-net: enable virtio indirect cache
  2021-10-27 15:55   ` Jakub Kicinski
@ 2021-10-28  1:57     ` Xuan Zhuo
  2021-10-28  2:28         ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-28  1:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, Michael S. Tsirkin, David S. Miller, virtualization

On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote:
> > +static bool virtio_desc_cache = true;
> >  module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> > +module_param(virtio_desc_cache, bool, 0644);
>
> Can this be an ethtool priv flag? module params are discouraged because
> they can't be controlled per-netdev.


The current design can only be set when the device is initialized. So using
ethtool to modify it will not work.

Thanks.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-27 17:07     ` Michael S. Tsirkin
@ 2021-10-28  2:16       ` Jason Wang
  -1 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2021-10-28  2:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, virtualization, netdev, David S. Miller, Jakub Kicinski

On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> >
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio.c      |  6 ++++
> >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> >  include/linux/virtio.h       | 10 ++++++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +     dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> >       int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> >       /* Hint for event idx: already triggered no need to disable. */
> >       bool event_triggered;
> >
> > +     /* Is indirect cache used? */
> > +     bool use_desc_cache;
> > +     void *desc_cache_chain;
> > +
> >       union {
> >               /* Available for split ring */
> >               struct {
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >       return extra[i].next;
> >  }
> >
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +     struct vring_desc *desc;
> > +
> > +     while (chain) {
> > +             desc = chain;
> > +             chain = (void *)desc->addr;
> > +             kfree(desc);
> > +     }
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +                              struct vring_desc *desc, int n)
> > +{
> > +     if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +             desc->addr = (u64)vq->desc_cache_chain;
> > +             vq->desc_cache_chain = desc;
> > +     } else {
> > +             kfree(desc);
> > +     }
> > +}
> > +
>
>
> So I have a question here. What happens if we just do:
>
> if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
>         return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> } else {
>         return kmalloc_arrat(n, sizeof desc, gfp)
> }
>
> A small change and won't we reap most performance benefits?

Yes, I think we need a benchmark to use private cache to see how much
it can help.

Thanks

>
> --
> MST
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-28  2:16       ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2021-10-28  2:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, David S. Miller, Jakub Kicinski, virtualization

On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > In the case of using indirect, indirect desc must be allocated and
> > released each time, which increases a lot of cpu overhead.
> >
> > Here, a cache is added for indirect. If the number of indirect desc to be
> > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio.c      |  6 ++++
> >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> >  include/linux/virtio.h       | 10 ++++++
> >  3 files changed, 70 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > index 0a5b54034d4b..04bcb74e5b9a 100644
> > --- a/drivers/virtio/virtio.c
> > +++ b/drivers/virtio/virtio.c
> > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> >  }
> >  EXPORT_SYMBOL_GPL(is_virtio_device);
> >
> > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > +{
> > +     dev->desc_cache = val;
> > +}
> > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > +
> >  void unregister_virtio_device(struct virtio_device *dev)
> >  {
> >       int index = dev->index; /* save for after device release */
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dd95dfd85e98..0b9a8544b0e8 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> >       /* Hint for event idx: already triggered no need to disable. */
> >       bool event_triggered;
> >
> > +     /* Is indirect cache used? */
> > +     bool use_desc_cache;
> > +     void *desc_cache_chain;
> > +
> >       union {
> >               /* Available for split ring */
> >               struct {
> > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >       return extra[i].next;
> >  }
> >
> > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > +
> > +static void desc_cache_chain_free_split(void *chain)
> > +{
> > +     struct vring_desc *desc;
> > +
> > +     while (chain) {
> > +             desc = chain;
> > +             chain = (void *)desc->addr;
> > +             kfree(desc);
> > +     }
> > +}
> > +
> > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > +                              struct vring_desc *desc, int n)
> > +{
> > +     if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > +             desc->addr = (u64)vq->desc_cache_chain;
> > +             vq->desc_cache_chain = desc;
> > +     } else {
> > +             kfree(desc);
> > +     }
> > +}
> > +
>
>
> So I have a question here. What happens if we just do:
>
> if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
>         return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> } else {
>         return kmalloc_arrat(n, sizeof desc, gfp)
> }
>
> A small change and won't we reap most performance benefits?

Yes, I think we need a benchmark to use private cache to see how much
it can help.

Thanks

>
> --
> MST
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] virtio-net: enable virtio indirect cache
  2021-10-28  1:57     ` Xuan Zhuo
@ 2021-10-28  2:28         ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2021-10-28  2:28 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jakub Kicinski, virtualization, netdev, Michael S. Tsirkin,
	David S. Miller

On Thu, Oct 28, 2021 at 9:59 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote:
> > > +static bool virtio_desc_cache = true;
> > >  module_param(csum, bool, 0444);
> > >  module_param(gso, bool, 0444);
> > >  module_param(napi_tx, bool, 0644);
> > > +module_param(virtio_desc_cache, bool, 0644);
> >
> > Can this be an ethtool priv flag? module params are discouraged because
> > they can't be controlled per-netdev.
>
>
> The current design can only be set when the device is initialized. So using
> ethtool to modify it will not work.

Anyhow you can add things like synchronization to make it work. But I
think what we want is to make it work unconditionally, so having a
module parameter seems useless. If you want to use it for
benchmarking?

Thanks

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 3/3] virtio-net: enable virtio indirect cache
@ 2021-10-28  2:28         ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2021-10-28  2:28 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jakub Kicinski, netdev, Michael S. Tsirkin, David S. Miller,
	virtualization

On Thu, Oct 28, 2021 at 9:59 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 27 Oct 2021 08:55:28 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Wed, 27 Oct 2021 14:19:13 +0800 Xuan Zhuo wrote:
> > > +static bool virtio_desc_cache = true;
> > >  module_param(csum, bool, 0444);
> > >  module_param(gso, bool, 0444);
> > >  module_param(napi_tx, bool, 0644);
> > > +module_param(virtio_desc_cache, bool, 0644);
> >
> > Can this be an ethtool priv flag? module params are discouraged because
> > they can't be controlled per-netdev.
>
>
> The current design can only be set when the device is initialized. So using
> ethtool to modify it will not work.

Anyhow you can add things like synchronization to make it work. But I
think what we want is to make it work unconditionally, so having a
module parameter seems useless. If you want to use it for
benchmarking?

Thanks

>
> Thanks.
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
  2021-10-27  6:19   ` Xuan Zhuo
  (?)
@ 2021-10-28  3:51     ` kernel test robot
  -1 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  3:51 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: kbuild-all, Michael S. Tsirkin, Jason Wang, Jakub Kicinski

[-- Attachment #1: Type: text/plain, Size: 3318 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64
   drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __virtio64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse:     expected restricted __virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64
>> drivers/virtio/virtio_ring.c:1083:26: sparse: sparse: cast from restricted __le64
>> drivers/virtio/virtio_ring.c:1092:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:1092:28: sparse:     expected restricted __le64 [usertype] addr
   drivers/virtio/virtio_ring.c:1092:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:1109:49: sparse: sparse: cast from restricted __le64

vim +1083 drivers/virtio/virtio_ring.c

  1076	
  1077	static void desc_cache_chain_free_packed(void *chain)
  1078	{
  1079		struct vring_packed_desc *desc;
  1080	
  1081		while (chain) {
  1082			desc = chain;
> 1083			chain = (void *)desc->addr;
  1084			kfree(desc);
  1085		}
  1086	}
  1087	
  1088	static void desc_cache_put_packed(struct vring_virtqueue *vq,
  1089					  struct vring_packed_desc *desc, int n)
  1090	{
  1091		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> 1092			desc->addr = (u64)vq->desc_cache_chain;
  1093			vq->desc_cache_chain = desc;
  1094		} else {
  1095			kfree(desc);
  1096		}
  1097	}
  1098	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37885 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-28  3:51     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  3:51 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: Jakub Kicinski, kbuild-all, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 3318 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64
   drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __virtio64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse:     expected restricted __virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64
>> drivers/virtio/virtio_ring.c:1083:26: sparse: sparse: cast from restricted __le64
>> drivers/virtio/virtio_ring.c:1092:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:1092:28: sparse:     expected restricted __le64 [usertype] addr
   drivers/virtio/virtio_ring.c:1092:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:1109:49: sparse: sparse: cast from restricted __le64

vim +1083 drivers/virtio/virtio_ring.c

  1076	
  1077	static void desc_cache_chain_free_packed(void *chain)
  1078	{
  1079		struct vring_packed_desc *desc;
  1080	
  1081		while (chain) {
  1082			desc = chain;
> 1083			chain = (void *)desc->addr;
  1084			kfree(desc);
  1085		}
  1086	}
  1087	
  1088	static void desc_cache_put_packed(struct vring_virtqueue *vq,
  1089					  struct vring_packed_desc *desc, int n)
  1090	{
  1091		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> 1092			desc->addr = (u64)vq->desc_cache_chain;
  1093			vq->desc_cache_chain = desc;
  1094		} else {
  1095			kfree(desc);
  1096		}
  1097	}
  1098	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 37885 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-28  3:51     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  3:51 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3390 bytes --]

Hi Xuan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on horms-ipvs/master]
[also build test WARNING on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-randconfig-s002-20211027 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.4-dirty
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash drivers/virtio/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)
   drivers/virtio/virtio_ring.c:438:26: sparse: sparse: cast from restricted __virtio64
   drivers/virtio/virtio_ring.c:447:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __virtio64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:447:28: sparse:     expected restricted __virtio64 [usertype] addr
   drivers/virtio/virtio_ring.c:447:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:464:49: sparse: sparse: cast from restricted __virtio64
>> drivers/virtio/virtio_ring.c:1083:26: sparse: sparse: cast from restricted __le64
>> drivers/virtio/virtio_ring.c:1092:28: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted __le64 [usertype] addr @@     got unsigned long long [usertype] @@
   drivers/virtio/virtio_ring.c:1092:28: sparse:     expected restricted __le64 [usertype] addr
   drivers/virtio/virtio_ring.c:1092:28: sparse:     got unsigned long long [usertype]
   drivers/virtio/virtio_ring.c:1109:49: sparse: sparse: cast from restricted __le64

vim +1083 drivers/virtio/virtio_ring.c

  1076	
  1077	static void desc_cache_chain_free_packed(void *chain)
  1078	{
  1079		struct vring_packed_desc *desc;
  1080	
  1081		while (chain) {
  1082			desc = chain;
> 1083			chain = (void *)desc->addr;
  1084			kfree(desc);
  1085		}
  1086	}
  1087	
  1088	static void desc_cache_put_packed(struct vring_virtqueue *vq,
  1089					  struct vring_packed_desc *desc, int n)
  1090	{
  1091		if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> 1092			desc->addr = (u64)vq->desc_cache_chain;
  1093			vq->desc_cache_chain = desc;
  1094		} else {
  1095			kfree(desc);
  1096		}
  1097	}
  1098	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 37885 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-28  2:16       ` Jason Wang
  (?)
@ 2021-10-28  6:16       ` Xuan Zhuo
  2021-10-31 14:47           ` Michael S. Tsirkin
  -1 siblings, 1 reply; 41+ messages in thread
From: Xuan Zhuo @ 2021-10-28  6:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Jakub Kicinski,
	virtualization

On Thu, 28 Oct 2021 10:16:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > > In the case of using indirect, indirect desc must be allocated and
> > > released each time, which increases a lot of cpu overhead.
> > >
> > > Here, a cache is added for indirect. If the number of indirect desc to be
> > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio.c      |  6 ++++
> > >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> > >  include/linux/virtio.h       | 10 ++++++
> > >  3 files changed, 70 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > index 0a5b54034d4b..04bcb74e5b9a 100644
> > > --- a/drivers/virtio/virtio.c
> > > +++ b/drivers/virtio/virtio.c
> > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> > >  }
> > >  EXPORT_SYMBOL_GPL(is_virtio_device);
> > >
> > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > > +{
> > > +     dev->desc_cache = val;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > > +
> > >  void unregister_virtio_device(struct virtio_device *dev)
> > >  {
> > >       int index = dev->index; /* save for after device release */
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index dd95dfd85e98..0b9a8544b0e8 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > >       /* Hint for event idx: already triggered no need to disable. */
> > >       bool event_triggered;
> > >
> > > +     /* Is indirect cache used? */
> > > +     bool use_desc_cache;
> > > +     void *desc_cache_chain;
> > > +
> > >       union {
> > >               /* Available for split ring */
> > >               struct {
> > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > >       return extra[i].next;
> > >  }
> > >
> > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > > +
> > > +static void desc_cache_chain_free_split(void *chain)
> > > +{
> > > +     struct vring_desc *desc;
> > > +
> > > +     while (chain) {
> > > +             desc = chain;
> > > +             chain = (void *)desc->addr;
> > > +             kfree(desc);
> > > +     }
> > > +}
> > > +
> > > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > > +                              struct vring_desc *desc, int n)
> > > +{
> > > +     if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > +             desc->addr = (u64)vq->desc_cache_chain;
> > > +             vq->desc_cache_chain = desc;
> > > +     } else {
> > > +             kfree(desc);
> > > +     }
> > > +}
> > > +
> >
> >
> > So I have a question here. What happens if we just do:
> >
> > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> >         return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> > } else {
> >         return kmalloc_arrat(n, sizeof desc, gfp)
> > }
> >
> > A small change and won't we reap most performance benefits?
>
> Yes, I think we need a benchmark to use private cache to see how much
> it can help.

I did a test, the code is as follows:

+static void desc_cache_put_packed(struct vring_virtqueue *vq,
+                                  struct vring_packed_desc *desc, int n)
+ {
+       if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
+               kmem_cache_free(vq->desc_cache, desc);
+       } else {
+               kfree(desc);
+       }


@@ -476,11 +452,14 @@ static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
         */
        gfp &= ~__GFP_HIGHMEM;

-       desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
+       if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
+               desc = kmem_cache_alloc(vq->desc_cache, gfp);
+       else
+               desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
+

	.......

+       vq->desc_cache = kmem_cache_create("virio_desc",
+                                          4 * sizeof(struct vring_desc),
+                                          0, 0, NULL);

The effect is not good, basically there is no improvement, using perf top can
see that the overhead of kmem_cache_alloc/kmem_cache_free is also relatively
large:

         26.91%  [kernel]  [k] virtqueue_add
         15.35%  [kernel]  [k] detach_buf_split
         14.15%  [kernel]  [k] virtnet_xsk_xmit
         13.24%  [kernel]  [k] virtqueue_add_outbuf
       >  9.30%  [kernel]  [k] __slab_free
       >  3.91%  [kernel]  [k] kmem_cache_alloc
          2.85%  [kernel]  [k] virtqueue_get_buf_ctx
       >  2.82%  [kernel]  [k] kmem_cache_free
          2.54%  [kernel]  [k] memset_erms
          2.37%  [kernel]  [k] xsk_tx_peek_desc
          1.20%  [kernel]  [k] virtnet_xsk_run
          0.81%  [kernel]  [k] vring_map_one_sg
          0.69%  [kernel]  [k] __free_old_xmit_ptr
          0.69%  [kernel]  [k] virtqueue_kick_prepare
          0.43%  [kernel]  [k] sg_init_table
          0.41%  [kernel]  [k] sg_next
          0.28%  [kernel]  [k] vring_unmap_one_split
          0.25%  [kernel]  [k] vring_map_single.constprop.34
          0.24%  [kernel]  [k] net_rx_action

Thanks.

>
> Thanks
>
> >
> > --
> > MST
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
  2021-10-27  6:19   ` Xuan Zhuo
  (?)
@ 2021-10-28  7:38     ` kernel test robot
  -1 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  7:38 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: llvm, kbuild-all, Michael S. Tsirkin, Jason Wang, Jakub Kicinski

[-- Attachment #1: Type: text/plain, Size: 3540 bytes --]

Hi Xuan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on horms-ipvs/master]
[also build test ERROR on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-buildonly-randconfig-r005-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: error: variable 'len' is uninitialized when used here [-Werror,-Wuninitialized]
                   n = len / sizeof(struct vring_packed_desc);
                       ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning
                   u32 len, n;
                          ^
                           = 0
   1 error generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433	
  1434	static void detach_buf_packed(struct vring_virtqueue *vq,
  1435				      unsigned int id, void **ctx)
  1436	{
  1437		struct vring_desc_state_packed *state = NULL;
  1438		struct vring_packed_desc *desc;
  1439		unsigned int i, curr;
  1440	
  1441		state = &vq->packed.desc_state[id];
  1442	
  1443		/* Clear data ptr. */
  1444		state->data = NULL;
  1445	
  1446		vq->packed.desc_extra[state->last].next = vq->free_head;
  1447		vq->free_head = id;
  1448		vq->vq.num_free += state->num;
  1449	
  1450		if (unlikely(vq->use_dma_api)) {
  1451			curr = id;
  1452			for (i = 0; i < state->num; i++) {
  1453				vring_unmap_state_packed(vq,
  1454					&vq->packed.desc_extra[curr]);
  1455				curr = vq->packed.desc_extra[curr].next;
  1456			}
  1457		}
  1458	
  1459		if (vq->indirect) {
  1460			u32 len, n;
  1461	
  1462			/* Free the indirect table, if any, now that it's unmapped. */
  1463			desc = state->indir_desc;
  1464			if (!desc)
  1465				return;
  1466	
> 1467			n = len / sizeof(struct vring_packed_desc);
  1468	
  1469			if (vq->use_dma_api) {
  1470				len = vq->packed.desc_extra[id].len;
  1471				for (i = 0; i < n; i++)
  1472					vring_unmap_desc_packed(vq, &desc[i]);
  1473			}
  1474	
  1475			desc_cache_put_packed(vq, desc, n);
  1476			state->indir_desc = NULL;
  1477		} else if (ctx) {
  1478			*ctx = state->indir_desc;
  1479		}
  1480	}
  1481	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40097 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-28  7:38     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  7:38 UTC (permalink / raw)
  To: Xuan Zhuo, virtualization, netdev
  Cc: Jakub Kicinski, llvm, kbuild-all, Michael S. Tsirkin

[-- Attachment #1: Type: text/plain, Size: 3540 bytes --]

Hi Xuan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on horms-ipvs/master]
[also build test ERROR on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-buildonly-randconfig-r005-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: error: variable 'len' is uninitialized when used here [-Werror,-Wuninitialized]
                   n = len / sizeof(struct vring_packed_desc);
                       ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning
                   u32 len, n;
                          ^
                           = 0
   1 error generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433	
  1434	static void detach_buf_packed(struct vring_virtqueue *vq,
  1435				      unsigned int id, void **ctx)
  1436	{
  1437		struct vring_desc_state_packed *state = NULL;
  1438		struct vring_packed_desc *desc;
  1439		unsigned int i, curr;
  1440	
  1441		state = &vq->packed.desc_state[id];
  1442	
  1443		/* Clear data ptr. */
  1444		state->data = NULL;
  1445	
  1446		vq->packed.desc_extra[state->last].next = vq->free_head;
  1447		vq->free_head = id;
  1448		vq->vq.num_free += state->num;
  1449	
  1450		if (unlikely(vq->use_dma_api)) {
  1451			curr = id;
  1452			for (i = 0; i < state->num; i++) {
  1453				vring_unmap_state_packed(vq,
  1454					&vq->packed.desc_extra[curr]);
  1455				curr = vq->packed.desc_extra[curr].next;
  1456			}
  1457		}
  1458	
  1459		if (vq->indirect) {
  1460			u32 len, n;
  1461	
  1462			/* Free the indirect table, if any, now that it's unmapped. */
  1463			desc = state->indir_desc;
  1464			if (!desc)
  1465				return;
  1466	
> 1467			n = len / sizeof(struct vring_packed_desc);
  1468	
  1469			if (vq->use_dma_api) {
  1470				len = vq->packed.desc_extra[id].len;
  1471				for (i = 0; i < n; i++)
  1472					vring_unmap_desc_packed(vq, &desc[i]);
  1473			}
  1474	
  1475			desc_cache_put_packed(vq, desc, n);
  1476			state->indir_desc = NULL;
  1477		} else if (ctx) {
  1478			*ctx = state->indir_desc;
  1479		}
  1480	}
  1481	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 40097 bytes --]

[-- Attachment #3: Type: text/plain, Size: 183 bytes --]

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/3] virtio: cache indirect desc for packed
@ 2021-10-28  7:38     ` kernel test robot
  0 siblings, 0 replies; 41+ messages in thread
From: kernel test robot @ 2021-10-28  7:38 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 3637 bytes --]

Hi Xuan,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on horms-ipvs/master]
[also build test ERROR on linus/master v5.15-rc7]
[cannot apply to mst-vhost/linux-next next-20211027]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: i386-buildonly-randconfig-r005-20211027 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 5db7568a6a1fcb408eb8988abdaff2a225a8eb72)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/bb65ceda850ed4592d8a940e01926d5e3d33ae92
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Xuan-Zhuo/virtio-support-cache-indirect-desc/20211027-142025
        git checkout bb65ceda850ed4592d8a940e01926d5e3d33ae92
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/virtio/virtio_ring.c:1467:7: error: variable 'len' is uninitialized when used here [-Werror,-Wuninitialized]
                   n = len / sizeof(struct vring_packed_desc);
                       ^~~
   drivers/virtio/virtio_ring.c:1460:10: note: initialize the variable 'len' to silence this warning
                   u32 len, n;
                          ^
                           = 0
   1 error generated.


vim +/len +1467 drivers/virtio/virtio_ring.c

  1433	
  1434	static void detach_buf_packed(struct vring_virtqueue *vq,
  1435				      unsigned int id, void **ctx)
  1436	{
  1437		struct vring_desc_state_packed *state = NULL;
  1438		struct vring_packed_desc *desc;
  1439		unsigned int i, curr;
  1440	
  1441		state = &vq->packed.desc_state[id];
  1442	
  1443		/* Clear data ptr. */
  1444		state->data = NULL;
  1445	
  1446		vq->packed.desc_extra[state->last].next = vq->free_head;
  1447		vq->free_head = id;
  1448		vq->vq.num_free += state->num;
  1449	
  1450		if (unlikely(vq->use_dma_api)) {
  1451			curr = id;
  1452			for (i = 0; i < state->num; i++) {
  1453				vring_unmap_state_packed(vq,
  1454					&vq->packed.desc_extra[curr]);
  1455				curr = vq->packed.desc_extra[curr].next;
  1456			}
  1457		}
  1458	
  1459		if (vq->indirect) {
  1460			u32 len, n;
  1461	
  1462			/* Free the indirect table, if any, now that it's unmapped. */
  1463			desc = state->indir_desc;
  1464			if (!desc)
  1465				return;
  1466	
> 1467			n = len / sizeof(struct vring_packed_desc);
  1468	
  1469			if (vq->use_dma_api) {
  1470				len = vq->packed.desc_extra[id].len;
  1471				for (i = 0; i < n; i++)
  1472					vring_unmap_desc_packed(vq, &desc[i]);
  1473			}
  1474	
  1475			desc_cache_put_packed(vq, desc, n);
  1476			state->indir_desc = NULL;
  1477		} else if (ctx) {
  1478			*ctx = state->indir_desc;
  1479		}
  1480	}
  1481	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 40097 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
  2021-10-28  6:16       ` Xuan Zhuo
@ 2021-10-31 14:47           ` Michael S. Tsirkin
  0 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-31 14:47 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jason Wang, virtualization, netdev, David S. Miller, Jakub Kicinski

On Thu, Oct 28, 2021 at 02:16:03PM +0800, Xuan Zhuo wrote:
> On Thu, 28 Oct 2021 10:16:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > > > In the case of using indirect, indirect desc must be allocated and
> > > > released each time, which increases a lot of cpu overhead.
> > > >
> > > > Here, a cache is added for indirect. If the number of indirect desc to be
> > > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio.c      |  6 ++++
> > > >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> > > >  include/linux/virtio.h       | 10 ++++++
> > > >  3 files changed, 70 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > index 0a5b54034d4b..04bcb74e5b9a 100644
> > > > --- a/drivers/virtio/virtio.c
> > > > +++ b/drivers/virtio/virtio.c
> > > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(is_virtio_device);
> > > >
> > > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > > > +{
> > > > +     dev->desc_cache = val;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > > > +
> > > >  void unregister_virtio_device(struct virtio_device *dev)
> > > >  {
> > > >       int index = dev->index; /* save for after device release */
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index dd95dfd85e98..0b9a8544b0e8 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > > >       /* Hint for event idx: already triggered no need to disable. */
> > > >       bool event_triggered;
> > > >
> > > > +     /* Is indirect cache used? */
> > > > +     bool use_desc_cache;
> > > > +     void *desc_cache_chain;
> > > > +
> > > >       union {
> > > >               /* Available for split ring */
> > > >               struct {
> > > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > >       return extra[i].next;
> > > >  }
> > > >
> > > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > > > +
> > > > +static void desc_cache_chain_free_split(void *chain)
> > > > +{
> > > > +     struct vring_desc *desc;
> > > > +
> > > > +     while (chain) {
> > > > +             desc = chain;
> > > > +             chain = (void *)desc->addr;
> > > > +             kfree(desc);
> > > > +     }
> > > > +}
> > > > +
> > > > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > > > +                              struct vring_desc *desc, int n)
> > > > +{
> > > > +     if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > > +             desc->addr = (u64)vq->desc_cache_chain;
> > > > +             vq->desc_cache_chain = desc;
> > > > +     } else {
> > > > +             kfree(desc);
> > > > +     }
> > > > +}
> > > > +
> > >
> > >
> > > So I have a question here. What happens if we just do:
> > >
> > > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > >         return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> > > } else {
> > >         return kmalloc_arrat(n, sizeof desc, gfp)
> > > }
> > >
> > > A small change and won't we reap most performance benefits?
> >
> > Yes, I think we need a benchmark to use private cache to see how much
> > it can help.
> 
> I did a test, the code is as follows:
> 
> +static void desc_cache_put_packed(struct vring_virtqueue *vq,
> +                                  struct vring_packed_desc *desc, int n)
> + {
> +       if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +               kmem_cache_free(vq->desc_cache, desc);
> +       } else {
> +               kfree(desc);
> +       }
> 
> 
> @@ -476,11 +452,14 @@ static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>          */
>         gfp &= ~__GFP_HIGHMEM;
> 
> -       desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> +       if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
> +               desc = kmem_cache_alloc(vq->desc_cache, gfp);
> +       else
> +               desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +
> 
> 	.......
> 
> +       vq->desc_cache = kmem_cache_create("virio_desc",
> +                                          4 * sizeof(struct vring_desc),
> +                                          0, 0, NULL);
> 
> The effect is not good, basically there is no improvement, using perf top can
> see that the overhead of kmem_cache_alloc/kmem_cache_free is also relatively
> large:
> 
>          26.91%  [kernel]  [k] virtqueue_add
>          15.35%  [kernel]  [k] detach_buf_split
>          14.15%  [kernel]  [k] virtnet_xsk_xmit
>          13.24%  [kernel]  [k] virtqueue_add_outbuf
>        >  9.30%  [kernel]  [k] __slab_free
>        >  3.91%  [kernel]  [k] kmem_cache_alloc
>           2.85%  [kernel]  [k] virtqueue_get_buf_ctx
>        >  2.82%  [kernel]  [k] kmem_cache_free
>           2.54%  [kernel]  [k] memset_erms
>           2.37%  [kernel]  [k] xsk_tx_peek_desc
>           1.20%  [kernel]  [k] virtnet_xsk_run
>           0.81%  [kernel]  [k] vring_map_one_sg
>           0.69%  [kernel]  [k] __free_old_xmit_ptr
>           0.69%  [kernel]  [k] virtqueue_kick_prepare
>           0.43%  [kernel]  [k] sg_init_table
>           0.41%  [kernel]  [k] sg_next
>           0.28%  [kernel]  [k] vring_unmap_one_split
>           0.25%  [kernel]  [k] vring_map_single.constprop.34
>           0.24%  [kernel]  [k] net_rx_action
> 
> Thanks.


How about batching these?  E.g. kmem_cache_alloc_bulk/kmem_cache_free_bulk?


> >
> > Thanks
> >
> > >
> > > --
> > > MST
> > >
> >


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/3] virtio: cache indirect desc for split
@ 2021-10-31 14:47           ` Michael S. Tsirkin
  0 siblings, 0 replies; 41+ messages in thread
From: Michael S. Tsirkin @ 2021-10-31 14:47 UTC (permalink / raw)
  To: Xuan Zhuo; +Cc: netdev, David S. Miller, Jakub Kicinski, virtualization

On Thu, Oct 28, 2021 at 02:16:03PM +0800, Xuan Zhuo wrote:
> On Thu, 28 Oct 2021 10:16:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Oct 28, 2021 at 1:07 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Oct 27, 2021 at 02:19:11PM +0800, Xuan Zhuo wrote:
> > > > In the case of using indirect, indirect desc must be allocated and
> > > > released each time, which increases a lot of cpu overhead.
> > > >
> > > > Here, a cache is added for indirect. If the number of indirect desc to be
> > > > applied for is less than VIRT_QUEUE_CACHE_DESC_NUM, the desc array with
> > > > the size of VIRT_QUEUE_CACHE_DESC_NUM is fixed and cached for reuse.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio.c      |  6 ++++
> > > >  drivers/virtio/virtio_ring.c | 63 ++++++++++++++++++++++++++++++------
> > > >  include/linux/virtio.h       | 10 ++++++
> > > >  3 files changed, 70 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> > > > index 0a5b54034d4b..04bcb74e5b9a 100644
> > > > --- a/drivers/virtio/virtio.c
> > > > +++ b/drivers/virtio/virtio.c
> > > > @@ -431,6 +431,12 @@ bool is_virtio_device(struct device *dev)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(is_virtio_device);
> > > >
> > > > +void virtio_use_desc_cache(struct virtio_device *dev, bool val)
> > > > +{
> > > > +     dev->desc_cache = val;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtio_use_desc_cache);
> > > > +
> > > >  void unregister_virtio_device(struct virtio_device *dev)
> > > >  {
> > > >       int index = dev->index; /* save for after device release */
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index dd95dfd85e98..0b9a8544b0e8 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -117,6 +117,10 @@ struct vring_virtqueue {
> > > >       /* Hint for event idx: already triggered no need to disable. */
> > > >       bool event_triggered;
> > > >
> > > > +     /* Is indirect cache used? */
> > > > +     bool use_desc_cache;
> > > > +     void *desc_cache_chain;
> > > > +
> > > >       union {
> > > >               /* Available for split ring */
> > > >               struct {
> > > > @@ -423,12 +427,47 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > >       return extra[i].next;
> > > >  }
> > > >
> > > > -static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
> > > > +#define VIRT_QUEUE_CACHE_DESC_NUM 4
> > > > +
> > > > +static void desc_cache_chain_free_split(void *chain)
> > > > +{
> > > > +     struct vring_desc *desc;
> > > > +
> > > > +     while (chain) {
> > > > +             desc = chain;
> > > > +             chain = (void *)desc->addr;
> > > > +             kfree(desc);
> > > > +     }
> > > > +}
> > > > +
> > > > +static void desc_cache_put_split(struct vring_virtqueue *vq,
> > > > +                              struct vring_desc *desc, int n)
> > > > +{
> > > > +     if (vq->use_desc_cache && n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > > > +             desc->addr = (u64)vq->desc_cache_chain;
> > > > +             vq->desc_cache_chain = desc;
> > > > +     } else {
> > > > +             kfree(desc);
> > > > +     }
> > > > +}
> > > > +
> > >
> > >
> > > So I have a question here. What happens if we just do:
> > >
> > > if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> > >         return kmem_cache_alloc(VIRT_QUEUE_CACHE_DESC_NUM * sizeof desc, gfp)
> > > } else {
> > >         return kmalloc_arrat(n, sizeof desc, gfp)
> > > }
> > >
> > > A small change and won't we reap most performance benefits?
> >
> > Yes, I think we need a benchmark to use private cache to see how much
> > it can help.
> 
> I did a test, the code is as follows:
> 
> +static void desc_cache_put_packed(struct vring_virtqueue *vq,
> +                                  struct vring_packed_desc *desc, int n)
> + {
> +       if (n <= VIRT_QUEUE_CACHE_DESC_NUM) {
> +               kmem_cache_free(vq->desc_cache, desc);
> +       } else {
> +               kfree(desc);
> +       }
> 
> 
> @@ -476,11 +452,14 @@ static struct vring_desc *alloc_indirect_split(struct vring_virtqueue *vq,
>          */
>         gfp &= ~__GFP_HIGHMEM;
> 
> -       desc = kmalloc_array(n, sizeof(struct vring_desc), gfp);
> +       if (total_sg <= VIRT_QUEUE_CACHE_DESC_NUM)
> +               desc = kmem_cache_alloc(vq->desc_cache, gfp);
> +       else
> +               desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
> +
> 
> 	.......
> 
> +       vq->desc_cache = kmem_cache_create("virio_desc",
> +                                          4 * sizeof(struct vring_desc),
> +                                          0, 0, NULL);
> 
> The effect is not good, basically there is no improvement, using perf top can
> see that the overhead of kmem_cache_alloc/kmem_cache_free is also relatively
> large:
> 
>          26.91%  [kernel]  [k] virtqueue_add
>          15.35%  [kernel]  [k] detach_buf_split
>          14.15%  [kernel]  [k] virtnet_xsk_xmit
>          13.24%  [kernel]  [k] virtqueue_add_outbuf
>        >  9.30%  [kernel]  [k] __slab_free
>        >  3.91%  [kernel]  [k] kmem_cache_alloc
>           2.85%  [kernel]  [k] virtqueue_get_buf_ctx
>        >  2.82%  [kernel]  [k] kmem_cache_free
>           2.54%  [kernel]  [k] memset_erms
>           2.37%  [kernel]  [k] xsk_tx_peek_desc
>           1.20%  [kernel]  [k] virtnet_xsk_run
>           0.81%  [kernel]  [k] vring_map_one_sg
>           0.69%  [kernel]  [k] __free_old_xmit_ptr
>           0.69%  [kernel]  [k] virtqueue_kick_prepare
>           0.43%  [kernel]  [k] sg_init_table
>           0.41%  [kernel]  [k] sg_next
>           0.28%  [kernel]  [k] vring_unmap_one_split
>           0.25%  [kernel]  [k] vring_map_single.constprop.34
>           0.24%  [kernel]  [k] net_rx_action
> 
> Thanks.


How about batching these?  E.g. kmem_cache_alloc_bulk/kmem_cache_free_bulk?


> >
> > Thanks
> >
> > >
> > > --
> > > MST
> > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2021-10-31 14:47 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-27  6:19 [PATCH 0/3] virtio support cache indirect desc Xuan Zhuo
2021-10-27  6:19 ` Xuan Zhuo
2021-10-27  6:19 ` [PATCH 1/3] virtio: cache indirect desc for split Xuan Zhuo
2021-10-27  6:19   ` Xuan Zhuo
2021-10-27  8:55   ` Michael S. Tsirkin
2021-10-27  8:55     ` Michael S. Tsirkin
2021-10-27  9:03     ` Xuan Zhuo
2021-10-27 16:33   ` Dongli Zhang
2021-10-27 16:33     ` Dongli Zhang
2021-10-27 19:36     ` Michael S. Tsirkin
2021-10-27 19:36       ` Michael S. Tsirkin
2021-10-27 17:07   ` Michael S. Tsirkin
2021-10-27 17:07     ` Michael S. Tsirkin
2021-10-28  2:16     ` Jason Wang
2021-10-28  2:16       ` Jason Wang
2021-10-28  6:16       ` Xuan Zhuo
2021-10-31 14:47         ` Michael S. Tsirkin
2021-10-31 14:47           ` Michael S. Tsirkin
2021-10-27 23:02   ` kernel test robot
2021-10-27 23:02     ` kernel test robot
2021-10-27 23:02     ` kernel test robot
2021-10-28  0:57   ` kernel test robot
2021-10-28  0:57     ` kernel test robot
2021-10-28  0:57     ` kernel test robot
2021-10-27  6:19 ` [PATCH 2/3] virtio: cache indirect desc for packed Xuan Zhuo
2021-10-27  6:19   ` Xuan Zhuo
2021-10-28  0:28   ` kernel test robot
2021-10-28  0:28     ` kernel test robot
2021-10-28  0:28     ` kernel test robot
2021-10-28  3:51   ` kernel test robot
2021-10-28  3:51     ` kernel test robot
2021-10-28  3:51     ` kernel test robot
2021-10-28  7:38   ` kernel test robot
2021-10-28  7:38     ` kernel test robot
2021-10-28  7:38     ` kernel test robot
2021-10-27  6:19 ` [PATCH 3/3] virtio-net: enable virtio indirect cache Xuan Zhuo
2021-10-27  6:19   ` Xuan Zhuo
2021-10-27 15:55   ` Jakub Kicinski
2021-10-28  1:57     ` Xuan Zhuo
2021-10-28  2:28       ` Jason Wang
2021-10-28  2:28         ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.