netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq
@ 2024-03-12  3:35 Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 01/10] virtio_ring: introduce vring_need_unmap_buffer Xuan Zhuo
                   ` (11 more replies)
  0 siblings, 12 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

As discussed:

http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com

If the virtio is premapped mode, the driver should manage the dma info by self.
So the virtio core should not store the dma info. We can release the memory used
to store the dma info.

For virtio-net xmit queue, if the virtio-net maintains the dma info,
the virtio-net must allocate too much memory(19 * queue_size for per-queue), so
we do not plan to make the virtio-net to maintain the dma info by default. The
virtio-net xmit queue only maintain the dma info when premapped mode is enable
(such as AF_XDP is enable).

So this patch set try to do:

1. make the virtio core to do not store the dma info
    - But if the desc_extra has not dma info, we face a new question,
      it is hard to get the dma info of the desc with indirect flag.
      For split mode, that is easy from desc, but for the packed mode,
      it is hard to get the dma info from the desc. And hardening
      the dma unmap is safe, we should store the dma info of indirect
      descs when the virtio core does not store the bufer dma info.

      So I introduce the "structure the indirect desc table" to
      allocate space to store dma info of the desc table.

        +struct vring_split_desc_indir {
        +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
        +       u32 len;                        /* Descriptor Array length. */
        +       u32 num;
        +       struct vring_desc desc[];
        +};

      The follow patches to this:
         * virtio_ring: packed: structure the indirect desc table
         * virtio_ring: split: structure the indirect desc table

    - On the other side, in the umap handle, we mix the indirect descs with
      other descs. That make things too complex. I found if we we distinguish
      the descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.

      The follow patches do this.
         * virtio_ring: packed: remove double check of the unmap ops
         * virtio_ring: split: structure the indirect desc table

2. make the virtio core to enable premapped mode by find_vqs() params
    - Because the find_vqs() will try to allocate memory for the dma info.
      If we set the premapped mode after find_vqs() and release the
      dma info, that is odd.


Please review.

Thanks

v4:
    1. virtio-net xmit queue does not enable premapped mode by default

v3:
    1. fix the conflict with the vp_modern_create_avq().

v2:
    1. change the dma item of virtio-net, every item have MAX_SKB_FRAGS + 2 addr + len pairs.
    2. introduce virtnet_sq_free_stats for __free_old_xmit

v1:
    1. rename transport_vq_config to vq_transport_config
    2. virtio-net set dma meta number to (ring-size + 1)(MAX_SKB_FRGAS +2)
    3. introduce virtqueue_dma_map_sg_attrs
    4. separate vring_create_virtqueue to an independent commit

Xuan Zhuo (10):
  virtio_ring: introduce vring_need_unmap_buffer
  virtio_ring: packed: remove double check of the unmap ops
  virtio_ring: packed: structure the indirect desc table
  virtio_ring: split: remove double check of the unmap ops
  virtio_ring: split: structure the indirect desc table
  virtio_ring: no store dma info when unmap is not needed
  virtio: find_vqs: add new parameter premapped
  virtio_ring: export premapped to driver by struct virtqueue
  virtio_net: set premapped mode by find_vqs()
  virtio_ring: virtqueue_set_dma_premapped support disable

 drivers/net/virtio_net.c      |  57 +++--
 drivers/virtio/virtio_ring.c  | 436 +++++++++++++++++++++-------------
 include/linux/virtio.h        |   3 +-
 include/linux/virtio_config.h |  17 +-
 4 files changed, 307 insertions(+), 206 deletions(-)

--
2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 01/10] virtio_ring: introduce vring_need_unmap_buffer
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops Xuan Zhuo
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

To make the code readable, introduce vring_need_unmap_buffer() to
replace do_unmap.

   use_dma_api premapped -> vring_need_unmap_buffer()
1. false       false        false
2. true        false        true
3. true        true         false

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 94c442ba844f..c2779e34aac7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -175,11 +175,6 @@ struct vring_virtqueue {
 	/* Do DMA mapping by driver */
 	bool premapped;
 
-	/* Do unmap or not for desc. Just when premapped is False and
-	 * use_dma_api is true, this is true.
-	 */
-	bool do_unmap;
-
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -295,6 +290,11 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
 	return false;
 }
 
+static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
+{
+	return vring->use_dma_api && !vring->premapped;
+}
+
 size_t virtio_max_dma_size(const struct virtio_device *vdev)
 {
 	size_t max_segment_size = SIZE_MAX;
@@ -443,7 +443,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->do_unmap)
+	if (!vring_need_unmap_buffer(vq))
 		return;
 
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -473,7 +473,7 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vq->do_unmap)
+		if (!vring_need_unmap_buffer(vq))
 			goto out;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -641,7 +641,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 	/* Last one doesn't continue. */
 	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
-	if (!indirect && vq->do_unmap)
+	if (!indirect && vring_need_unmap_buffer(vq))
 		vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
 			~VRING_DESC_F_NEXT;
 
@@ -800,7 +800,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		if (vq->do_unmap) {
+		if (vring_need_unmap_buffer(vq)) {
 			for (j = 0; j < len / sizeof(struct vring_desc); j++)
 				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
 		}
@@ -1230,7 +1230,7 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
-		if (!vq->do_unmap)
+		if (!vring_need_unmap_buffer(vq))
 			return;
 
 		dma_unmap_page(vring_dma_dev(vq),
@@ -1245,7 +1245,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->do_unmap)
+	if (!vring_need_unmap_buffer(vq))
 		return;
 
 	flags = le16_to_cpu(desc->flags);
@@ -1626,7 +1626,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (!desc)
 			return;
 
-		if (vq->do_unmap) {
+		if (vring_need_unmap_buffer(vq)) {
 			len = vq->packed.desc_extra[id].len;
 			for (i = 0; i < len / sizeof(struct vring_packed_desc);
 					i++)
@@ -2080,7 +2080,6 @@ static struct virtqueue *vring_create_virtqueue_packed(struct virtio_device *vde
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 	vq->premapped = false;
-	vq->do_unmap = vq->use_dma_api;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!cfg_vq_get(cfg, ctx);
@@ -2621,7 +2620,6 @@ static struct virtqueue *__vring_new_virtqueue(struct virtio_device *vdev,
 	vq->dma_dev = tp_cfg->dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 	vq->premapped = false;
-	vq->do_unmap = vq->use_dma_api;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!cfg_vq_get(cfg, ctx);
@@ -2752,7 +2750,6 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 	}
 
 	vq->premapped = true;
-	vq->do_unmap = false;
 
 	END_USE(vq);
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 01/10] virtio_ring: introduce vring_need_unmap_buffer Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-21  5:57   ` Jason Wang
  2024-03-12  3:35 ` [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table Xuan Zhuo
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

In the functions vring_unmap_extra_packed and vring_unmap_desc_packed,
multiple checks are made whether unmap is performed and whether it is
INDIRECT.

These two functions are usually called in a loop, and we should put the
check outside the loop.

And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
other descs, that make the thing more complex. If we distinguish the
descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.

1. only one desc of the desc table is used, we do not need the loop
2. the called unmap api is difference from the other desc
3. the vq->premapped is not needed to check
4. the vq->indirect is not needed to check
5. the state->indir_desc must not be null

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 78 ++++++++++++++++++------------------
 1 file changed, 40 insertions(+), 38 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c2779e34aac7..0dfbd17e5a87 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1214,6 +1214,7 @@ static u16 packed_last_used(u16 last_used_idx)
 	return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
 }
 
+/* caller must check vring_need_unmap_buffer() */
 static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 				     const struct vring_desc_extra *extra)
 {
@@ -1221,33 +1222,18 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 
 	flags = extra->flags;
 
-	if (flags & VRING_DESC_F_INDIRECT) {
-		if (!vq->use_dma_api)
-			return;
-
-		dma_unmap_single(vring_dma_dev(vq),
-				 extra->addr, extra->len,
-				 (flags & VRING_DESC_F_WRITE) ?
-				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	} else {
-		if (!vring_need_unmap_buffer(vq))
-			return;
-
-		dma_unmap_page(vring_dma_dev(vq),
-			       extra->addr, extra->len,
-			       (flags & VRING_DESC_F_WRITE) ?
-			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	}
+	dma_unmap_page(vring_dma_dev(vq),
+		       extra->addr, extra->len,
+		       (flags & VRING_DESC_F_WRITE) ?
+		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 }
 
+/* caller must check vring_need_unmap_buffer() */
 static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 				    const struct vring_packed_desc *desc)
 {
 	u16 flags;
 
-	if (!vring_need_unmap_buffer(vq))
-		return;
-
 	flags = le16_to_cpu(desc->flags);
 
 	dma_unmap_page(vring_dma_dev(vq),
@@ -1323,7 +1309,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 			total_sg * sizeof(struct vring_packed_desc),
 			DMA_TO_DEVICE);
 	if (vring_mapping_error(vq, addr)) {
-		if (vq->premapped)
+		if (!vring_need_unmap_buffer(vq))
 			goto free_desc;
 
 		goto unmap_release;
@@ -1338,10 +1324,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 		vq->packed.desc_extra[id].addr = addr;
 		vq->packed.desc_extra[id].len = total_sg *
 				sizeof(struct vring_packed_desc);
-		vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
-						  vq->packed.avail_used_flags;
 	}
 
+	vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
+		vq->packed.avail_used_flags;
+
 	/*
 	 * A driver MUST NOT make the first descriptor in the list
 	 * available before all subsequent descriptors comprising
@@ -1382,6 +1369,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 unmap_release:
 	err_idx = i;
 
+	WARN_ON(!vring_need_unmap_buffer(vq));
+
 	for (i = 0; i < err_idx; i++)
 		vring_unmap_desc_packed(vq, &desc[i]);
 
@@ -1475,12 +1464,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 			desc[i].len = cpu_to_le32(sg->length);
 			desc[i].id = cpu_to_le16(id);
 
-			if (unlikely(vq->use_dma_api)) {
+			if (vring_need_unmap_buffer(vq)) {
 				vq->packed.desc_extra[curr].addr = addr;
 				vq->packed.desc_extra[curr].len = sg->length;
-				vq->packed.desc_extra[curr].flags =
-					le16_to_cpu(flags);
 			}
+
+			vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
+
 			prev = curr;
 			curr = vq->packed.desc_extra[curr].next;
 
@@ -1530,6 +1520,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 
 	vq->packed.avail_used_flags = avail_used_flags;
 
+	WARN_ON(!vring_need_unmap_buffer(vq));
+
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
@@ -1599,7 +1591,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	struct vring_desc_state_packed *state = NULL;
 	struct vring_packed_desc *desc;
 	unsigned int i, curr;
+	u16 flags;
 
+	flags = vq->packed.desc_extra[id].flags;
 	state = &vq->packed.desc_state[id];
 
 	/* Clear data ptr. */
@@ -1609,22 +1603,32 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	vq->free_head = id;
 	vq->vq.num_free += state->num;
 
-	if (unlikely(vq->use_dma_api)) {
-		curr = id;
-		for (i = 0; i < state->num; i++) {
-			vring_unmap_extra_packed(vq,
-						 &vq->packed.desc_extra[curr]);
-			curr = vq->packed.desc_extra[curr].next;
+	if (!(flags & VRING_DESC_F_INDIRECT)) {
+		if (vring_need_unmap_buffer(vq)) {
+			curr = id;
+			for (i = 0; i < state->num; i++) {
+				vring_unmap_extra_packed(vq,
+							 &vq->packed.desc_extra[curr]);
+				curr = vq->packed.desc_extra[curr].next;
+			}
 		}
-	}
 
-	if (vq->indirect) {
+		if (ctx)
+			*ctx = state->indir_desc;
+	} else {
+		const struct vring_desc_extra *extra;
 		u32 len;
 
+		if (vq->use_dma_api) {
+			extra = &vq->packed.desc_extra[id];
+			dma_unmap_single(vring_dma_dev(vq),
+					 extra->addr, extra->len,
+					 (flags & VRING_DESC_F_WRITE) ?
+					 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+		}
+
 		/* Free the indirect table, if any, now that it's unmapped. */
 		desc = state->indir_desc;
-		if (!desc)
-			return;
 
 		if (vring_need_unmap_buffer(vq)) {
 			len = vq->packed.desc_extra[id].len;
@@ -1634,8 +1638,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		}
 		kfree(desc);
 		state->indir_desc = NULL;
-	} else if (ctx) {
-		*ctx = state->indir_desc;
 	}
 }
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 01/10] virtio_ring: introduce vring_need_unmap_buffer Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-21  4:47   ` Jason Wang
  2024-03-12  3:35 ` [PATCH vhost v4 04/10] virtio_ring: split: remove double check of the unmap ops Xuan Zhuo
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

This commit structure the indirect desc table.
Then we can get the desc num directly when doing unmap.

And save the dma info to the struct, then the indirect
will not use the dma fields of the desc_extra. The subsequent
commits will make the dma fields are optional. But for
the indirect case, we must record the dma info.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
 1 file changed, 38 insertions(+), 28 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 0dfbd17e5a87..22a588bba166 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -72,9 +72,16 @@ struct vring_desc_state_split {
 	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
 };
 
+struct vring_packed_desc_indir {
+	dma_addr_t addr;		/* Descriptor Array DMA addr. */
+	u32 len;			/* Descriptor Array length. */
+	u32 num;
+	struct vring_packed_desc desc[];
+};
+
 struct vring_desc_state_packed {
 	void *data;			/* Data for callback. */
-	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
+	struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */
 	u16 num;			/* Descriptor list length. */
 	u16 last;			/* The last desc state in a list. */
 };
@@ -1243,10 +1250,13 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 }
 
-static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
+static struct vring_packed_desc_indir *alloc_indirect_packed(unsigned int total_sg,
 						       gfp_t gfp)
 {
-	struct vring_packed_desc *desc;
+	struct vring_packed_desc_indir *in_desc;
+	u32 size;
+
+	size = struct_size(in_desc, desc, total_sg);
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -1255,9 +1265,10 @@ static struct vring_packed_desc *alloc_indirect_packed(unsigned int total_sg,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_packed_desc), gfp);
 
-	return desc;
+	in_desc = kmalloc(size, gfp);
+
+	return in_desc;
 }
 
 static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
@@ -1268,6 +1279,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 					 void *data,
 					 gfp_t gfp)
 {
+	struct vring_packed_desc_indir *in_desc;
 	struct vring_packed_desc *desc;
 	struct scatterlist *sg;
 	unsigned int i, n, err_idx;
@@ -1275,10 +1287,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	dma_addr_t addr;
 
 	head = vq->packed.next_avail_idx;
-	desc = alloc_indirect_packed(total_sg, gfp);
-	if (!desc)
+	in_desc = alloc_indirect_packed(total_sg, gfp);
+	if (!in_desc)
 		return -ENOMEM;
 
+	desc = in_desc->desc;
+
 	if (unlikely(vq->vq.num_free < 1)) {
 		pr_debug("Can't add buf len 1 - avail = 0\n");
 		kfree(desc);
@@ -1315,17 +1329,19 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 		goto unmap_release;
 	}
 
+	if (vring_need_unmap_buffer(vq))
+		in_desc->num = i;
+
+	if (vq->use_dma_api) {
+		in_desc->addr = addr;
+		in_desc->len = total_sg * sizeof(struct vring_packed_desc);
+	}
+
 	vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
 	vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
 				sizeof(struct vring_packed_desc));
 	vq->packed.vring.desc[head].id = cpu_to_le16(id);
 
-	if (vq->use_dma_api) {
-		vq->packed.desc_extra[id].addr = addr;
-		vq->packed.desc_extra[id].len = total_sg *
-				sizeof(struct vring_packed_desc);
-	}
-
 	vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
 		vq->packed.avail_used_flags;
 
@@ -1356,7 +1372,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	/* Store token and indirect buffer state. */
 	vq->packed.desc_state[id].num = 1;
 	vq->packed.desc_state[id].data = data;
-	vq->packed.desc_state[id].indir_desc = desc;
+	vq->packed.desc_state[id].indir_desc = in_desc;
 	vq->packed.desc_state[id].last = id;
 
 	vq->num_added += 1;
@@ -1375,7 +1391,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 		vring_unmap_desc_packed(vq, &desc[i]);
 
 free_desc:
-	kfree(desc);
+	kfree(in_desc);
 
 	END_USE(vq);
 	return -ENOMEM;
@@ -1589,7 +1605,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 			      unsigned int id, void **ctx)
 {
 	struct vring_desc_state_packed *state = NULL;
-	struct vring_packed_desc *desc;
 	unsigned int i, curr;
 	u16 flags;
 
@@ -1616,27 +1631,22 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (ctx)
 			*ctx = state->indir_desc;
 	} else {
-		const struct vring_desc_extra *extra;
-		u32 len;
+		struct vring_packed_desc_indir *in_desc;
+
+		in_desc = state->indir_desc;
 
 		if (vq->use_dma_api) {
-			extra = &vq->packed.desc_extra[id];
 			dma_unmap_single(vring_dma_dev(vq),
-					 extra->addr, extra->len,
+					 in_desc->addr, in_desc->len,
 					 (flags & VRING_DESC_F_WRITE) ?
 					 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 		}
 
-		/* Free the indirect table, if any, now that it's unmapped. */
-		desc = state->indir_desc;
-
 		if (vring_need_unmap_buffer(vq)) {
-			len = vq->packed.desc_extra[id].len;
-			for (i = 0; i < len / sizeof(struct vring_packed_desc);
-					i++)
-				vring_unmap_desc_packed(vq, &desc[i]);
+			for (i = 0; i < in_desc->num; i++)
+				vring_unmap_desc_packed(vq, &in_desc->desc[i]);
 		}
-		kfree(desc);
+		kfree(in_desc);
 		state->indir_desc = NULL;
 	}
 }
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 04/10] virtio_ring: split: remove double check of the unmap ops
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (2 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 05/10] virtio_ring: split: structure the indirect desc table Xuan Zhuo
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

In the functions vring_unmap_one_split and
vring_unmap_one_split_indirect,
multiple checks are made whether unmap is performed and whether it is
INDIRECT.

These two functions are usually called in a loop, and we should put the
check outside the loop.

And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
other descs, that make the thing more complex. If we distinguish the
descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.

1. only one desc of the desc table is used, we do not need the loop
2. the called unmap api is difference from the other desc
3. the vq->premapped is not needed to check
4. the vq->indirect is not needed to check
5. the state->indir_desc must not be null

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 79 +++++++++++++++++-------------------
 1 file changed, 38 insertions(+), 41 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 22a588bba166..9d5a9b907a2e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -450,9 +450,6 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vring_need_unmap_buffer(vq))
-		return;
-
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
 
 	dma_unmap_page(vring_dma_dev(vq),
@@ -470,27 +467,12 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 
 	flags = extra[i].flags;
 
-	if (flags & VRING_DESC_F_INDIRECT) {
-		if (!vq->use_dma_api)
-			goto out;
-
-		dma_unmap_single(vring_dma_dev(vq),
-				 extra[i].addr,
-				 extra[i].len,
-				 (flags & VRING_DESC_F_WRITE) ?
-				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	} else {
-		if (!vring_need_unmap_buffer(vq))
-			goto out;
-
-		dma_unmap_page(vring_dma_dev(vq),
-			       extra[i].addr,
-			       extra[i].len,
-			       (flags & VRING_DESC_F_WRITE) ?
-			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	}
+	dma_unmap_page(vring_dma_dev(vq),
+		       extra[i].addr,
+		       extra[i].len,
+		       (flags & VRING_DESC_F_WRITE) ?
+		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 
-out:
 	return extra[i].next;
 }
 
@@ -658,7 +640,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			vq, desc, total_sg * sizeof(struct vring_desc),
 			DMA_TO_DEVICE);
 		if (vring_mapping_error(vq, addr)) {
-			if (vq->premapped)
+			if (!vring_need_unmap_buffer(vq))
 				goto free_indirect;
 
 			goto unmap_release;
@@ -711,6 +693,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	return 0;
 
 unmap_release:
+
+	WARN_ON(!vring_need_unmap_buffer(vq));
+
 	err_idx = i;
 
 	if (indirect)
@@ -772,34 +757,42 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+	u16 flags;
 
 	/* Clear data ptr. */
 	vq->split.desc_state[head].data = NULL;
+	flags = vq->split.desc_extra[head].flags;
 
 	/* Put back on free list: unmap first-level descriptors and find end */
 	i = head;
 
-	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, i);
-		i = vq->split.desc_extra[i].next;
-		vq->vq.num_free++;
-	}
-
-	vring_unmap_one_split(vq, i);
-	vq->split.desc_extra[i].next = vq->free_head;
-	vq->free_head = head;
+	if (!(flags & VRING_DESC_F_INDIRECT)) {
+		while (vq->split.vring.desc[i].flags & nextflag) {
+			if (vring_need_unmap_buffer(vq))
+				vring_unmap_one_split(vq, i);
+			i = vq->split.desc_extra[i].next;
+			vq->vq.num_free++;
+		}
 
-	/* Plus final descriptor */
-	vq->vq.num_free++;
+		if (vring_need_unmap_buffer(vq))
+			vring_unmap_one_split(vq, i);
 
-	if (vq->indirect) {
+		if (ctx)
+			*ctx = vq->split.desc_state[head].indir_desc;
+	} else {
 		struct vring_desc *indir_desc =
 				vq->split.desc_state[head].indir_desc;
 		u32 len;
 
-		/* Free the indirect table, if any, now that it's unmapped. */
-		if (!indir_desc)
-			return;
+		if (vq->use_dma_api) {
+			struct vring_desc_extra *extra = vq->split.desc_extra;
+
+			dma_unmap_single(vring_dma_dev(vq),
+					 extra[i].addr,
+					 extra[i].len,
+					 (flags & VRING_DESC_F_WRITE) ?
+					 DMA_FROM_DEVICE : DMA_TO_DEVICE);
+		}
 
 		len = vq->split.desc_extra[head].len;
 
@@ -814,9 +807,13 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 
 		kfree(indir_desc);
 		vq->split.desc_state[head].indir_desc = NULL;
-	} else if (ctx) {
-		*ctx = vq->split.desc_state[head].indir_desc;
 	}
+
+	vq->split.desc_extra[i].next = vq->free_head;
+	vq->free_head = head;
+
+	/* Plus final descriptor */
+	vq->vq.num_free++;
 }
 
 static bool more_used_split(const struct vring_virtqueue *vq)
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 05/10] virtio_ring: split: structure the indirect desc table
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (3 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 04/10] virtio_ring: split: remove double check of the unmap ops Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 06/10] virtio_ring: no store dma info when unmap is not needed Xuan Zhuo
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

This commit structure the indirect desc table.
Then we can get the desc num directly when doing unmap.

And save the dma info to the struct, then the indirect
will not use the dma fields of the desc_extra. The subsequent
commits will make the dma fields are optional. But for
the indirect case, we must record the dma info.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 91 ++++++++++++++++++++++--------------
 1 file changed, 55 insertions(+), 36 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 9d5a9b907a2e..81a8c2b70af1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -67,9 +67,16 @@
 #define LAST_ADD_TIME_INVALID(vq)
 #endif
 
+struct vring_split_desc_indir {
+	dma_addr_t addr;		/* Descriptor Array DMA addr. */
+	u32 len;			/* Descriptor Array length. */
+	u32 num;
+	struct vring_desc desc[];
+};
+
 struct vring_desc_state_split {
 	void *data;			/* Data for callback. */
-	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
+	struct vring_split_desc_indir *indir_desc;	/* Indirect descriptor, if any. */
 };
 
 struct vring_packed_desc_indir {
@@ -476,12 +483,16 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 	return extra[i].next;
 }
 
-static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
-					       unsigned int total_sg,
-					       gfp_t gfp)
+static struct vring_split_desc_indir *alloc_indirect_split(struct virtqueue *_vq,
+							   unsigned int total_sg,
+							   gfp_t gfp)
 {
+	struct vring_split_desc_indir *in_desc;
 	struct vring_desc *desc;
 	unsigned int i;
+	u32 size;
+
+	size = struct_size(in_desc, desc, total_sg);
 
 	/*
 	 * We require lowmem mappings for the descriptors because
@@ -490,13 +501,16 @@ static struct vring_desc *alloc_indirect_split(struct virtqueue *_vq,
 	 */
 	gfp &= ~__GFP_HIGHMEM;
 
-	desc = kmalloc_array(total_sg, sizeof(struct vring_desc), gfp);
-	if (!desc)
+	in_desc = kmalloc(size, gfp);
+	if (!in_desc)
 		return NULL;
 
+	desc = in_desc->desc;
+
 	for (i = 0; i < total_sg; i++)
 		desc[i].next = cpu_to_virtio16(_vq->vdev, i + 1);
-	return desc;
+
+	return in_desc;
 }
 
 static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
@@ -538,6 +552,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 				      gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct vring_split_desc_indir *in_desc;
 	struct scatterlist *sg;
 	struct vring_desc *desc;
 	unsigned int i, n, avail, descs_used, prev, err_idx;
@@ -560,9 +575,13 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 	head = vq->free_head;
 
-	if (virtqueue_use_indirect(vq, total_sg))
-		desc = alloc_indirect_split(_vq, total_sg, gfp);
-	else {
+	if (virtqueue_use_indirect(vq, total_sg)) {
+		in_desc = alloc_indirect_split(_vq, total_sg, gfp);
+		if (!in_desc)
+			desc = NULL;
+		else
+			desc = in_desc->desc;
+	} else {
 		desc = NULL;
 		WARN_ON_ONCE(total_sg > vq->split.vring.num && !vq->indirect);
 	}
@@ -635,10 +654,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			~VRING_DESC_F_NEXT;
 
 	if (indirect) {
+		u32 size = total_sg * sizeof(struct vring_desc);
+
 		/* Now that the indirect table is filled in, map it. */
-		dma_addr_t addr = vring_map_single(
-			vq, desc, total_sg * sizeof(struct vring_desc),
-			DMA_TO_DEVICE);
+		dma_addr_t addr = vring_map_single(vq, desc, size, DMA_TO_DEVICE);
 		if (vring_mapping_error(vq, addr)) {
 			if (!vring_need_unmap_buffer(vq))
 				goto free_indirect;
@@ -646,11 +665,20 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			goto unmap_release;
 		}
 
-		virtqueue_add_desc_split(_vq, vq->split.vring.desc,
-					 head, addr,
-					 total_sg * sizeof(struct vring_desc),
-					 VRING_DESC_F_INDIRECT,
-					 false);
+		desc = &vq->split.vring.desc[head];
+
+		desc->flags = cpu_to_virtio16(_vq->vdev, VRING_DESC_F_INDIRECT);
+		desc->addr = cpu_to_virtio64(_vq->vdev, addr);
+		desc->len = cpu_to_virtio32(_vq->vdev, size);
+
+		vq->split.desc_extra[head].flags = VRING_DESC_F_INDIRECT;
+
+		if (vq->use_dma_api) {
+			in_desc->addr = addr;
+			in_desc->len = size;
+		}
+
+		in_desc->num = total_sg;
 	}
 
 	/* We're using some buffers from the free list. */
@@ -665,7 +693,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	/* Store token and indirect buffer state. */
 	vq->split.desc_state[head].data = data;
 	if (indirect)
-		vq->split.desc_state[head].indir_desc = desc;
+		vq->split.desc_state[head].indir_desc = in_desc;
 	else
 		vq->split.desc_state[head].indir_desc = ctx;
 
@@ -715,7 +743,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 free_indirect:
 	if (indirect)
-		kfree(desc);
+		kfree(in_desc);
 
 	END_USE(vq);
 	return -ENOMEM;
@@ -780,32 +808,23 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 		if (ctx)
 			*ctx = vq->split.desc_state[head].indir_desc;
 	} else {
-		struct vring_desc *indir_desc =
-				vq->split.desc_state[head].indir_desc;
-		u32 len;
+		struct vring_split_desc_indir *in_desc;
 
-		if (vq->use_dma_api) {
-			struct vring_desc_extra *extra = vq->split.desc_extra;
+		in_desc = vq->split.desc_state[head].indir_desc;
 
+		if (vq->use_dma_api) {
 			dma_unmap_single(vring_dma_dev(vq),
-					 extra[i].addr,
-					 extra[i].len,
+					 in_desc->addr, in_desc->len,
 					 (flags & VRING_DESC_F_WRITE) ?
 					 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 		}
 
-		len = vq->split.desc_extra[head].len;
-
-		BUG_ON(!(vq->split.desc_extra[head].flags &
-				VRING_DESC_F_INDIRECT));
-		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
-
 		if (vring_need_unmap_buffer(vq)) {
-			for (j = 0; j < len / sizeof(struct vring_desc); j++)
-				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+			for (j = 0; j < in_desc->num; j++)
+				vring_unmap_one_split_indirect(vq, &in_desc->desc[j]);
 		}
 
-		kfree(indir_desc);
+		kfree(in_desc);
 		vq->split.desc_state[head].indir_desc = NULL;
 	}
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 06/10] virtio_ring: no store dma info when unmap is not needed
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (4 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 05/10] virtio_ring: split: structure the indirect desc table Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 07/10] virtio: find_vqs: add new parameter premapped Xuan Zhuo
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

As discussed:
http://lore.kernel.org/all/CACGkMEug-=C+VQhkMYSgUKMC==04m7-uem_yC21bgGkKZh845w@mail.gmail.com

When the vq is premapped mode, the driver manages the dma
info is a good way.

So this commit make the virtio core not to store the dma
info and release the memory which is used to store the dma
info.

If the use_dma_api is false, the memory is also not allocated.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 100 ++++++++++++++++++++++++++++++-----
 1 file changed, 87 insertions(+), 13 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 81a8c2b70af1..f5a72d22c838 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -94,12 +94,15 @@ struct vring_desc_state_packed {
 };
 
 struct vring_desc_extra {
-	dma_addr_t addr;		/* Descriptor DMA addr. */
-	u32 len;			/* Descriptor length. */
 	u16 flags;			/* Descriptor flags. */
 	u16 next;			/* The next desc state in a list. */
 };
 
+struct vring_desc_dma {
+	dma_addr_t addr;		/* Descriptor DMA addr. */
+	u32 len;			/* Descriptor length. */
+};
+
 struct vring_virtqueue_split {
 	/* Actual memory layout for this queue. */
 	struct vring vring;
@@ -116,6 +119,7 @@ struct vring_virtqueue_split {
 	/* Per-descriptor state. */
 	struct vring_desc_state_split *desc_state;
 	struct vring_desc_extra *desc_extra;
+	struct vring_desc_dma *desc_dma;
 
 	/* DMA address and size information */
 	dma_addr_t queue_dma_addr;
@@ -156,6 +160,7 @@ struct vring_virtqueue_packed {
 	/* Per-descriptor state. */
 	struct vring_desc_state_packed *desc_state;
 	struct vring_desc_extra *desc_extra;
+	struct vring_desc_dma *desc_dma;
 
 	/* DMA address and size information */
 	dma_addr_t ring_dma_addr;
@@ -470,13 +475,14 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 					  unsigned int i)
 {
 	struct vring_desc_extra *extra = vq->split.desc_extra;
+	struct vring_desc_dma *dma = vq->split.desc_dma;
 	u16 flags;
 
 	flags = extra[i].flags;
 
 	dma_unmap_page(vring_dma_dev(vq),
-		       extra[i].addr,
-		       extra[i].len,
+		       dma[i].addr,
+		       dma[i].len,
 		       (flags & VRING_DESC_F_WRITE) ?
 		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 
@@ -533,8 +539,11 @@ static inline unsigned int virtqueue_add_desc_split(struct virtqueue *vq,
 		next = extra[i].next;
 		desc[i].next = cpu_to_virtio16(vq->vdev, next);
 
-		extra[i].addr = addr;
-		extra[i].len = len;
+		if (vring->split.desc_dma) {
+			vring->split.desc_dma[i].addr = addr;
+			vring->split.desc_dma[i].len = len;
+		}
+
 		extra[i].flags = flags;
 	} else
 		next = virtio16_to_cpu(vq->vdev, desc[i].next);
@@ -1070,6 +1079,23 @@ static void virtqueue_vring_attach_split(struct vring_virtqueue *vq,
 	vq->free_head = 0;
 }
 
+static int vring_alloc_dma_split(struct vring_virtqueue_split *vring_split,
+				  bool need_unmap)
+{
+	u32 num = vring_split->vring.num;
+	struct vring_desc_dma *dma;
+
+	if (!need_unmap)
+		return 0;
+
+	dma = kmalloc_array(num, sizeof(struct vring_desc_dma), GFP_KERNEL);
+	if (!dma)
+		return -ENOMEM;
+
+	vring_split->desc_dma = dma;
+	return 0;
+}
+
 static int vring_alloc_state_extra_split(struct vring_virtqueue_split *vring_split)
 {
 	struct vring_desc_state_split *state;
@@ -1106,6 +1132,7 @@ static void vring_free_split(struct vring_virtqueue_split *vring_split,
 
 	kfree(vring_split->desc_state);
 	kfree(vring_split->desc_extra);
+	kfree(vring_split->desc_dma);
 }
 
 static int vring_alloc_queue_split(struct vring_virtqueue_split *vring_split,
@@ -1207,6 +1234,10 @@ static int virtqueue_resize_split(struct virtqueue *_vq, u32 num)
 	if (err)
 		goto err_state_extra;
 
+	err = vring_alloc_dma_split(&vring_split, vring_need_unmap_buffer(vq));
+	if (err)
+		goto err_state_extra;
+
 	vring_free(&vq->vq);
 
 	virtqueue_vring_init_split(&vring_split, vq);
@@ -1239,14 +1270,16 @@ static u16 packed_last_used(u16 last_used_idx)
 
 /* caller must check vring_need_unmap_buffer() */
 static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
-				     const struct vring_desc_extra *extra)
+				     unsigned int i)
 {
+	const struct vring_desc_extra *extra = &vq->packed.desc_extra[i];
+	const struct vring_desc_dma *dma = &vq->packed.desc_dma[i];
 	u16 flags;
 
 	flags = extra->flags;
 
 	dma_unmap_page(vring_dma_dev(vq),
-		       extra->addr, extra->len,
+		       dma->addr, dma->len,
 		       (flags & VRING_DESC_F_WRITE) ?
 		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 }
@@ -1497,8 +1530,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 			desc[i].id = cpu_to_le16(id);
 
 			if (vring_need_unmap_buffer(vq)) {
-				vq->packed.desc_extra[curr].addr = addr;
-				vq->packed.desc_extra[curr].len = sg->length;
+				vq->packed.desc_dma[curr].addr = addr;
+				vq->packed.desc_dma[curr].len = sg->length;
 			}
 
 			vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
@@ -1557,7 +1590,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
-		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
+		vring_unmap_extra_packed(vq, curr);
 		curr = vq->packed.desc_extra[curr].next;
 		i++;
 		if (i >= vq->packed.vring.num)
@@ -1638,8 +1671,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (vring_need_unmap_buffer(vq)) {
 			curr = id;
 			for (i = 0; i < state->num; i++) {
-				vring_unmap_extra_packed(vq,
-							 &vq->packed.desc_extra[curr]);
+				vring_unmap_extra_packed(vq, curr);
 				curr = vq->packed.desc_extra[curr].next;
 			}
 		}
@@ -1952,6 +1984,7 @@ static void vring_free_packed(struct vring_virtqueue_packed *vring_packed,
 
 	kfree(vring_packed->desc_state);
 	kfree(vring_packed->desc_extra);
+	kfree(vring_packed->desc_dma);
 }
 
 static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
@@ -2008,6 +2041,23 @@ static int vring_alloc_queue_packed(struct vring_virtqueue_packed *vring_packed,
 	return -ENOMEM;
 }
 
+static int vring_alloc_dma_packed(struct vring_virtqueue_packed *vring_packed,
+				  bool need_unmap)
+{
+	u32 num = vring_packed->vring.num;
+	struct vring_desc_dma *dma;
+
+	if (!need_unmap)
+		return 0;
+
+	dma = kmalloc_array(num, sizeof(struct vring_desc_dma), GFP_KERNEL);
+	if (!dma)
+		return -ENOMEM;
+
+	vring_packed->desc_dma = dma;
+	return 0;
+}
+
 static int vring_alloc_state_extra_packed(struct vring_virtqueue_packed *vring_packed)
 {
 	struct vring_desc_state_packed *state;
@@ -2120,6 +2170,10 @@ static struct virtqueue *vring_create_virtqueue_packed(struct virtio_device *vde
 	if (err)
 		goto err_state_extra;
 
+	err = vring_alloc_dma_packed(&vring_packed, vring_need_unmap_buffer(vq));
+	if (err)
+		goto err_state_extra;
+
 	virtqueue_vring_init_packed(&vring_packed, !!cfg_vq_val(cfg, callbacks));
 
 	virtqueue_init(vq, tp_cfg->num);
@@ -2152,6 +2206,10 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
 	if (err)
 		goto err_state_extra;
 
+	err = vring_alloc_dma_packed(&vring_packed, vring_need_unmap_buffer(vq));
+	if (err)
+		goto err_state_extra;
+
 	vring_free(&vq->vq);
 
 	virtqueue_vring_init_packed(&vring_packed, !!vq->vq.callback);
@@ -2662,6 +2720,12 @@ static struct virtqueue *__vring_new_virtqueue(struct virtio_device *vdev,
 		return NULL;
 	}
 
+	err = vring_alloc_dma_split(vring_split, vring_need_unmap_buffer(vq));
+	if (err) {
+		kfree(vq);
+		return NULL;
+	}
+
 	virtqueue_vring_init_split(vring_split, vq);
 
 	virtqueue_init(vq, vring_split->vring.num);
@@ -2779,6 +2843,14 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 
 	vq->premapped = true;
 
+	if (vq->packed_ring) {
+		kfree(vq->packed.desc_dma);
+		vq->packed.desc_dma = NULL;
+	} else {
+		kfree(vq->split.desc_dma);
+		vq->split.desc_dma = NULL;
+	}
+
 	END_USE(vq);
 
 	return 0;
@@ -2863,6 +2935,7 @@ static void vring_free(struct virtqueue *_vq)
 
 			kfree(vq->packed.desc_state);
 			kfree(vq->packed.desc_extra);
+			kfree(vq->packed.desc_dma);
 		} else {
 			vring_free_queue(vq->vq.vdev,
 					 vq->split.queue_size_in_bytes,
@@ -2874,6 +2947,7 @@ static void vring_free(struct virtqueue *_vq)
 	if (!vq->packed_ring) {
 		kfree(vq->split.desc_state);
 		kfree(vq->split.desc_extra);
+		kfree(vq->split.desc_dma);
 	}
 }
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 07/10] virtio: find_vqs: add new parameter premapped
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (5 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 06/10] virtio_ring: no store dma info when unmap is not needed Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 08/10] virtio_ring: export premapped to driver by struct virtqueue Xuan Zhuo
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

If the premapped mode is enabled, the dma array(struct vring_desc_dma) of
virtio core will not be allocated. That is judged when find_vqs() is
called. To avoid allocating dma array in find_vqs() and releasing it
immediately by virtqueue_set_dma_premapped(). This patch introduces a
new parameter to find_vqs(). Then we can judge should we allocate the
dma array(struct vring_desc_dma) or not inside find_vqs().

The driver must check the premapped mode of every vq after find_vqs().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c  | 4 ++--
 include/linux/virtio_config.h | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5a72d22c838..538940ede46a 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2157,7 +2157,7 @@ static struct virtqueue *vring_create_virtqueue_packed(struct virtio_device *vde
 	vq->packed_ring = true;
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
-	vq->premapped = false;
+	vq->premapped = vq->use_dma_api && cfg_vq_get(cfg, premapped);
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!cfg_vq_get(cfg, ctx);
@@ -2705,7 +2705,7 @@ static struct virtqueue *__vring_new_virtqueue(struct virtio_device *vdev,
 #endif
 	vq->dma_dev = tp_cfg->dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
-	vq->premapped = false;
+	vq->premapped = vq->use_dma_api && cfg_vq_get(cfg, premapped);
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!cfg_vq_get(cfg, ctx);
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index d47188303d34..f1f62e57f395 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -107,6 +107,7 @@ struct virtio_vq_config {
 	vq_callback_t **callbacks;
 	const char **names;
 	const bool *ctx;
+	const bool *premapped;
 	struct irq_affinity *desc;
 };
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 08/10] virtio_ring: export premapped to driver by struct virtqueue
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (6 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 07/10] virtio: find_vqs: add new parameter premapped Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 09/10] virtio_net: set premapped mode by find_vqs() Xuan Zhuo
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Export the premapped to drivers, then drivers can check
the vq premapped mode after the find_vqs().
Because the find_vqs() just try to enable the vq premapped mode,
the driver must check that after find_vqs().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 13 +++++--------
 include/linux/virtio.h       |  1 +
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 538940ede46a..34f4b2c0c31e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -191,9 +191,6 @@ struct vring_virtqueue {
 	/* Host publishes avail event idx */
 	bool event;
 
-	/* Do DMA mapping by driver */
-	bool premapped;
-
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -311,7 +308,7 @@ static bool vring_use_dma_api(const struct virtio_device *vdev)
 
 static bool vring_need_unmap_buffer(const struct vring_virtqueue *vring)
 {
-	return vring->use_dma_api && !vring->premapped;
+	return vring->use_dma_api && !vring->vq.premapped;
 }
 
 size_t virtio_max_dma_size(const struct virtio_device *vdev)
@@ -383,7 +380,7 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
 			    enum dma_data_direction direction, dma_addr_t *addr)
 {
-	if (vq->premapped) {
+	if (vq->vq.premapped) {
 		*addr = sg_dma_address(sg);
 		return 0;
 	}
@@ -2157,7 +2154,7 @@ static struct virtqueue *vring_create_virtqueue_packed(struct virtio_device *vde
 	vq->packed_ring = true;
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
-	vq->premapped = vq->use_dma_api && cfg_vq_get(cfg, premapped);
+	vq->vq.premapped = vq->use_dma_api && cfg_vq_get(cfg, premapped);
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!cfg_vq_get(cfg, ctx);
@@ -2705,7 +2702,7 @@ static struct virtqueue *__vring_new_virtqueue(struct virtio_device *vdev,
 #endif
 	vq->dma_dev = tp_cfg->dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
-	vq->premapped = vq->use_dma_api && cfg_vq_get(cfg, premapped);
+	vq->vq.premapped = vq->use_dma_api && cfg_vq_get(cfg, premapped);
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!cfg_vq_get(cfg, ctx);
@@ -2841,7 +2838,7 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 		return -EINVAL;
 	}
 
-	vq->premapped = true;
+	vq->vq.premapped = true;
 
 	if (vq->packed_ring) {
 		kfree(vq->packed.desc_dma);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b0201747a263..407277d5a16b 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -36,6 +36,7 @@ struct virtqueue {
 	unsigned int num_free;
 	unsigned int num_max;
 	bool reset;
+	bool premapped;
 	void *priv;
 };
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 09/10] virtio_net: set premapped mode by find_vqs()
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (7 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 08/10] virtio_ring: export premapped to driver by struct virtqueue Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-12  3:35 ` [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable Xuan Zhuo
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now, the virtio core can set the premapped mode by find_vqs().
If the premapped can be enabled, the dma array will not be
allocated. So virtio-net use the api of find_vqs to enable the
premapped.

Judge the premapped mode by the vq->premapped instead of saving
local variable.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c      | 57 +++++++++++++++++------------------
 include/linux/virtio_config.h | 16 ++--------
 2 files changed, 29 insertions(+), 44 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d7ce4a1011ea..140b2c155650 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -208,9 +208,6 @@ struct receive_queue {
 
 	/* Record the last dma info to free after new pages is allocated. */
 	struct virtnet_rq_dma *last_dma;
-
-	/* Do dma by self */
-	bool do_dma;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -657,7 +654,7 @@ static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
 	void *buf;
 
 	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf && rq->do_dma)
+	if (buf && rq->vq->premapped)
 		virtnet_rq_unmap(rq, buf, *len);
 
 	return buf;
@@ -670,7 +667,7 @@ static void virtnet_rq_init_one_sg(struct receive_queue *rq, void *buf, u32 len)
 	u32 offset;
 	void *head;
 
-	if (!rq->do_dma) {
+	if (!rq->vq->premapped) {
 		sg_init_one(rq->sg, buf, len);
 		return;
 	}
@@ -700,7 +697,7 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 
 	head = page_address(alloc_frag->page);
 
-	if (rq->do_dma) {
+	if (rq->vq->premapped) {
 		dma = head;
 
 		/* new pages */
@@ -746,22 +743,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
 	return buf;
 }
 
-static void virtnet_rq_set_premapped(struct virtnet_info *vi)
-{
-	int i;
-
-	/* disable for big mode */
-	if (!vi->mergeable_rx_bufs && vi->big_packets)
-		return;
-
-	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (virtqueue_set_dma_premapped(vi->rq[i].vq))
-			continue;
-
-		vi->rq[i].do_dma = true;
-	}
-}
-
 static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 {
 	struct virtnet_info *vi = vq->vdev->priv;
@@ -770,7 +751,7 @@ static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
 
 	rq = &vi->rq[i];
 
-	if (rq->do_dma)
+	if (rq->vq->premapped)
 		virtnet_rq_unmap(rq, buf, 0);
 
 	virtnet_rq_free_buf(vi, rq, buf);
@@ -1865,7 +1846,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
+		if (rq->vq->premapped)
 			virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
@@ -1980,7 +1961,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	ctx = mergeable_len_to_ctx(len + room, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
+		if (rq->vq->premapped)
 			virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
@@ -4229,7 +4210,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
 		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].do_dma && vi->rq[i].last_dma)
+			if (vi->rq[i].vq->premapped && vi->rq[i].last_dma)
 				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
 			put_page(vi->rq[i].alloc_frag.page);
 		}
@@ -4293,11 +4274,13 @@ static unsigned int mergeable_min_buf_len(struct virtnet_info *vi, struct virtqu
 
 static int virtnet_find_vqs(struct virtnet_info *vi)
 {
+	struct virtio_vq_config cfg = {};
 	vq_callback_t **callbacks;
 	struct virtqueue **vqs;
 	const char **names;
 	int ret = -ENOMEM;
 	int total_vqs;
+	bool *premapped;
 	bool *ctx;
 	u16 i;
 
@@ -4322,8 +4305,13 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 		ctx = kcalloc(total_vqs, sizeof(*ctx), GFP_KERNEL);
 		if (!ctx)
 			goto err_ctx;
+
+		premapped = kcalloc(total_vqs, sizeof(*premapped), GFP_KERNEL);
+		if (!ctx)
+			goto err_premapped;
 	} else {
 		ctx = NULL;
+		premapped = NULL;
 	}
 
 	/* Parameters for control virtqueue, if any */
@@ -4342,10 +4330,19 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 		names[txq2vq(i)] = vi->sq[i].name;
 		if (ctx)
 			ctx[rxq2vq(i)] = true;
+
+		if (premapped)
+			premapped[rxq2vq(i)] = true;
 	}
 
-	ret = virtio_find_vqs_ctx(vi->vdev, total_vqs, vqs, callbacks,
-				  names, ctx, NULL);
+	cfg.nvqs      = total_vqs;
+	cfg.vqs       = vqs;
+	cfg.callbacks = callbacks;
+	cfg.names     = names;
+	cfg.ctx       = ctx;
+	cfg.premapped = premapped;
+
+	ret = virtio_find_vqs_cfg(vi->vdev, &cfg);
 	if (ret)
 		goto err_find;
 
@@ -4365,6 +4362,8 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 
 
 err_find:
+	kfree(premapped);
+err_premapped:
 	kfree(ctx);
 err_ctx:
 	kfree(names);
@@ -4437,8 +4436,6 @@ static int init_vqs(struct virtnet_info *vi)
 	if (ret)
 		goto err_free;
 
-	virtnet_rq_set_premapped(vi);
-
 	cpus_read_lock();
 	virtnet_set_affinity(vi);
 	cpus_read_unlock();
diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h
index f1f62e57f395..e40509fef5fe 100644
--- a/include/linux/virtio_config.h
+++ b/include/linux/virtio_config.h
@@ -260,21 +260,9 @@ int virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs,
 }
 
 static inline
-int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs,
-			struct virtqueue *vqs[], vq_callback_t *callbacks[],
-			const char * const names[], const bool *ctx,
-			struct irq_affinity *desc)
+int virtio_find_vqs_cfg(struct virtio_device *vdev, struct virtio_vq_config *cfg)
 {
-	struct virtio_vq_config cfg = {};
-
-	cfg.nvqs = nvqs;
-	cfg.vqs = vqs;
-	cfg.callbacks = callbacks;
-	cfg.names = (const char **)names;
-	cfg.ctx = ctx;
-	cfg.desc = desc;
-
-	return vdev->config->find_vqs(vdev, &cfg);
+	return vdev->config->find_vqs(vdev, cfg);
 }
 
 /**
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (8 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 09/10] virtio_net: set premapped mode by find_vqs() Xuan Zhuo
@ 2024-03-12  3:35 ` Xuan Zhuo
  2024-03-21  6:02   ` Jason Wang
  2024-03-19  6:56 ` [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Michael S. Tsirkin
  2024-03-21  4:45 ` Jason Wang
  11 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-12  3:35 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

Now, the API virtqueue_set_dma_premapped just support to
enable premapped mode.

If we allow enabling the premapped dynamically, we should
make this API to support disable the premapped mode.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 34 ++++++++++++++++++++++++++--------
 include/linux/virtio.h       |  2 +-
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 34f4b2c0c31e..3bf69cae4965 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2801,6 +2801,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
 /**
  * virtqueue_set_dma_premapped - set the vring premapped mode
  * @_vq: the struct virtqueue we're talking about.
+ * @premapped: enable/disable the premapped mode.
  *
  * Enable the premapped mode of the vq.
  *
@@ -2819,9 +2820,10 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
  * 0: success.
  * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
  */
-int virtqueue_set_dma_premapped(struct virtqueue *_vq)
+int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
+	int err = 0;
 	u32 num;
 
 	START_USE(vq);
@@ -2833,24 +2835,40 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 		return -EINVAL;
 	}
 
+	if (vq->vq.premapped == premapped) {
+		END_USE(vq);
+		return 0;
+	}
+
 	if (!vq->use_dma_api) {
 		END_USE(vq);
 		return -EINVAL;
 	}
 
-	vq->vq.premapped = true;
+	if (premapped) {
+		vq->vq.premapped = true;
+
+		if (vq->packed_ring) {
+			kfree(vq->packed.desc_dma);
+			vq->packed.desc_dma = NULL;
+		} else {
+			kfree(vq->split.desc_dma);
+			vq->split.desc_dma = NULL;
+		}
 
-	if (vq->packed_ring) {
-		kfree(vq->packed.desc_dma);
-		vq->packed.desc_dma = NULL;
 	} else {
-		kfree(vq->split.desc_dma);
-		vq->split.desc_dma = NULL;
+		if (vq->packed_ring)
+			err = vring_alloc_dma_split(&vq->split, false);
+		else
+			err = vring_alloc_dma_packed(&vq->packed, false);
+
+		if (!err)
+			vq->vq.premapped = false;
 	}
 
 	END_USE(vq);
 
-	return 0;
+	return err;
 }
 EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 407277d5a16b..4b338590abf4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -82,7 +82,7 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
-int virtqueue_set_dma_premapped(struct virtqueue *_vq);
+int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped);
 
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (9 preceding siblings ...)
  2024-03-12  3:35 ` [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable Xuan Zhuo
@ 2024-03-19  6:56 ` Michael S. Tsirkin
  2024-03-20  9:25   ` Jason Wang
  2024-03-21  4:45 ` Jason Wang
  11 siblings, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2024-03-19  6:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev

On Tue, Mar 12, 2024 at 11:35:47AM +0800, Xuan Zhuo wrote:
> As discussed:
> 
> http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com
> 
> If the virtio is premapped mode, the driver should manage the dma info by self.
> So the virtio core should not store the dma info. We can release the memory used
> to store the dma info.
> 
> For virtio-net xmit queue, if the virtio-net maintains the dma info,
> the virtio-net must allocate too much memory(19 * queue_size for per-queue), so
> we do not plan to make the virtio-net to maintain the dma info by default. The
> virtio-net xmit queue only maintain the dma info when premapped mode is enable
> (such as AF_XDP is enable).

This landed when merge window was open already so I'm deferring this
to the next merge window, just to be safe. Jason can you review please?

> So this patch set try to do:
> 
> 1. make the virtio core to do not store the dma info
>     - But if the desc_extra has not dma info, we face a new question,
>       it is hard to get the dma info of the desc with indirect flag.
>       For split mode, that is easy from desc, but for the packed mode,
>       it is hard to get the dma info from the desc. And hardening
>       the dma unmap is safe, we should store the dma info of indirect
>       descs when the virtio core does not store the bufer dma info.
> 
>       So I introduce the "structure the indirect desc table" to
>       allocate space to store dma info of the desc table.
> 
>         +struct vring_split_desc_indir {
>         +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
>         +       u32 len;                        /* Descriptor Array length. */
>         +       u32 num;
>         +       struct vring_desc desc[];
>         +};
> 
>       The follow patches to this:
>          * virtio_ring: packed: structure the indirect desc table
>          * virtio_ring: split: structure the indirect desc table
> 
>     - On the other side, in the umap handle, we mix the indirect descs with
>       other descs. That make things too complex. I found if we we distinguish
>       the descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
> 
>       The follow patches do this.
>          * virtio_ring: packed: remove double check of the unmap ops
>          * virtio_ring: split: structure the indirect desc table
> 
> 2. make the virtio core to enable premapped mode by find_vqs() params
>     - Because the find_vqs() will try to allocate memory for the dma info.
>       If we set the premapped mode after find_vqs() and release the
>       dma info, that is odd.
> 
> 
> Please review.
> 
> Thanks
> 
> v4:
>     1. virtio-net xmit queue does not enable premapped mode by default
> 
> v3:
>     1. fix the conflict with the vp_modern_create_avq().
> 
> v2:
>     1. change the dma item of virtio-net, every item have MAX_SKB_FRAGS + 2 addr + len pairs.
>     2. introduce virtnet_sq_free_stats for __free_old_xmit
> 
> v1:
>     1. rename transport_vq_config to vq_transport_config
>     2. virtio-net set dma meta number to (ring-size + 1)(MAX_SKB_FRGAS +2)
>     3. introduce virtqueue_dma_map_sg_attrs
>     4. separate vring_create_virtqueue to an independent commit
> 
> Xuan Zhuo (10):
>   virtio_ring: introduce vring_need_unmap_buffer
>   virtio_ring: packed: remove double check of the unmap ops
>   virtio_ring: packed: structure the indirect desc table
>   virtio_ring: split: remove double check of the unmap ops
>   virtio_ring: split: structure the indirect desc table
>   virtio_ring: no store dma info when unmap is not needed
>   virtio: find_vqs: add new parameter premapped
>   virtio_ring: export premapped to driver by struct virtqueue
>   virtio_net: set premapped mode by find_vqs()
>   virtio_ring: virtqueue_set_dma_premapped support disable
> 
>  drivers/net/virtio_net.c      |  57 +++--
>  drivers/virtio/virtio_ring.c  | 436 +++++++++++++++++++++-------------
>  include/linux/virtio.h        |   3 +-
>  include/linux/virtio_config.h |  17 +-
>  4 files changed, 307 insertions(+), 206 deletions(-)
> 
> --
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq
  2024-03-19  6:56 ` [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Michael S. Tsirkin
@ 2024-03-20  9:25   ` Jason Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2024-03-20  9:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev

On Tue, Mar 19, 2024 at 2:56 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Mar 12, 2024 at 11:35:47AM +0800, Xuan Zhuo wrote:
> > As discussed:
> >
> > http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com
> >
> > If the virtio is premapped mode, the driver should manage the dma info by self.
> > So the virtio core should not store the dma info. We can release the memory used
> > to store the dma info.
> >
> > For virtio-net xmit queue, if the virtio-net maintains the dma info,
> > the virtio-net must allocate too much memory(19 * queue_size for per-queue), so
> > we do not plan to make the virtio-net to maintain the dma info by default. The
> > virtio-net xmit queue only maintain the dma info when premapped mode is enable
> > (such as AF_XDP is enable).
>
> This landed when merge window was open already so I'm deferring this
> to the next merge window, just to be safe. Jason can you review please?

Will do.

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq
  2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
                   ` (10 preceding siblings ...)
  2024-03-19  6:56 ` [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Michael S. Tsirkin
@ 2024-03-21  4:45 ` Jason Wang
  2024-03-21  8:30   ` Xuan Zhuo
  11 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2024-03-21  4:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> As discussed:
>
> http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com
>
> If the virtio is premapped mode, the driver should manage the dma info by self.
> So the virtio core should not store the dma info. We can release the memory used
> to store the dma info.
>
> For virtio-net xmit queue, if the virtio-net maintains the dma info,
> the virtio-net must allocate too much memory(19 * queue_size for per-queue), so
> we do not plan to make the virtio-net to maintain the dma info by default. The
> virtio-net xmit queue only maintain the dma info when premapped mode is enable
> (such as AF_XDP is enable).
>
> So this patch set try to do:
>
> 1. make the virtio core to do not store the dma info

I think you mean "make the virtio core to do not store the dma info
when driver can do that"

>     - But if the desc_extra has not dma info, we face a new question,
>       it is hard to get the dma info of the desc with indirect flag.

I guess you want to avoid allocating desc_extra array, otherwise you
won't have this issue.

How about keeping that?

>       For split mode, that is easy from desc, but for the packed mode,
>       it is hard to get the dma info from the desc. And hardening
>       the dma unmap is safe, we should store the dma info of indirect
>       descs when the virtio core does not store the bufer dma info.
>
>       So I introduce the "structure the indirect desc table" to
>       allocate space to store dma info of the desc table.
>
>         +struct vring_split_desc_indir {
>         +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
>         +       u32 len;                        /* Descriptor Array length. */
>         +       u32 num;

We can probably just reuse vring_desc_extra here with a known flag
(read only for device).

>         +       struct vring_desc desc[];
>         +};
>
>       The follow patches to this:
>          * virtio_ring: packed: structure the indirect desc table
>          * virtio_ring: split: structure the indirect desc table
>
>     - On the other side, in the umap handle, we mix the indirect descs with
>       other descs. That make things too complex. I found if we we distinguish
>       the descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
>
>       The follow patches do this.
>          * virtio_ring: packed: remove double check of the unmap ops
>          * virtio_ring: split: structure the indirect desc table
>
> 2. make the virtio core to enable premapped mode by find_vqs() params
>     - Because the find_vqs() will try to allocate memory for the dma info.
>       If we set the premapped mode after find_vqs() and release the
>       dma info, that is odd.

Thanks

>
>
> Please review.
>
> Thanks
>
> v4:
>     1. virtio-net xmit queue does not enable premapped mode by default
>
> v3:
>     1. fix the conflict with the vp_modern_create_avq().
>
> v2:
>     1. change the dma item of virtio-net, every item have MAX_SKB_FRAGS + 2 addr + len pairs.
>     2. introduce virtnet_sq_free_stats for __free_old_xmit
>
> v1:
>     1. rename transport_vq_config to vq_transport_config
>     2. virtio-net set dma meta number to (ring-size + 1)(MAX_SKB_FRGAS +2)
>     3. introduce virtqueue_dma_map_sg_attrs
>     4. separate vring_create_virtqueue to an independent commit
>
> Xuan Zhuo (10):
>   virtio_ring: introduce vring_need_unmap_buffer
>   virtio_ring: packed: remove double check of the unmap ops
>   virtio_ring: packed: structure the indirect desc table
>   virtio_ring: split: remove double check of the unmap ops
>   virtio_ring: split: structure the indirect desc table
>   virtio_ring: no store dma info when unmap is not needed
>   virtio: find_vqs: add new parameter premapped
>   virtio_ring: export premapped to driver by struct virtqueue
>   virtio_net: set premapped mode by find_vqs()
>   virtio_ring: virtqueue_set_dma_premapped support disable
>
>  drivers/net/virtio_net.c      |  57 +++--
>  drivers/virtio/virtio_ring.c  | 436 +++++++++++++++++++++-------------
>  include/linux/virtio.h        |   3 +-
>  include/linux/virtio_config.h |  17 +-
>  4 files changed, 307 insertions(+), 206 deletions(-)
>
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-12  3:35 ` [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table Xuan Zhuo
@ 2024-03-21  4:47   ` Jason Wang
  2024-03-21  8:24     ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2024-03-21  4:47 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This commit structure the indirect desc table.
> Then we can get the desc num directly when doing unmap.
>
> And save the dma info to the struct, then the indirect
> will not use the dma fields of the desc_extra. The subsequent
> commits will make the dma fields are optional. But for
> the indirect case, we must record the dma info.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
>  1 file changed, 38 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 0dfbd17e5a87..22a588bba166 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -72,9 +72,16 @@ struct vring_desc_state_split {
>         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
>  };
>
> +struct vring_packed_desc_indir {
> +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> +       u32 len;                        /* Descriptor Array length. */
> +       u32 num;
> +       struct vring_packed_desc desc[];
> +};
> +
>  struct vring_desc_state_packed {
>         void *data;                     /* Data for callback. */
> -       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> +       struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */

Maybe it's better just to have a vring_desc_extra here.

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops
  2024-03-12  3:35 ` [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops Xuan Zhuo
@ 2024-03-21  5:57   ` Jason Wang
  2024-03-21  8:20     ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2024-03-21  5:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In the functions vring_unmap_extra_packed and vring_unmap_desc_packed,
> multiple checks are made whether unmap is performed and whether it is
> INDIRECT.
>
> These two functions are usually called in a loop, and we should put the
> check outside the loop.
>
> And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
> other descs, that make the thing more complex. If we distinguish the
> descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
>
> 1. only one desc of the desc table is used, we do not need the loop
> 2. the called unmap api is difference from the other desc
> 3. the vq->premapped is not needed to check
> 4. the vq->indirect is not needed to check
> 5. the state->indir_desc must not be null
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 78 ++++++++++++++++++------------------
>  1 file changed, 40 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c2779e34aac7..0dfbd17e5a87 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1214,6 +1214,7 @@ static u16 packed_last_used(u16 last_used_idx)
>         return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
>  }
>
> +/* caller must check vring_need_unmap_buffer() */
>  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>                                      const struct vring_desc_extra *extra)
>  {
> @@ -1221,33 +1222,18 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>
>         flags = extra->flags;
>
> -       if (flags & VRING_DESC_F_INDIRECT) {
> -               if (!vq->use_dma_api)
> -                       return;
> -
> -               dma_unmap_single(vring_dma_dev(vq),
> -                                extra->addr, extra->len,
> -                                (flags & VRING_DESC_F_WRITE) ?
> -                                DMA_FROM_DEVICE : DMA_TO_DEVICE);
> -       } else {
> -               if (!vring_need_unmap_buffer(vq))
> -                       return;
> -
> -               dma_unmap_page(vring_dma_dev(vq),
> -                              extra->addr, extra->len,
> -                              (flags & VRING_DESC_F_WRITE) ?
> -                              DMA_FROM_DEVICE : DMA_TO_DEVICE);
> -       }
> +       dma_unmap_page(vring_dma_dev(vq),
> +                      extra->addr, extra->len,
> +                      (flags & VRING_DESC_F_WRITE) ?
> +                      DMA_FROM_DEVICE : DMA_TO_DEVICE);
>  }
>
> +/* caller must check vring_need_unmap_buffer() */
>  static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
>                                     const struct vring_packed_desc *desc)
>  {
>         u16 flags;
>
> -       if (!vring_need_unmap_buffer(vq))
> -               return;
> -
>         flags = le16_to_cpu(desc->flags);
>
>         dma_unmap_page(vring_dma_dev(vq),
> @@ -1323,7 +1309,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>                         total_sg * sizeof(struct vring_packed_desc),
>                         DMA_TO_DEVICE);
>         if (vring_mapping_error(vq, addr)) {
> -               if (vq->premapped)
> +               if (!vring_need_unmap_buffer(vq))
>                         goto free_desc;
>
>                 goto unmap_release;
> @@ -1338,10 +1324,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>                 vq->packed.desc_extra[id].addr = addr;
>                 vq->packed.desc_extra[id].len = total_sg *
>                                 sizeof(struct vring_packed_desc);
> -               vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> -                                                 vq->packed.avail_used_flags;
>         }
>
> +       vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> +               vq->packed.avail_used_flags;
> +
>         /*
>          * A driver MUST NOT make the first descriptor in the list
>          * available before all subsequent descriptors comprising
> @@ -1382,6 +1369,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>  unmap_release:
>         err_idx = i;
>
> +       WARN_ON(!vring_need_unmap_buffer(vq));
> +
>         for (i = 0; i < err_idx; i++)
>                 vring_unmap_desc_packed(vq, &desc[i]);
>
> @@ -1475,12 +1464,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>                         desc[i].len = cpu_to_le32(sg->length);
>                         desc[i].id = cpu_to_le16(id);
>
> -                       if (unlikely(vq->use_dma_api)) {
> +                       if (vring_need_unmap_buffer(vq)) {
>                                 vq->packed.desc_extra[curr].addr = addr;
>                                 vq->packed.desc_extra[curr].len = sg->length;
> -                               vq->packed.desc_extra[curr].flags =
> -                                       le16_to_cpu(flags);
>                         }
> +
> +                       vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
> +
>                         prev = curr;
>                         curr = vq->packed.desc_extra[curr].next;
>
> @@ -1530,6 +1520,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>
>         vq->packed.avail_used_flags = avail_used_flags;
>
> +       WARN_ON(!vring_need_unmap_buffer(vq));
> +
>         for (n = 0; n < total_sg; n++) {
>                 if (i == err_idx)
>                         break;
> @@ -1599,7 +1591,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>         struct vring_desc_state_packed *state = NULL;
>         struct vring_packed_desc *desc;
>         unsigned int i, curr;
> +       u16 flags;
>
> +       flags = vq->packed.desc_extra[id].flags;
>         state = &vq->packed.desc_state[id];
>
>         /* Clear data ptr. */
> @@ -1609,22 +1603,32 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>         vq->free_head = id;
>         vq->vq.num_free += state->num;
>
> -       if (unlikely(vq->use_dma_api)) {
> -               curr = id;
> -               for (i = 0; i < state->num; i++) {
> -                       vring_unmap_extra_packed(vq,
> -                                                &vq->packed.desc_extra[curr]);
> -                       curr = vq->packed.desc_extra[curr].next;
> +       if (!(flags & VRING_DESC_F_INDIRECT)) {
> +               if (vring_need_unmap_buffer(vq)) {
> +                       curr = id;
> +                       for (i = 0; i < state->num; i++) {
> +                               vring_unmap_extra_packed(vq,
> +                                                        &vq->packed.desc_extra[curr]);
> +                               curr = vq->packed.desc_extra[curr].next;
> +                       }
>                 }
> -       }
>
> -       if (vq->indirect) {
> +               if (ctx)
> +                       *ctx = state->indir_desc;
> +       } else {
> +               const struct vring_desc_extra *extra;
>                 u32 len;
>
> +               if (vq->use_dma_api) {
> +                       extra = &vq->packed.desc_extra[id];
> +                       dma_unmap_single(vring_dma_dev(vq),
> +                                        extra->addr, extra->len,
> +                                        (flags & VRING_DESC_F_WRITE) ?
> +                                        DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +               }

Theoretically, indirect descriptors could be chained. It is supported
without this patch but not here.

Thanks

> +
>                 /* Free the indirect table, if any, now that it's unmapped. */
>                 desc = state->indir_desc;
> -               if (!desc)
> -                       return;
>
>                 if (vring_need_unmap_buffer(vq)) {
>                         len = vq->packed.desc_extra[id].len;
> @@ -1634,8 +1638,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>                 }
>                 kfree(desc);
>                 state->indir_desc = NULL;
> -       } else if (ctx) {
> -               *ctx = state->indir_desc;
>         }
>  }
>
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable
  2024-03-12  3:35 ` [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable Xuan Zhuo
@ 2024-03-21  6:02   ` Jason Wang
  2024-03-21  8:21     ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2024-03-21  6:02 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now, the API virtqueue_set_dma_premapped just support to
> enable premapped mode.
>
> If we allow enabling the premapped dynamically, we should
> make this API to support disable the premapped mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 34 ++++++++++++++++++++++++++--------
>  include/linux/virtio.h       |  2 +-
>  2 files changed, 27 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 34f4b2c0c31e..3bf69cae4965 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2801,6 +2801,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
>  /**
>   * virtqueue_set_dma_premapped - set the vring premapped mode
>   * @_vq: the struct virtqueue we're talking about.
> + * @premapped: enable/disable the premapped mode.
>   *
>   * Enable the premapped mode of the vq.
>   *
> @@ -2819,9 +2820,10 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
>   * 0: success.
>   * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
>   */
> -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped)

I think we need to document the requirement for calling this.

Looking at the code, it seems it requires to stop the datapath and
detach all the used buffers?

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops
  2024-03-21  5:57   ` Jason Wang
@ 2024-03-21  8:20     ` Xuan Zhuo
  2024-03-22  5:10       ` Jason Wang
  2024-03-26  7:32       ` Michael S. Tsirkin
  0 siblings, 2 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-21  8:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 21 Mar 2024 13:57:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > In the functions vring_unmap_extra_packed and vring_unmap_desc_packed,
> > multiple checks are made whether unmap is performed and whether it is
> > INDIRECT.
> >
> > These two functions are usually called in a loop, and we should put the
> > check outside the loop.
> >
> > And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
> > other descs, that make the thing more complex. If we distinguish the
> > descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
> >
> > 1. only one desc of the desc table is used, we do not need the loop
> > 2. the called unmap api is difference from the other desc
> > 3. the vq->premapped is not needed to check
> > 4. the vq->indirect is not needed to check
> > 5. the state->indir_desc must not be null
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 78 ++++++++++++++++++------------------
> >  1 file changed, 40 insertions(+), 38 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index c2779e34aac7..0dfbd17e5a87 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1214,6 +1214,7 @@ static u16 packed_last_used(u16 last_used_idx)
> >         return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> >  }
> >
> > +/* caller must check vring_need_unmap_buffer() */
> >  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> >                                      const struct vring_desc_extra *extra)
> >  {
> > @@ -1221,33 +1222,18 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> >
> >         flags = extra->flags;
> >
> > -       if (flags & VRING_DESC_F_INDIRECT) {
> > -               if (!vq->use_dma_api)
> > -                       return;
> > -
> > -               dma_unmap_single(vring_dma_dev(vq),
> > -                                extra->addr, extra->len,
> > -                                (flags & VRING_DESC_F_WRITE) ?
> > -                                DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > -       } else {
> > -               if (!vring_need_unmap_buffer(vq))
> > -                       return;
> > -
> > -               dma_unmap_page(vring_dma_dev(vq),
> > -                              extra->addr, extra->len,
> > -                              (flags & VRING_DESC_F_WRITE) ?
> > -                              DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > -       }
> > +       dma_unmap_page(vring_dma_dev(vq),
> > +                      extra->addr, extra->len,
> > +                      (flags & VRING_DESC_F_WRITE) ?
> > +                      DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >  }
> >
> > +/* caller must check vring_need_unmap_buffer() */
> >  static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> >                                     const struct vring_packed_desc *desc)
> >  {
> >         u16 flags;
> >
> > -       if (!vring_need_unmap_buffer(vq))
> > -               return;
> > -
> >         flags = le16_to_cpu(desc->flags);
> >
> >         dma_unmap_page(vring_dma_dev(vq),
> > @@ -1323,7 +1309,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >                         total_sg * sizeof(struct vring_packed_desc),
> >                         DMA_TO_DEVICE);
> >         if (vring_mapping_error(vq, addr)) {
> > -               if (vq->premapped)
> > +               if (!vring_need_unmap_buffer(vq))
> >                         goto free_desc;
> >
> >                 goto unmap_release;
> > @@ -1338,10 +1324,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >                 vq->packed.desc_extra[id].addr = addr;
> >                 vq->packed.desc_extra[id].len = total_sg *
> >                                 sizeof(struct vring_packed_desc);
> > -               vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > -                                                 vq->packed.avail_used_flags;
> >         }
> >
> > +       vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > +               vq->packed.avail_used_flags;
> > +
> >         /*
> >          * A driver MUST NOT make the first descriptor in the list
> >          * available before all subsequent descriptors comprising
> > @@ -1382,6 +1369,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >  unmap_release:
> >         err_idx = i;
> >
> > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > +
> >         for (i = 0; i < err_idx; i++)
> >                 vring_unmap_desc_packed(vq, &desc[i]);
> >
> > @@ -1475,12 +1464,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >                         desc[i].len = cpu_to_le32(sg->length);
> >                         desc[i].id = cpu_to_le16(id);
> >
> > -                       if (unlikely(vq->use_dma_api)) {
> > +                       if (vring_need_unmap_buffer(vq)) {
> >                                 vq->packed.desc_extra[curr].addr = addr;
> >                                 vq->packed.desc_extra[curr].len = sg->length;
> > -                               vq->packed.desc_extra[curr].flags =
> > -                                       le16_to_cpu(flags);
> >                         }
> > +
> > +                       vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
> > +
> >                         prev = curr;
> >                         curr = vq->packed.desc_extra[curr].next;
> >
> > @@ -1530,6 +1520,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >
> >         vq->packed.avail_used_flags = avail_used_flags;
> >
> > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > +
> >         for (n = 0; n < total_sg; n++) {
> >                 if (i == err_idx)
> >                         break;
> > @@ -1599,7 +1591,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >         struct vring_desc_state_packed *state = NULL;
> >         struct vring_packed_desc *desc;
> >         unsigned int i, curr;
> > +       u16 flags;
> >
> > +       flags = vq->packed.desc_extra[id].flags;
> >         state = &vq->packed.desc_state[id];
> >
> >         /* Clear data ptr. */
> > @@ -1609,22 +1603,32 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >         vq->free_head = id;
> >         vq->vq.num_free += state->num;
> >
> > -       if (unlikely(vq->use_dma_api)) {
> > -               curr = id;
> > -               for (i = 0; i < state->num; i++) {
> > -                       vring_unmap_extra_packed(vq,
> > -                                                &vq->packed.desc_extra[curr]);
> > -                       curr = vq->packed.desc_extra[curr].next;
> > +       if (!(flags & VRING_DESC_F_INDIRECT)) {
> > +               if (vring_need_unmap_buffer(vq)) {
> > +                       curr = id;
> > +                       for (i = 0; i < state->num; i++) {
> > +                               vring_unmap_extra_packed(vq,
> > +                                                        &vq->packed.desc_extra[curr]);
> > +                               curr = vq->packed.desc_extra[curr].next;
> > +                       }
> >                 }
> > -       }
> >
> > -       if (vq->indirect) {
> > +               if (ctx)
> > +                       *ctx = state->indir_desc;
> > +       } else {
> > +               const struct vring_desc_extra *extra;
> >                 u32 len;
> >
> > +               if (vq->use_dma_api) {
> > +                       extra = &vq->packed.desc_extra[id];
> > +                       dma_unmap_single(vring_dma_dev(vq),
> > +                                        extra->addr, extra->len,
> > +                                        (flags & VRING_DESC_F_WRITE) ?
> > +                                        DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > +               }
>
> Theoretically, indirect descriptors could be chained. It is supported
> without this patch but not here.


YES. But now, that is not supported by "add", so I think we
do not need to think about it.

Thanks.

>
> Thanks
>
> > +
> >                 /* Free the indirect table, if any, now that it's unmapped. */
> >                 desc = state->indir_desc;
> > -               if (!desc)
> > -                       return;
> >
> >                 if (vring_need_unmap_buffer(vq)) {
> >                         len = vq->packed.desc_extra[id].len;
> > @@ -1634,8 +1638,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >                 }
> >                 kfree(desc);
> >                 state->indir_desc = NULL;
> > -       } else if (ctx) {
> > -               *ctx = state->indir_desc;
> >         }
> >  }
> >
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable
  2024-03-21  6:02   ` Jason Wang
@ 2024-03-21  8:21     ` Xuan Zhuo
  2024-03-22  5:13       ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-21  8:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 21 Mar 2024 14:02:14 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Now, the API virtqueue_set_dma_premapped just support to
> > enable premapped mode.
> >
> > If we allow enabling the premapped dynamically, we should
> > make this API to support disable the premapped mode.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 34 ++++++++++++++++++++++++++--------
> >  include/linux/virtio.h       |  2 +-
> >  2 files changed, 27 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 34f4b2c0c31e..3bf69cae4965 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2801,6 +2801,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> >  /**
> >   * virtqueue_set_dma_premapped - set the vring premapped mode
> >   * @_vq: the struct virtqueue we're talking about.
> > + * @premapped: enable/disable the premapped mode.
> >   *
> >   * Enable the premapped mode of the vq.
> >   *
> > @@ -2819,9 +2820,10 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> >   * 0: success.
> >   * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> >   */
> > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped)
>
> I think we need to document the requirement for calling this.
>
> Looking at the code, it seems it requires to stop the datapath and
> detach all the used buffers?


YES. The complete document is:

/**
 * virtqueue_set_dma_premapped - set the vring premapped mode
 * @_vq: the struct virtqueue we're talking about.
 *
 * Enable the premapped mode of the vq.
 *
 * The vring in premapped mode does not do dma internally, so the driver must
 * do dma mapping in advance. The driver must pass the dma_address through
 * dma_address of scatterlist. When the driver got a used buffer from
 * the vring, it has to unmap the dma address.
 *
 * This function must be called immediately after creating the vq, or after vq
 * reset, and before adding any buffers to it.
 *
 * Caller must ensure we don't call this with other virtqueue operations
 * at the same time (except where noted).
 *
 * Returns zero or a negative error.
 * 0: success.
 * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
 */

Thanks


>
> Thanks
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-21  4:47   ` Jason Wang
@ 2024-03-21  8:24     ` Xuan Zhuo
  2024-03-22  5:15       ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-21  8:24 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 21 Mar 2024 12:47:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > This commit structure the indirect desc table.
> > Then we can get the desc num directly when doing unmap.
> >
> > And save the dma info to the struct, then the indirect
> > will not use the dma fields of the desc_extra. The subsequent
> > commits will make the dma fields are optional. But for
> > the indirect case, we must record the dma info.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
> >  1 file changed, 38 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 0dfbd17e5a87..22a588bba166 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -72,9 +72,16 @@ struct vring_desc_state_split {
> >         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> >  };
> >
> > +struct vring_packed_desc_indir {
> > +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> > +       u32 len;                        /* Descriptor Array length. */
> > +       u32 num;
> > +       struct vring_packed_desc desc[];
> > +};
> > +
> >  struct vring_desc_state_packed {
> >         void *data;                     /* Data for callback. */
> > -       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > +       struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */
>
> Maybe it's better just to have a vring_desc_extra here.


Do you mean replacing vring_packed_desc_indir by vring_desc_extra?

I am ok for that. But vring_desc_extra has two extra items:

	u16 flags;			/* Descriptor flags. */
	u16 next;			/* The next desc state in a list. */

vring_packed_desc_indir has "desc". I think that is more convenient.

So, I think vring_packed_desc_indir is appropriate.
Or I missed something.


Thanks.


>
> Thanks
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq
  2024-03-21  4:45 ` Jason Wang
@ 2024-03-21  8:30   ` Xuan Zhuo
  0 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-21  8:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, 21 Mar 2024 12:45:08 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > As discussed:
> >
> > http://lore.kernel.org/all/CACGkMEvq0No8QGC46U4mGsMtuD44fD_cfLcPaVmJ3rHYqRZxYg@mail.gmail.com
> >
> > If the virtio is premapped mode, the driver should manage the dma info by self.
> > So the virtio core should not store the dma info. We can release the memory used
> > to store the dma info.
> >
> > For virtio-net xmit queue, if the virtio-net maintains the dma info,
> > the virtio-net must allocate too much memory(19 * queue_size for per-queue), so
> > we do not plan to make the virtio-net to maintain the dma info by default. The
> > virtio-net xmit queue only maintain the dma info when premapped mode is enable
> > (such as AF_XDP is enable).
> >
> > So this patch set try to do:
> >
> > 1. make the virtio core to do not store the dma info
>
> I think you mean "make the virtio core to do not store the dma info
> when driver can do that"

YES.


>
> >     - But if the desc_extra has not dma info, we face a new question,
> >       it is hard to get the dma info of the desc with indirect flag.
>
> I guess you want to avoid allocating desc_extra array, otherwise you
> won't have this issue.
>
> How about keeping that?

This is a way. But when we allocate the indirect desc, we alloc
more memory to save that, I think that is a good way.
And in the future, we can handen the unmap for the indirect buffer
with more memory allocated by once.

Thanks.


>
> >       For split mode, that is easy from desc, but for the packed mode,
> >       it is hard to get the dma info from the desc. And hardening
> >       the dma unmap is safe, we should store the dma info of indirect
> >       descs when the virtio core does not store the bufer dma info.
> >
> >       So I introduce the "structure the indirect desc table" to
> >       allocate space to store dma info of the desc table.
> >
> >         +struct vring_split_desc_indir {
> >         +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> >         +       u32 len;                        /* Descriptor Array length. */
> >         +       u32 num;
>
> We can probably just reuse vring_desc_extra here with a known flag
> (read only for device).
>
> >         +       struct vring_desc desc[];
> >         +};
> >
> >       The follow patches to this:
> >          * virtio_ring: packed: structure the indirect desc table
> >          * virtio_ring: split: structure the indirect desc table
> >
> >     - On the other side, in the umap handle, we mix the indirect descs with
> >       other descs. That make things too complex. I found if we we distinguish
> >       the descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
> >
> >       The follow patches do this.
> >          * virtio_ring: packed: remove double check of the unmap ops
> >          * virtio_ring: split: structure the indirect desc table
> >
> > 2. make the virtio core to enable premapped mode by find_vqs() params
> >     - Because the find_vqs() will try to allocate memory for the dma info.
> >       If we set the premapped mode after find_vqs() and release the
> >       dma info, that is odd.
>
> Thanks
>
> >
> >
> > Please review.
> >
> > Thanks
> >
> > v4:
> >     1. virtio-net xmit queue does not enable premapped mode by default
> >
> > v3:
> >     1. fix the conflict with the vp_modern_create_avq().
> >
> > v2:
> >     1. change the dma item of virtio-net, every item have MAX_SKB_FRAGS + 2 addr + len pairs.
> >     2. introduce virtnet_sq_free_stats for __free_old_xmit
> >
> > v1:
> >     1. rename transport_vq_config to vq_transport_config
> >     2. virtio-net set dma meta number to (ring-size + 1)(MAX_SKB_FRGAS +2)
> >     3. introduce virtqueue_dma_map_sg_attrs
> >     4. separate vring_create_virtqueue to an independent commit
> >
> > Xuan Zhuo (10):
> >   virtio_ring: introduce vring_need_unmap_buffer
> >   virtio_ring: packed: remove double check of the unmap ops
> >   virtio_ring: packed: structure the indirect desc table
> >   virtio_ring: split: remove double check of the unmap ops
> >   virtio_ring: split: structure the indirect desc table
> >   virtio_ring: no store dma info when unmap is not needed
> >   virtio: find_vqs: add new parameter premapped
> >   virtio_ring: export premapped to driver by struct virtqueue
> >   virtio_net: set premapped mode by find_vqs()
> >   virtio_ring: virtqueue_set_dma_premapped support disable
> >
> >  drivers/net/virtio_net.c      |  57 +++--
> >  drivers/virtio/virtio_ring.c  | 436 +++++++++++++++++++++-------------
> >  include/linux/virtio.h        |   3 +-
> >  include/linux/virtio_config.h |  17 +-
> >  4 files changed, 307 insertions(+), 206 deletions(-)
> >
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops
  2024-03-21  8:20     ` Xuan Zhuo
@ 2024-03-22  5:10       ` Jason Wang
  2024-03-26  7:32       ` Michael S. Tsirkin
  1 sibling, 0 replies; 31+ messages in thread
From: Jason Wang @ 2024-03-22  5:10 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Mar 21, 2024 at 4:21 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 21 Mar 2024 13:57:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In the functions vring_unmap_extra_packed and vring_unmap_desc_packed,
> > > multiple checks are made whether unmap is performed and whether it is
> > > INDIRECT.
> > >
> > > These two functions are usually called in a loop, and we should put the
> > > check outside the loop.
> > >
> > > And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
> > > other descs, that make the thing more complex. If we distinguish the
> > > descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
> > >
> > > 1. only one desc of the desc table is used, we do not need the loop
> > > 2. the called unmap api is difference from the other desc
> > > 3. the vq->premapped is not needed to check
> > > 4. the vq->indirect is not needed to check
> > > 5. the state->indir_desc must not be null
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 78 ++++++++++++++++++------------------
> > >  1 file changed, 40 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index c2779e34aac7..0dfbd17e5a87 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1214,6 +1214,7 @@ static u16 packed_last_used(u16 last_used_idx)
> > >         return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> > >  }
> > >
> > > +/* caller must check vring_need_unmap_buffer() */
> > >  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > >                                      const struct vring_desc_extra *extra)
> > >  {
> > > @@ -1221,33 +1222,18 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > >
> > >         flags = extra->flags;
> > >
> > > -       if (flags & VRING_DESC_F_INDIRECT) {
> > > -               if (!vq->use_dma_api)
> > > -                       return;
> > > -
> > > -               dma_unmap_single(vring_dma_dev(vq),
> > > -                                extra->addr, extra->len,
> > > -                                (flags & VRING_DESC_F_WRITE) ?
> > > -                                DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > -       } else {
> > > -               if (!vring_need_unmap_buffer(vq))
> > > -                       return;
> > > -
> > > -               dma_unmap_page(vring_dma_dev(vq),
> > > -                              extra->addr, extra->len,
> > > -                              (flags & VRING_DESC_F_WRITE) ?
> > > -                              DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > -       }
> > > +       dma_unmap_page(vring_dma_dev(vq),
> > > +                      extra->addr, extra->len,
> > > +                      (flags & VRING_DESC_F_WRITE) ?
> > > +                      DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > >  }
> > >
> > > +/* caller must check vring_need_unmap_buffer() */
> > >  static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > >                                     const struct vring_packed_desc *desc)
> > >  {
> > >         u16 flags;
> > >
> > > -       if (!vring_need_unmap_buffer(vq))
> > > -               return;
> > > -
> > >         flags = le16_to_cpu(desc->flags);
> > >
> > >         dma_unmap_page(vring_dma_dev(vq),
> > > @@ -1323,7 +1309,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >                         total_sg * sizeof(struct vring_packed_desc),
> > >                         DMA_TO_DEVICE);
> > >         if (vring_mapping_error(vq, addr)) {
> > > -               if (vq->premapped)
> > > +               if (!vring_need_unmap_buffer(vq))
> > >                         goto free_desc;
> > >
> > >                 goto unmap_release;
> > > @@ -1338,10 +1324,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >                 vq->packed.desc_extra[id].addr = addr;
> > >                 vq->packed.desc_extra[id].len = total_sg *
> > >                                 sizeof(struct vring_packed_desc);
> > > -               vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > > -                                                 vq->packed.avail_used_flags;
> > >         }
> > >
> > > +       vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > > +               vq->packed.avail_used_flags;
> > > +
> > >         /*
> > >          * A driver MUST NOT make the first descriptor in the list
> > >          * available before all subsequent descriptors comprising
> > > @@ -1382,6 +1369,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >  unmap_release:
> > >         err_idx = i;
> > >
> > > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > > +
> > >         for (i = 0; i < err_idx; i++)
> > >                 vring_unmap_desc_packed(vq, &desc[i]);
> > >
> > > @@ -1475,12 +1464,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >                         desc[i].len = cpu_to_le32(sg->length);
> > >                         desc[i].id = cpu_to_le16(id);
> > >
> > > -                       if (unlikely(vq->use_dma_api)) {
> > > +                       if (vring_need_unmap_buffer(vq)) {
> > >                                 vq->packed.desc_extra[curr].addr = addr;
> > >                                 vq->packed.desc_extra[curr].len = sg->length;
> > > -                               vq->packed.desc_extra[curr].flags =
> > > -                                       le16_to_cpu(flags);
> > >                         }
> > > +
> > > +                       vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
> > > +
> > >                         prev = curr;
> > >                         curr = vq->packed.desc_extra[curr].next;
> > >
> > > @@ -1530,6 +1520,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >
> > >         vq->packed.avail_used_flags = avail_used_flags;
> > >
> > > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > > +
> > >         for (n = 0; n < total_sg; n++) {
> > >                 if (i == err_idx)
> > >                         break;
> > > @@ -1599,7 +1591,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >         struct vring_desc_state_packed *state = NULL;
> > >         struct vring_packed_desc *desc;
> > >         unsigned int i, curr;
> > > +       u16 flags;
> > >
> > > +       flags = vq->packed.desc_extra[id].flags;
> > >         state = &vq->packed.desc_state[id];
> > >
> > >         /* Clear data ptr. */
> > > @@ -1609,22 +1603,32 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >         vq->free_head = id;
> > >         vq->vq.num_free += state->num;
> > >
> > > -       if (unlikely(vq->use_dma_api)) {
> > > -               curr = id;
> > > -               for (i = 0; i < state->num; i++) {
> > > -                       vring_unmap_extra_packed(vq,
> > > -                                                &vq->packed.desc_extra[curr]);
> > > -                       curr = vq->packed.desc_extra[curr].next;
> > > +       if (!(flags & VRING_DESC_F_INDIRECT)) {
> > > +               if (vring_need_unmap_buffer(vq)) {
> > > +                       curr = id;
> > > +                       for (i = 0; i < state->num; i++) {
> > > +                               vring_unmap_extra_packed(vq,
> > > +                                                        &vq->packed.desc_extra[curr]);
> > > +                               curr = vq->packed.desc_extra[curr].next;
> > > +                       }
> > >                 }
> > > -       }
> > >
> > > -       if (vq->indirect) {
> > > +               if (ctx)
> > > +                       *ctx = state->indir_desc;
> > > +       } else {
> > > +               const struct vring_desc_extra *extra;
> > >                 u32 len;
> > >
> > > +               if (vq->use_dma_api) {
> > > +                       extra = &vq->packed.desc_extra[id];
> > > +                       dma_unmap_single(vring_dma_dev(vq),
> > > +                                        extra->addr, extra->len,
> > > +                                        (flags & VRING_DESC_F_WRITE) ?
> > > +                                        DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > +               }
> >
> > Theoretically, indirect descriptors could be chained. It is supported
> > without this patch but not here.
>
>
> YES. But now, that is not supported by "add", so I think we
> do not need to think about it.
>
> Thanks.

Well, you need to at least document this somewhere (probably in the changlog).

Thanks


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable
  2024-03-21  8:21     ` Xuan Zhuo
@ 2024-03-22  5:13       ` Jason Wang
  2024-03-22  6:03         ` Xuan Zhuo
  0 siblings, 1 reply; 31+ messages in thread
From: Jason Wang @ 2024-03-22  5:13 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Mar 21, 2024 at 4:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 21 Mar 2024 14:02:14 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Now, the API virtqueue_set_dma_premapped just support to
> > > enable premapped mode.
> > >
> > > If we allow enabling the premapped dynamically, we should
> > > make this API to support disable the premapped mode.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 34 ++++++++++++++++++++++++++--------
> > >  include/linux/virtio.h       |  2 +-
> > >  2 files changed, 27 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 34f4b2c0c31e..3bf69cae4965 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -2801,6 +2801,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > >  /**
> > >   * virtqueue_set_dma_premapped - set the vring premapped mode
> > >   * @_vq: the struct virtqueue we're talking about.
> > > + * @premapped: enable/disable the premapped mode.
> > >   *
> > >   * Enable the premapped mode of the vq.
> > >   *
> > > @@ -2819,9 +2820,10 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > >   * 0: success.
> > >   * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > >   */
> > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped)
> >
> > I think we need to document the requirement for calling this.
> >
> > Looking at the code, it seems it requires to stop the datapath and
> > detach all the used buffers?
>
>
> YES. The complete document is:
>
> /**
>  * virtqueue_set_dma_premapped - set the vring premapped mode
>  * @_vq: the struct virtqueue we're talking about.
>  *
>  * Enable the premapped mode of the vq.
>  *
>  * The vring in premapped mode does not do dma internally, so the driver must
>  * do dma mapping in advance. The driver must pass the dma_address through
>  * dma_address of scatterlist. When the driver got a used buffer from
>  * the vring, it has to unmap the dma address.
>  *
>  * This function must be called immediately after creating the vq, or after vq
>  * reset, and before adding any buffers to it.

I'm not sure this is a good design but we need at least some guard for
this, probably WARN for num_added or others.

Thanks

>  *
>  * Caller must ensure we don't call this with other virtqueue operations
>  * at the same time (except where noted).
>  *
>  * Returns zero or a negative error.
>  * 0: success.
>  * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
>  */
>
> Thanks
>
>
> >
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-21  8:24     ` Xuan Zhuo
@ 2024-03-22  5:15       ` Jason Wang
  2024-03-22  5:55         ` Xuan Zhuo
  2024-03-22  7:51         ` Xuan Zhuo
  0 siblings, 2 replies; 31+ messages in thread
From: Jason Wang @ 2024-03-22  5:15 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Thu, Mar 21, 2024 at 4:29 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 21 Mar 2024 12:47:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > This commit structure the indirect desc table.
> > > Then we can get the desc num directly when doing unmap.
> > >
> > > And save the dma info to the struct, then the indirect
> > > will not use the dma fields of the desc_extra. The subsequent
> > > commits will make the dma fields are optional. But for
> > > the indirect case, we must record the dma info.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
> > >  1 file changed, 38 insertions(+), 28 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 0dfbd17e5a87..22a588bba166 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -72,9 +72,16 @@ struct vring_desc_state_split {
> > >         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > >  };
> > >
> > > +struct vring_packed_desc_indir {
> > > +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> > > +       u32 len;                        /* Descriptor Array length. */
> > > +       u32 num;
> > > +       struct vring_packed_desc desc[];
> > > +};
> > > +
> > >  struct vring_desc_state_packed {
> > >         void *data;                     /* Data for callback. */
> > > -       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > > +       struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */
> >
> > Maybe it's better just to have a vring_desc_extra here.
>
>
> Do you mean replacing vring_packed_desc_indir by vring_desc_extra?

Just add a vring_desc_extra in vring_desc_state_packed.

>
> I am ok for that. But vring_desc_extra has two extra items:
>
>         u16 flags;                      /* Descriptor flags. */
>         u16 next;                       /* The next desc state in a list. */
>
> vring_packed_desc_indir has "desc". I think that is more convenient.
>
> So, I think vring_packed_desc_indir is appropriate.

It reuses the existing structure so we had the chance to reuse the
helper. And it could be used for future chained indirect (if it turns
out to be necessary).

Thanks

> Or I missed something.
>
>
> Thanks.
>
>
> >
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-22  5:15       ` Jason Wang
@ 2024-03-22  5:55         ` Xuan Zhuo
  2024-03-22  7:51         ` Xuan Zhuo
  1 sibling, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-22  5:55 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 22 Mar 2024 13:15:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Mar 21, 2024 at 4:29 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 21 Mar 2024 12:47:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > This commit structure the indirect desc table.
> > > > Then we can get the desc num directly when doing unmap.
> > > >
> > > > And save the dma info to the struct, then the indirect
> > > > will not use the dma fields of the desc_extra. The subsequent
> > > > commits will make the dma fields are optional. But for
> > > > the indirect case, we must record the dma info.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
> > > >  1 file changed, 38 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 0dfbd17e5a87..22a588bba166 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -72,9 +72,16 @@ struct vring_desc_state_split {
> > > >         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > >  };
> > > >
> > > > +struct vring_packed_desc_indir {
> > > > +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> > > > +       u32 len;                        /* Descriptor Array length. */
> > > > +       u32 num;
> > > > +       struct vring_packed_desc desc[];
> > > > +};
> > > > +
> > > >  struct vring_desc_state_packed {
> > > >         void *data;                     /* Data for callback. */
> > > > -       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > +       struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */
> > >
> > > Maybe it's better just to have a vring_desc_extra here.
> >
> >
> > Do you mean replacing vring_packed_desc_indir by vring_desc_extra?
>
> Just add a vring_desc_extra in vring_desc_state_packed.


I am surprise to here that.

Do you mean this: #1


struct vring_desc_state_packed {
	void *data;			/* Data for callback. */
	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
	u16 num;			/* Descriptor list length. */
	u16 last;			/* The last desc state in a list. */

	struct vring_desc_extra desc_extra;
};


Then desc_extra is included by desc_state. I do not think so.

I guess you mean this: #2

struct vring_desc_state_packed {
	void *data;			/* Data for callback. */
	struct vring_desc_extra *indir_desc; /* Indirect descriptor, if any. */
	u16 num;			/* Descriptor list length. */
	u16 last;			/* The last desc state in a list. */
};

indir_desc pointers to memory:

|struct vring_desc_extra | struct vring_packed_desc desc[] |


>
> >
> > I am ok for that. But vring_desc_extra has two extra items:
> >
> >         u16 flags;                      /* Descriptor flags. */
> >         u16 next;                       /* The next desc state in a list. */
> >
> > vring_packed_desc_indir has "desc". I think that is more convenient.
> >
> > So, I think vring_packed_desc_indir is appropriate.
>
> It reuses the existing structure so we had the chance to reuse the
> helper.

Which helper?

But, if you mean #2. I am ok.

Thanks.

> And it could be used for future chained indirect (if it turns
> out to be necessary).
>
> Thanks
>
> > Or I missed something.
> >
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable
  2024-03-22  5:13       ` Jason Wang
@ 2024-03-22  6:03         ` Xuan Zhuo
  2024-03-25  7:10           ` Jason Wang
  0 siblings, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-22  6:03 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 22 Mar 2024 13:13:36 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Mar 21, 2024 at 4:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 21 Mar 2024 14:02:14 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Now, the API virtqueue_set_dma_premapped just support to
> > > > enable premapped mode.
> > > >
> > > > If we allow enabling the premapped dynamically, we should
> > > > make this API to support disable the premapped mode.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 34 ++++++++++++++++++++++++++--------
> > > >  include/linux/virtio.h       |  2 +-
> > > >  2 files changed, 27 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 34f4b2c0c31e..3bf69cae4965 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -2801,6 +2801,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > >  /**
> > > >   * virtqueue_set_dma_premapped - set the vring premapped mode
> > > >   * @_vq: the struct virtqueue we're talking about.
> > > > + * @premapped: enable/disable the premapped mode.
> > > >   *
> > > >   * Enable the premapped mode of the vq.
> > > >   *
> > > > @@ -2819,9 +2820,10 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > >   * 0: success.
> > > >   * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > > >   */
> > > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped)
> > >
> > > I think we need to document the requirement for calling this.
> > >
> > > Looking at the code, it seems it requires to stop the datapath and
> > > detach all the used buffers?
> >
> >
> > YES. The complete document is:
> >
> > /**
> >  * virtqueue_set_dma_premapped - set the vring premapped mode
> >  * @_vq: the struct virtqueue we're talking about.
> >  *
> >  * Enable the premapped mode of the vq.
> >  *
> >  * The vring in premapped mode does not do dma internally, so the driver must
> >  * do dma mapping in advance. The driver must pass the dma_address through
> >  * dma_address of scatterlist. When the driver got a used buffer from
> >  * the vring, it has to unmap the dma address.
> >  *
> >  * This function must be called immediately after creating the vq, or after vq
> >  * reset, and before adding any buffers to it.
>
> I'm not sure this is a good design but we need at least some guard for
> this, probably WARN for num_added or others.


int virtqueue_set_dma_premapped(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);
	u32 num;

	START_USE(vq);

	num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;

	if (num != vq->vq.num_free) {
		END_USE(vq);
		return -EINVAL;
	}


Now, we have checked the num_free.

Thanks.


>
> Thanks
>
> >  *
> >  * Caller must ensure we don't call this with other virtqueue operations
> >  * at the same time (except where noted).
> >  *
> >  * Returns zero or a negative error.
> >  * 0: success.
> >  * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> >  */
> >
> > Thanks
> >
> >
> > >
> > > Thanks
> > >
> >
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-22  5:15       ` Jason Wang
  2024-03-22  5:55         ` Xuan Zhuo
@ 2024-03-22  7:51         ` Xuan Zhuo
  2024-03-25  7:07           ` Jason Wang
  1 sibling, 1 reply; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-22  7:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, 22 Mar 2024 13:15:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Mar 21, 2024 at 4:29 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 21 Mar 2024 12:47:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > This commit structure the indirect desc table.
> > > > Then we can get the desc num directly when doing unmap.
> > > >
> > > > And save the dma info to the struct, then the indirect
> > > > will not use the dma fields of the desc_extra. The subsequent
> > > > commits will make the dma fields are optional. But for
> > > > the indirect case, we must record the dma info.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
> > > >  1 file changed, 38 insertions(+), 28 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 0dfbd17e5a87..22a588bba166 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -72,9 +72,16 @@ struct vring_desc_state_split {
> > > >         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > >  };
> > > >
> > > > +struct vring_packed_desc_indir {
> > > > +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> > > > +       u32 len;                        /* Descriptor Array length. */
> > > > +       u32 num;
> > > > +       struct vring_packed_desc desc[];
> > > > +};
> > > > +
> > > >  struct vring_desc_state_packed {
> > > >         void *data;                     /* Data for callback. */
> > > > -       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > +       struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */
> > >
> > > Maybe it's better just to have a vring_desc_extra here.
> >
> >
> > Do you mean replacing vring_packed_desc_indir by vring_desc_extra?
>
> Just add a vring_desc_extra in vring_desc_state_packed.
>
> >
> > I am ok for that. But vring_desc_extra has two extra items:
> >
> >         u16 flags;                      /* Descriptor flags. */
> >         u16 next;                       /* The next desc state in a list. */
> >
> > vring_packed_desc_indir has "desc". I think that is more convenient.
> >
> > So, I think vring_packed_desc_indir is appropriate.
>
> It reuses the existing structure so we had the chance to reuse the
> helper.

Do you mean vring_unmap_extra_packed()?

After last commit(virtio_ring: packed: remove double check of the unmap ops):

	/* caller must check vring_need_unmap_buffer() */
	static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
					     const struct vring_desc_extra *extra)
	{
		u16 flags;

		flags = extra->flags;

		dma_unmap_page(vring_dma_dev(vq),
			       extra->addr, extra->len,
			       (flags & VRING_DESC_F_WRITE) ?
			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
	}

But we should call dma_unmap_single() for indirect desc.

We know, dma_unmap_single() and dma_unmap_page() are same in essence.
So if we call dma_unmap_page for the indirect desc, we can reuse
this function. But I do not prefer doing this.

Thanks.


> And it could be used for future chained indirect (if it turns
> out to be necessary).
>
> Thanks
>
> > Or I missed something.
> >
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> >
>
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table
  2024-03-22  7:51         ` Xuan Zhuo
@ 2024-03-25  7:07           ` Jason Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2024-03-25  7:07 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, Mar 22, 2024 at 3:58 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 22 Mar 2024 13:15:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Mar 21, 2024 at 4:29 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 21 Mar 2024 12:47:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > This commit structure the indirect desc table.
> > > > > Then we can get the desc num directly when doing unmap.
> > > > >
> > > > > And save the dma info to the struct, then the indirect
> > > > > will not use the dma fields of the desc_extra. The subsequent
> > > > > commits will make the dma fields are optional. But for
> > > > > the indirect case, we must record the dma info.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/virtio/virtio_ring.c | 66 +++++++++++++++++++++---------------
> > > > >  1 file changed, 38 insertions(+), 28 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 0dfbd17e5a87..22a588bba166 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -72,9 +72,16 @@ struct vring_desc_state_split {
> > > > >         struct vring_desc *indir_desc;  /* Indirect descriptor, if any. */
> > > > >  };
> > > > >
> > > > > +struct vring_packed_desc_indir {
> > > > > +       dma_addr_t addr;                /* Descriptor Array DMA addr. */
> > > > > +       u32 len;                        /* Descriptor Array length. */
> > > > > +       u32 num;
> > > > > +       struct vring_packed_desc desc[];
> > > > > +};
> > > > > +
> > > > >  struct vring_desc_state_packed {
> > > > >         void *data;                     /* Data for callback. */
> > > > > -       struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> > > > > +       struct vring_packed_desc_indir *indir_desc; /* Indirect descriptor, if any. */
> > > >
> > > > Maybe it's better just to have a vring_desc_extra here.
> > >
> > >
> > > Do you mean replacing vring_packed_desc_indir by vring_desc_extra?
> >
> > Just add a vring_desc_extra in vring_desc_state_packed.
> >
> > >
> > > I am ok for that. But vring_desc_extra has two extra items:
> > >
> > >         u16 flags;                      /* Descriptor flags. */
> > >         u16 next;                       /* The next desc state in a list. */
> > >
> > > vring_packed_desc_indir has "desc". I think that is more convenient.
> > >
> > > So, I think vring_packed_desc_indir is appropriate.
> >
> > It reuses the existing structure so we had the chance to reuse the
> > helper.
>
> Do you mean vring_unmap_extra_packed()?

Yes.


>
> After last commit(virtio_ring: packed: remove double check of the unmap ops):
>
>         /* caller must check vring_need_unmap_buffer() */
>         static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>                                              const struct vring_desc_extra *extra)
>         {
>                 u16 flags;
>
>                 flags = extra->flags;
>
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra->addr, extra->len,
>                                (flags & VRING_DESC_F_WRITE) ?
>                                DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         }
>
> But we should call dma_unmap_single() for indirect desc.
>
> We know, dma_unmap_single() and dma_unmap_page() are same in essence.

Yes, it's worth tweaking in the future.

> So if we call dma_unmap_page for the indirect desc, we can reuse
> this function. But I do not prefer doing this.

Ok.

Thanks

>
> Thanks.
>
>
> > And it could be used for future chained indirect (if it turns
> > out to be necessary).
> >
> > Thanks
> >
> > > Or I missed something.
> > >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable
  2024-03-22  6:03         ` Xuan Zhuo
@ 2024-03-25  7:10           ` Jason Wang
  0 siblings, 0 replies; 31+ messages in thread
From: Jason Wang @ 2024-03-25  7:10 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev

On Fri, Mar 22, 2024 at 2:04 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 22 Mar 2024 13:13:36 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Mar 21, 2024 at 4:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 21 Mar 2024 14:02:14 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > Now, the API virtqueue_set_dma_premapped just support to
> > > > > enable premapped mode.
> > > > >
> > > > > If we allow enabling the premapped dynamically, we should
> > > > > make this API to support disable the premapped mode.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/virtio/virtio_ring.c | 34 ++++++++++++++++++++++++++--------
> > > > >  include/linux/virtio.h       |  2 +-
> > > > >  2 files changed, 27 insertions(+), 9 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 34f4b2c0c31e..3bf69cae4965 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -2801,6 +2801,7 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > > >  /**
> > > > >   * virtqueue_set_dma_premapped - set the vring premapped mode
> > > > >   * @_vq: the struct virtqueue we're talking about.
> > > > > + * @premapped: enable/disable the premapped mode.
> > > > >   *
> > > > >   * Enable the premapped mode of the vq.
> > > > >   *
> > > > > @@ -2819,9 +2820,10 @@ EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > > >   * 0: success.
> > > > >   * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > > > >   */
> > > > > -int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > > > +int virtqueue_set_dma_premapped(struct virtqueue *_vq, bool premapped)
> > > >
> > > > I think we need to document the requirement for calling this.
> > > >
> > > > Looking at the code, it seems it requires to stop the datapath and
> > > > detach all the used buffers?
> > >
> > >
> > > YES. The complete document is:
> > >
> > > /**
> > >  * virtqueue_set_dma_premapped - set the vring premapped mode
> > >  * @_vq: the struct virtqueue we're talking about.
> > >  *
> > >  * Enable the premapped mode of the vq.
> > >  *
> > >  * The vring in premapped mode does not do dma internally, so the driver must
> > >  * do dma mapping in advance. The driver must pass the dma_address through
> > >  * dma_address of scatterlist. When the driver got a used buffer from
> > >  * the vring, it has to unmap the dma address.
> > >  *
> > >  * This function must be called immediately after creating the vq, or after vq
> > >  * reset, and before adding any buffers to it.
> >
> > I'm not sure this is a good design but we need at least some guard for
> > this, probably WARN for num_added or others.
>
>
> int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> {
>         struct vring_virtqueue *vq = to_vvq(_vq);
>         u32 num;
>
>         START_USE(vq);
>
>         num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
>
>         if (num != vq->vq.num_free) {
>                 END_USE(vq);
>                 return -EINVAL;
>         }
>
>
> Now, we have checked the num_free.

Ok, let's add it to the doc.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >  *
> > >  * Caller must ensure we don't call this with other virtqueue operations
> > >  * at the same time (except where noted).
> > >  *
> > >  * Returns zero or a negative error.
> > >  * 0: success.
> > >  * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > >  */
> > >
> > > Thanks
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops
  2024-03-21  8:20     ` Xuan Zhuo
  2024-03-22  5:10       ` Jason Wang
@ 2024-03-26  7:32       ` Michael S. Tsirkin
  2024-03-27  7:11         ` Xuan Zhuo
  1 sibling, 1 reply; 31+ messages in thread
From: Michael S. Tsirkin @ 2024-03-26  7:32 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev

On Thu, Mar 21, 2024 at 04:20:09PM +0800, Xuan Zhuo wrote:
> On Thu, 21 Mar 2024 13:57:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > In the functions vring_unmap_extra_packed and vring_unmap_desc_packed,
> > > multiple checks are made whether unmap is performed and whether it is
> > > INDIRECT.
> > >
> > > These two functions are usually called in a loop, and we should put the
> > > check outside the loop.
> > >
> > > And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
> > > other descs, that make the thing more complex. If we distinguish the
> > > descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
> > >
> > > 1. only one desc of the desc table is used, we do not need the loop
> > > 2. the called unmap api is difference from the other desc
> > > 3. the vq->premapped is not needed to check
> > > 4. the vq->indirect is not needed to check
> > > 5. the state->indir_desc must not be null
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 78 ++++++++++++++++++------------------
> > >  1 file changed, 40 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index c2779e34aac7..0dfbd17e5a87 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -1214,6 +1214,7 @@ static u16 packed_last_used(u16 last_used_idx)
> > >         return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> > >  }
> > >
> > > +/* caller must check vring_need_unmap_buffer() */
> > >  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > >                                      const struct vring_desc_extra *extra)
> > >  {
> > > @@ -1221,33 +1222,18 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > >
> > >         flags = extra->flags;
> > >
> > > -       if (flags & VRING_DESC_F_INDIRECT) {
> > > -               if (!vq->use_dma_api)
> > > -                       return;
> > > -
> > > -               dma_unmap_single(vring_dma_dev(vq),
> > > -                                extra->addr, extra->len,
> > > -                                (flags & VRING_DESC_F_WRITE) ?
> > > -                                DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > -       } else {
> > > -               if (!vring_need_unmap_buffer(vq))
> > > -                       return;
> > > -
> > > -               dma_unmap_page(vring_dma_dev(vq),
> > > -                              extra->addr, extra->len,
> > > -                              (flags & VRING_DESC_F_WRITE) ?
> > > -                              DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > -       }
> > > +       dma_unmap_page(vring_dma_dev(vq),
> > > +                      extra->addr, extra->len,
> > > +                      (flags & VRING_DESC_F_WRITE) ?
> > > +                      DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > >  }
> > >
> > > +/* caller must check vring_need_unmap_buffer() */
> > >  static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > >                                     const struct vring_packed_desc *desc)
> > >  {
> > >         u16 flags;
> > >
> > > -       if (!vring_need_unmap_buffer(vq))
> > > -               return;
> > > -
> > >         flags = le16_to_cpu(desc->flags);
> > >
> > >         dma_unmap_page(vring_dma_dev(vq),
> > > @@ -1323,7 +1309,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >                         total_sg * sizeof(struct vring_packed_desc),
> > >                         DMA_TO_DEVICE);
> > >         if (vring_mapping_error(vq, addr)) {
> > > -               if (vq->premapped)
> > > +               if (!vring_need_unmap_buffer(vq))
> > >                         goto free_desc;
> > >
> > >                 goto unmap_release;
> > > @@ -1338,10 +1324,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >                 vq->packed.desc_extra[id].addr = addr;
> > >                 vq->packed.desc_extra[id].len = total_sg *
> > >                                 sizeof(struct vring_packed_desc);
> > > -               vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > > -                                                 vq->packed.avail_used_flags;
> > >         }
> > >
> > > +       vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > > +               vq->packed.avail_used_flags;
> > > +
> > >         /*
> > >          * A driver MUST NOT make the first descriptor in the list
> > >          * available before all subsequent descriptors comprising
> > > @@ -1382,6 +1369,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >  unmap_release:
> > >         err_idx = i;
> > >
> > > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > > +
> > >         for (i = 0; i < err_idx; i++)
> > >                 vring_unmap_desc_packed(vq, &desc[i]);
> > >
> > > @@ -1475,12 +1464,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >                         desc[i].len = cpu_to_le32(sg->length);
> > >                         desc[i].id = cpu_to_le16(id);
> > >
> > > -                       if (unlikely(vq->use_dma_api)) {
> > > +                       if (vring_need_unmap_buffer(vq)) {
> > >                                 vq->packed.desc_extra[curr].addr = addr;
> > >                                 vq->packed.desc_extra[curr].len = sg->length;
> > > -                               vq->packed.desc_extra[curr].flags =
> > > -                                       le16_to_cpu(flags);
> > >                         }
> > > +
> > > +                       vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
> > > +
> > >                         prev = curr;
> > >                         curr = vq->packed.desc_extra[curr].next;
> > >
> > > @@ -1530,6 +1520,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >
> > >         vq->packed.avail_used_flags = avail_used_flags;
> > >
> > > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > > +
> > >         for (n = 0; n < total_sg; n++) {
> > >                 if (i == err_idx)
> > >                         break;
> > > @@ -1599,7 +1591,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >         struct vring_desc_state_packed *state = NULL;
> > >         struct vring_packed_desc *desc;
> > >         unsigned int i, curr;
> > > +       u16 flags;
> > >
> > > +       flags = vq->packed.desc_extra[id].flags;
> > >         state = &vq->packed.desc_state[id];
> > >
> > >         /* Clear data ptr. */
> > > @@ -1609,22 +1603,32 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >         vq->free_head = id;
> > >         vq->vq.num_free += state->num;
> > >
> > > -       if (unlikely(vq->use_dma_api)) {
> > > -               curr = id;
> > > -               for (i = 0; i < state->num; i++) {
> > > -                       vring_unmap_extra_packed(vq,
> > > -                                                &vq->packed.desc_extra[curr]);
> > > -                       curr = vq->packed.desc_extra[curr].next;
> > > +       if (!(flags & VRING_DESC_F_INDIRECT)) {
> > > +               if (vring_need_unmap_buffer(vq)) {
> > > +                       curr = id;
> > > +                       for (i = 0; i < state->num; i++) {
> > > +                               vring_unmap_extra_packed(vq,
> > > +                                                        &vq->packed.desc_extra[curr]);
> > > +                               curr = vq->packed.desc_extra[curr].next;
> > > +                       }
> > >                 }
> > > -       }
> > >
> > > -       if (vq->indirect) {
> > > +               if (ctx)
> > > +                       *ctx = state->indir_desc;
> > > +       } else {
> > > +               const struct vring_desc_extra *extra;
> > >                 u32 len;
> > >
> > > +               if (vq->use_dma_api) {
> > > +                       extra = &vq->packed.desc_extra[id];
> > > +                       dma_unmap_single(vring_dma_dev(vq),
> > > +                                        extra->addr, extra->len,
> > > +                                        (flags & VRING_DESC_F_WRITE) ?
> > > +                                        DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > +               }
> >
> > Theoretically, indirect descriptors could be chained. It is supported
> > without this patch but not here.
> 
> 
> YES. But now, that is not supported by "add", so I think we
> do not need to think about it.
> 
> Thanks.


the "add" you are referring to is virtio drivers in the linux guest?
That's not the only guest and there's no way to be sure
no one does it. We can make some unusual operations go somewhat
slower but breaking them outright is not a good idea.

> >
> > Thanks
> >
> > > +
> > >                 /* Free the indirect table, if any, now that it's unmapped. */
> > >                 desc = state->indir_desc;
> > > -               if (!desc)
> > > -                       return;
> > >
> > >                 if (vring_need_unmap_buffer(vq)) {
> > >                         len = vq->packed.desc_extra[id].len;
> > > @@ -1634,8 +1638,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >                 }
> > >                 kfree(desc);
> > >                 state->indir_desc = NULL;
> > > -       } else if (ctx) {
> > > -               *ctx = state->indir_desc;
> > >         }
> > >  }
> > >
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops
  2024-03-26  7:32       ` Michael S. Tsirkin
@ 2024-03-27  7:11         ` Xuan Zhuo
  0 siblings, 0 replies; 31+ messages in thread
From: Xuan Zhuo @ 2024-03-27  7:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev

On Tue, 26 Mar 2024 03:32:43 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Mar 21, 2024 at 04:20:09PM +0800, Xuan Zhuo wrote:
> > On Thu, 21 Mar 2024 13:57:06 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Mar 12, 2024 at 11:36 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > In the functions vring_unmap_extra_packed and vring_unmap_desc_packed,
> > > > multiple checks are made whether unmap is performed and whether it is
> > > > INDIRECT.
> > > >
> > > > These two functions are usually called in a loop, and we should put the
> > > > check outside the loop.
> > > >
> > > > And we unmap the descs with VRING_DESC_F_INDIRECT on the same path with
> > > > other descs, that make the thing more complex. If we distinguish the
> > > > descs with VRING_DESC_F_INDIRECT before unmap, thing will be clearer.
> > > >
> > > > 1. only one desc of the desc table is used, we do not need the loop
> > > > 2. the called unmap api is difference from the other desc
> > > > 3. the vq->premapped is not needed to check
> > > > 4. the vq->indirect is not needed to check
> > > > 5. the state->indir_desc must not be null
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 78 ++++++++++++++++++------------------
> > > >  1 file changed, 40 insertions(+), 38 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index c2779e34aac7..0dfbd17e5a87 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -1214,6 +1214,7 @@ static u16 packed_last_used(u16 last_used_idx)
> > > >         return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
> > > >  }
> > > >
> > > > +/* caller must check vring_need_unmap_buffer() */
> > > >  static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > > >                                      const struct vring_desc_extra *extra)
> > > >  {
> > > > @@ -1221,33 +1222,18 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > > >
> > > >         flags = extra->flags;
> > > >
> > > > -       if (flags & VRING_DESC_F_INDIRECT) {
> > > > -               if (!vq->use_dma_api)
> > > > -                       return;
> > > > -
> > > > -               dma_unmap_single(vring_dma_dev(vq),
> > > > -                                extra->addr, extra->len,
> > > > -                                (flags & VRING_DESC_F_WRITE) ?
> > > > -                                DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > -       } else {
> > > > -               if (!vring_need_unmap_buffer(vq))
> > > > -                       return;
> > > > -
> > > > -               dma_unmap_page(vring_dma_dev(vq),
> > > > -                              extra->addr, extra->len,
> > > > -                              (flags & VRING_DESC_F_WRITE) ?
> > > > -                              DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > -       }
> > > > +       dma_unmap_page(vring_dma_dev(vq),
> > > > +                      extra->addr, extra->len,
> > > > +                      (flags & VRING_DESC_F_WRITE) ?
> > > > +                      DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > >  }
> > > >
> > > > +/* caller must check vring_need_unmap_buffer() */
> > > >  static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > > >                                     const struct vring_packed_desc *desc)
> > > >  {
> > > >         u16 flags;
> > > >
> > > > -       if (!vring_need_unmap_buffer(vq))
> > > > -               return;
> > > > -
> > > >         flags = le16_to_cpu(desc->flags);
> > > >
> > > >         dma_unmap_page(vring_dma_dev(vq),
> > > > @@ -1323,7 +1309,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >                         total_sg * sizeof(struct vring_packed_desc),
> > > >                         DMA_TO_DEVICE);
> > > >         if (vring_mapping_error(vq, addr)) {
> > > > -               if (vq->premapped)
> > > > +               if (!vring_need_unmap_buffer(vq))
> > > >                         goto free_desc;
> > > >
> > > >                 goto unmap_release;
> > > > @@ -1338,10 +1324,11 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >                 vq->packed.desc_extra[id].addr = addr;
> > > >                 vq->packed.desc_extra[id].len = total_sg *
> > > >                                 sizeof(struct vring_packed_desc);
> > > > -               vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > > > -                                                 vq->packed.avail_used_flags;
> > > >         }
> > > >
> > > > +       vq->packed.desc_extra[id].flags = VRING_DESC_F_INDIRECT |
> > > > +               vq->packed.avail_used_flags;
> > > > +
> > > >         /*
> > > >          * A driver MUST NOT make the first descriptor in the list
> > > >          * available before all subsequent descriptors comprising
> > > > @@ -1382,6 +1369,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >  unmap_release:
> > > >         err_idx = i;
> > > >
> > > > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > > > +
> > > >         for (i = 0; i < err_idx; i++)
> > > >                 vring_unmap_desc_packed(vq, &desc[i]);
> > > >
> > > > @@ -1475,12 +1464,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > >                         desc[i].len = cpu_to_le32(sg->length);
> > > >                         desc[i].id = cpu_to_le16(id);
> > > >
> > > > -                       if (unlikely(vq->use_dma_api)) {
> > > > +                       if (vring_need_unmap_buffer(vq)) {
> > > >                                 vq->packed.desc_extra[curr].addr = addr;
> > > >                                 vq->packed.desc_extra[curr].len = sg->length;
> > > > -                               vq->packed.desc_extra[curr].flags =
> > > > -                                       le16_to_cpu(flags);
> > > >                         }
> > > > +
> > > > +                       vq->packed.desc_extra[curr].flags = le16_to_cpu(flags);
> > > > +
> > > >                         prev = curr;
> > > >                         curr = vq->packed.desc_extra[curr].next;
> > > >
> > > > @@ -1530,6 +1520,8 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > >
> > > >         vq->packed.avail_used_flags = avail_used_flags;
> > > >
> > > > +       WARN_ON(!vring_need_unmap_buffer(vq));
> > > > +
> > > >         for (n = 0; n < total_sg; n++) {
> > > >                 if (i == err_idx)
> > > >                         break;
> > > > @@ -1599,7 +1591,9 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > > >         struct vring_desc_state_packed *state = NULL;
> > > >         struct vring_packed_desc *desc;
> > > >         unsigned int i, curr;
> > > > +       u16 flags;
> > > >
> > > > +       flags = vq->packed.desc_extra[id].flags;
> > > >         state = &vq->packed.desc_state[id];
> > > >
> > > >         /* Clear data ptr. */
> > > > @@ -1609,22 +1603,32 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > > >         vq->free_head = id;
> > > >         vq->vq.num_free += state->num;
> > > >
> > > > -       if (unlikely(vq->use_dma_api)) {
> > > > -               curr = id;
> > > > -               for (i = 0; i < state->num; i++) {
> > > > -                       vring_unmap_extra_packed(vq,
> > > > -                                                &vq->packed.desc_extra[curr]);
> > > > -                       curr = vq->packed.desc_extra[curr].next;
> > > > +       if (!(flags & VRING_DESC_F_INDIRECT)) {
> > > > +               if (vring_need_unmap_buffer(vq)) {
> > > > +                       curr = id;
> > > > +                       for (i = 0; i < state->num; i++) {
> > > > +                               vring_unmap_extra_packed(vq,
> > > > +                                                        &vq->packed.desc_extra[curr]);
> > > > +                               curr = vq->packed.desc_extra[curr].next;
> > > > +                       }
> > > >                 }
> > > > -       }
> > > >
> > > > -       if (vq->indirect) {
> > > > +               if (ctx)
> > > > +                       *ctx = state->indir_desc;
> > > > +       } else {
> > > > +               const struct vring_desc_extra *extra;
> > > >                 u32 len;
> > > >
> > > > +               if (vq->use_dma_api) {
> > > > +                       extra = &vq->packed.desc_extra[id];
> > > > +                       dma_unmap_single(vring_dma_dev(vq),
> > > > +                                        extra->addr, extra->len,
> > > > +                                        (flags & VRING_DESC_F_WRITE) ?
> > > > +                                        DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > > +               }
> > >
> > > Theoretically, indirect descriptors could be chained. It is supported
> > > without this patch but not here.
> >
> >
> > YES. But now, that is not supported by "add", so I think we
> > do not need to think about it.
> >
> > Thanks.
>
>
> the "add" you are referring to is virtio drivers in the linux guest?
> That's not the only guest and there's no way to be sure
> no one does it. We can make some unusual operations go somewhat
> slower but breaking them outright is not a good idea.


"add" means virtqueue_add_packed/virtqueue_add_split.

If the virtqueue_add_packed/virtqueue_add_split do not chain
the indirect desc, then I think do not consider the case that
the indirect desc wash chained.

That was all done by the linux virtio core, we do not need to
consider other cases.

Thanks.


>
> > >
> > > Thanks
> > >
> > > > +
> > > >                 /* Free the indirect table, if any, now that it's unmapped. */
> > > >                 desc = state->indir_desc;
> > > > -               if (!desc)
> > > > -                       return;
> > > >
> > > >                 if (vring_need_unmap_buffer(vq)) {
> > > >                         len = vq->packed.desc_extra[id].len;
> > > > @@ -1634,8 +1638,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > > >                 }
> > > >                 kfree(desc);
> > > >                 state->indir_desc = NULL;
> > > > -       } else if (ctx) {
> > > > -               *ctx = state->indir_desc;
> > > >         }
> > > >  }
> > > >
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
>

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2024-03-27  7:14 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-12  3:35 [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 01/10] virtio_ring: introduce vring_need_unmap_buffer Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 02/10] virtio_ring: packed: remove double check of the unmap ops Xuan Zhuo
2024-03-21  5:57   ` Jason Wang
2024-03-21  8:20     ` Xuan Zhuo
2024-03-22  5:10       ` Jason Wang
2024-03-26  7:32       ` Michael S. Tsirkin
2024-03-27  7:11         ` Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 03/10] virtio_ring: packed: structure the indirect desc table Xuan Zhuo
2024-03-21  4:47   ` Jason Wang
2024-03-21  8:24     ` Xuan Zhuo
2024-03-22  5:15       ` Jason Wang
2024-03-22  5:55         ` Xuan Zhuo
2024-03-22  7:51         ` Xuan Zhuo
2024-03-25  7:07           ` Jason Wang
2024-03-12  3:35 ` [PATCH vhost v4 04/10] virtio_ring: split: remove double check of the unmap ops Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 05/10] virtio_ring: split: structure the indirect desc table Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 06/10] virtio_ring: no store dma info when unmap is not needed Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 07/10] virtio: find_vqs: add new parameter premapped Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 08/10] virtio_ring: export premapped to driver by struct virtqueue Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 09/10] virtio_net: set premapped mode by find_vqs() Xuan Zhuo
2024-03-12  3:35 ` [PATCH vhost v4 10/10] virtio_ring: virtqueue_set_dma_premapped support disable Xuan Zhuo
2024-03-21  6:02   ` Jason Wang
2024-03-21  8:21     ` Xuan Zhuo
2024-03-22  5:13       ` Jason Wang
2024-03-22  6:03         ` Xuan Zhuo
2024-03-25  7:10           ` Jason Wang
2024-03-19  6:56 ` [PATCH vhost v4 00/10] virtio: drivers maintain dma info for premapped vq Michael S. Tsirkin
2024-03-20  9:25   ` Jason Wang
2024-03-21  4:45 ` Jason Wang
2024-03-21  8:30   ` Xuan Zhuo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).