* [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
From: Xuan Zhuo @ 2023-06-02  9:21 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

## About DMA APIs

Currently, virtio cannot work with the DMA APIs when the virtio features do
not include VIRTIO_F_ACCESS_PLATFORM.

1. I tried to let the DMA APIs return the physical address for a virtio
   device. But the DMA APIs only work with "real" devices.
2. I tried to let xsk support callbacks to get the physical address from the
   virtio-net driver as the dma address. But the xsk maintainers may want to
   replace the DMA APIs with dma-buf. I think that would be a larger effort
   and we would wait too long.

So, rethinking this, we can first support premapped DMA only for devices
with VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, users who want to use
it have to update the device to support VIRTIO_F_RING_RESET anyway, and they
can enable the device's VIRTIO_F_ACCESS_PLATFORM feature at the same time.

Thanks to Christoph for the help.

=================

XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. The
zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
and its performance is very good:

ENV: QEMU with vhost.

                   vhost CPU | Guest app CPU | Guest softirq CPU | PPS
-----------------------------|---------------|-------------------|---------
xmit by sockperf:     90%    |     100%      |         -         |  318967
xmit by xsk:          100%   |      30%      |        33%        | 1192064
recv by sockperf:     100%   |      68%      |       100%        |  692288
recv by xsk:          100%   |      33%      |        43%        |  771670

Before implementing this in virtio-net, we also have to let the virtio core
support these features (a driver-side sketch of the resulting flow follows
this list):

1. virtio core supports premapped buffers
2. virtio core supports per-queue reset
3. introduce DMA APIs to the virtio core
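
For a quick orientation, here is a hedged sketch of the driver-side flow this
series enables. The helper names are the ones the individual patches
introduce (virtqueue_set_premapped, virtqueue_dma_dev,
virtqueue_get_buf_premapped), so treat this as illustrative, not final API:

	/* patch 02: right after vq init or vq reset */
	virtqueue_set_premapped(vq);

	/* the driver now does the DMA mapping itself, against the
	 * device returned by virtqueue_dma_dev() (patch 08)
	 */
	sg_dma_address(sg) = premapped_addr;	/* patches 03/04 */
	virtqueue_add_inbuf(vq, sg, 1, buf, GFP_ATOMIC);

	/* on completion, retrieve the dma info and unmap (patches 05-07) */
	buf = virtqueue_get_buf_premapped(vq, &len, &cursor);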

Please review.

Thanks.

v10:
 1. support setting the vq to premapped mode; the vq then only handles premapped requests.
 2. virtio-net supports doing dma mapping in advance

v9:
 1. use a flag to distinguish the premapped operations instead of judging by the sg.

v8:
 1. vring_sg_address: check by sg_page(sg), not dma_address, because 0 is a valid dma address
 2. remove unused code from vring_map_one_sg()

v7:
 1. virtqueue_dma_dev() returns NULL when virtio is without the DMA API.

v6:
 1. change the size of the flags to u32.

v5:
 1. fix the error handler
 2. add flags to record internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core did the dma map

v2:
 1. judge whether a buf is premapped based on sgs[0]->dma_address
 2. judge whether to unmap a non-indirect desc based on extra.addr
 3. judge whether to unmap an indirect desc based on indir_desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the dma device; do NOT introduce APIs for dma map and sync
 2. split some commit for review.




Xuan Zhuo (10):
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: introduce virtqueue_set_premapped()
  virtio_ring: split: support add premapped buf
  virtio_ring: packed: support add premapped buf
  virtio_ring: split-detach: support return dma info to driver
  virtio_ring: packed-detach: support return dma info to driver
  virtio_ring: introduce helpers for premapped
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: introduce virtqueue_add_sg()
  virtio_net: support dma premapped

 drivers/net/virtio_net.c     | 163 ++++++++++--
 drivers/virtio/virtio_ring.c | 493 +++++++++++++++++++++++++++++++----
 include/linux/virtio.h       |  34 +++
 3 files changed, 612 insertions(+), 78 deletions(-)

--
2.32.0.3.g01195cf9f

* [PATCH vhost v10 01/10] virtio_ring: put mapping error check in vring_map_one_sg
From: Xuan Zhuo @ 2023-06-02  9:21 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

This patch moves the dma addr error check into vring_map_one_sg().

The benefits of doing this:

1. removes one check of vq->use_dma_api.
2. makes vring_map_one_sg() simpler: callers no longer need to call
   vring_mapping_error() on the return value, which simplifies the
   subsequent code.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..72ed07a604d4 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 }
 
 /* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
-				   struct scatterlist *sg,
-				   enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
+			    enum dma_data_direction direction, dma_addr_t *addr)
 {
 	if (!vq->use_dma_api) {
 		/*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 		 * depending on the direction.
 		 */
 		kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
-		return (dma_addr_t)sg_phys(sg);
+		*addr = (dma_addr_t)sg_phys(sg);
+		return 0;
 	}
 
 	/*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 	 * the way it expects (we don't guarantee that the scatterlist
 	 * will exist for the lifetime of the mapping).
 	 */
-	return dma_map_page(vring_dma_dev(vq),
+	*addr = dma_map_page(vring_dma_dev(vq),
 			    sg_page(sg), sg->offset, sg->length,
 			    direction);
+
+	if (dma_mapping_error(vring_dma_dev(vq), *addr))
+		return -ENOMEM;
+
+	return 0;
 }
 
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
 				goto unmap_release;
 
 			prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
 				goto unmap_release;
 
 			prev = i;
@@ -1279,9 +1286,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-					DMA_TO_DEVICE : DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
 				goto unmap_release;
 
 			desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1426,9 +1432,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	c = 0;
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-					DMA_TO_DEVICE : DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
 				goto unmap_release;
 
 			flags = cpu_to_le16(vq->packed.avail_used_flags |
-- 
2.32.0.3.g01195cf9f

* [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
From: Xuan Zhuo @ 2023-06-02  9:21 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

This helper allows the driver to switch the vq to premapped mode. In
premapped mode, the virtio core does not do dma mapping internally.

This only works when use_dma_api is true. If use_dma_api is false, the dma
operations do not go through the DMA APIs, which is not the standard way in
the Linux kernel.
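
A minimal usage sketch, assuming the vq was just created (or reset) and that
the device negotiated VIRTIO_F_ACCESS_PLATFORM so that use_dma_api is true:

	int err;

	/* Must be called before any buf is added to the vring. */
	err = virtqueue_set_premapped(vq);
	if (err)
		return err;	/* -EINVAL: vring does not use the DMA API */

From here on, every sg passed to this vq must carry a driver-mapped address
in sg_dma_address().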

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 42 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 72ed07a604d4..2afdfb9e3e30 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -172,6 +172,9 @@ struct vring_virtqueue {
 	/* Host publishes avail event idx */
 	bool event;
 
+	/* Do DMA mapping by driver */
+	bool premapped;
+
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->packed_ring = true;
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
+	vq->premapped = false;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 #endif
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
+	vq->premapped = false;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_set_premapped - set the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Enable the premapped mode of the vq.
+ *
+ * The vring in premapped mode does not do dma internally, so the driver must
+ * do dma mapping in advance. The driver must pass the dma_address through
+ * dma_address of scatterlist. When the driver got a used buffer from
+ * the vring, it has to unmap the dma address. So the driver must call
+ * virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
+ *
+ * This must be called before adding any buf to vring.
+ * So this should be called immediately after init vq or vq reset.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
+ */
+int virtqueue_set_premapped(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (!vq->use_dma_api)
+		return -EINVAL;
+
+	vq->premapped = true;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b93238db94e3..1fc0e1023bd4 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+int virtqueue_set_premapped(struct virtqueue *_vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f

* [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
From: Xuan Zhuo @ 2023-06-02  9:21 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

If the vq is in premapped mode, use sg_dma_address() directly.
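
For reference, a hedged sketch of how a driver might prepare a premapped sg
entry before virtqueue_add_*(). virtqueue_dma_dev() is only introduced later
in this series (patch 08), so its use here is an assumption:

	dma_addr_t addr;

	addr = dma_map_single(virtqueue_dma_dev(vq), buf, size, DMA_TO_DEVICE);
	if (dma_mapping_error(virtqueue_dma_dev(vq), addr))
		return -ENOMEM;

	/* Page/length are still set up as usual; in premapped mode the
	 * core only takes the address from sg_dma_address().
	 */
	sg_init_one(sg, buf, size);
	sg_dma_address(sg) = addr;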

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
 1 file changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2afdfb9e3e30..18212c3e056b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			dma_addr_t addr;
 
-			if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
-				goto unmap_release;
+			if (vq->premapped) {
+				addr = sg_dma_address(sg);
+			} else {
+				if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
+					goto unmap_release;
+			}
 
 			prev = i;
 			/* Note that we trust indirect descriptor
@@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			dma_addr_t addr;
 
-			if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
-				goto unmap_release;
+			if (vq->premapped) {
+				addr = sg_dma_address(sg);
+			} else {
+				if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
+					goto unmap_release;
+			}
 
 			prev = i;
 			/* Note that we trust indirect descriptor
@@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	return 0;
 
 unmap_release:
-	err_idx = i;
+	if (!vq->premapped) {
+		err_idx = i;
 
-	if (indirect)
-		i = 0;
-	else
-		i = head;
-
-	for (n = 0; n < total_sg; n++) {
-		if (i == err_idx)
-			break;
-		if (indirect) {
-			vring_unmap_one_split_indirect(vq, &desc[i]);
-			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
-		} else
-			i = vring_unmap_one_split(vq, i);
+		if (indirect)
+			i = 0;
+		else
+			i = head;
+
+		for (n = 0; n < total_sg; n++) {
+			if (i == err_idx)
+				break;
+			if (indirect) {
+				vring_unmap_one_split_indirect(vq, &desc[i]);
+				i = virtio16_to_cpu(_vq->vdev, desc[i].next);
+			} else
+				i = vring_unmap_one_split(vq, i);
+		}
 	}
 
 	if (indirect)
-- 
2.32.0.3.g01195cf9f

* [PATCH vhost v10 04/10] virtio_ring: packed: support add premapped buf
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

If the vq is in premapped mode, use sg_dma_address() directly.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 36 ++++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 18212c3e056b..dc109fbc05a5 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1299,9 +1299,13 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			if (vring_map_one_sg(vq, sg, n < out_sgs ?
-					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
-				goto unmap_release;
+			if (vq->premapped) {
+				addr = sg_dma_address(sg);
+			} else {
+				if (vring_map_one_sg(vq, sg, n < out_sgs ?
+						     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+					goto unmap_release;
+			}
 
 			desc[i].flags = cpu_to_le16(n < out_sgs ?
 						0 : VRING_DESC_F_WRITE);
@@ -1369,10 +1373,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	return 0;
 
 unmap_release:
-	err_idx = i;
+	if (!vq->premapped) {
+		err_idx = i;
 
-	for (i = 0; i < err_idx; i++)
-		vring_unmap_desc_packed(vq, &desc[i]);
+		for (i = 0; i < err_idx; i++)
+			vring_unmap_desc_packed(vq, &desc[i]);
+	}
 
 	kfree(desc);
 
@@ -1447,9 +1453,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
 			dma_addr_t addr;
 
-			if (vring_map_one_sg(vq, sg, n < out_sgs ?
-					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
-				goto unmap_release;
+			if (vq->premapped) {
+				addr = sg_dma_address(sg);
+			} else {
+				if (vring_map_one_sg(vq, sg, n < out_sgs ?
+						     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
+					goto unmap_release;
+			}
 
 			flags = cpu_to_le16(vq->packed.avail_used_flags |
 				    (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
@@ -1512,11 +1522,17 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	return 0;
 
 unmap_release:
+	vq->packed.avail_used_flags = avail_used_flags;
+
+	if (vq->premapped) {
+		END_USE(vq);
+		return -EIO;
+	}
+
 	err_idx = i;
 	i = head;
 	curr = vq->free_head;
 
-	vq->packed.avail_used_flags = avail_used_flags;
 
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
-- 
2.32.0.3.g01195cf9f

* [PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

In premapped mode, the driver needs to unmap the DMA addresses after
receiving a buffer. The virtio core records the DMA addresses, so the
driver needs a way to get that dma info back from the virtio core.

A straightforward approach is to pass an array to the virtio core when
calling virtqueue_get_buf(). However, that is not feasible when there are
multiple DMA addresses in the descriptor chain and the array size is
unknown.

To solve this problem, a helper is introduced. After calling
virtqueue_get_buf(), the driver can call the helper to retrieve one entry
of dma info. If the helper returns -EAGAIN, there are more DMA addresses
to be processed, and the driver should call the helper again. To keep
track of the current position in the chain, a cursor must be passed to
the helper; it is initialized by virtqueue_get_buf().

Some bookkeeping is done inside this helper, so it MUST be called in
premapped mode. A sketch of the intended caller loop follows.
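
The premapped get-buf helper below is only added later in the series (per
the doc comment in patch 02), and the detach helper here is internal, so
treat the names as illustrative:

	struct virtqueue_detach_cursor cursor;
	enum dma_data_direction dir;
	dma_addr_t addr;
	u32 len;
	int rc;

	buf = virtqueue_get_buf_premapped(vq, &used_len, &cursor);

	/* Each call yields one dma info; 0 means the chain is done. */
	do {
		rc = virtqueue_detach(vq, &cursor, &addr, &len, &dir);
		if (rc == 0 || rc == -EAGAIN)
			dma_unmap_page(virtqueue_dma_dev(vq), addr, len, dir);
	} while (rc == -EAGAIN);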

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 118 ++++++++++++++++++++++++++++++++---
 include/linux/virtio.h       |  11 ++++
 2 files changed, 119 insertions(+), 10 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dc109fbc05a5..cdc4349f6066 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
 	return needs_kick;
 }
 
-static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
-			     void **ctx)
+static void detach_cursor_init_split(struct vring_virtqueue *vq,
+				     struct virtqueue_detach_cursor *cursor, u16 head)
+{
+	struct vring_desc_extra *extra;
+
+	extra = &vq->split.desc_extra[head];
+
+	/* Clear data ptr. */
+	vq->split.desc_state[head].data = NULL;
+
+	cursor->head = head;
+	cursor->done = 0;
+
+	if (extra->flags & VRING_DESC_F_INDIRECT) {
+		cursor->num = extra->len / sizeof(struct vring_desc);
+		cursor->indirect = true;
+		cursor->pos = 0;
+
+		vring_unmap_one_split(vq, head);
+
+		extra->next = vq->free_head;
+
+		vq->free_head = head;
+
+		/* Plus final descriptor */
+		vq->vq.num_free++;
+
+	} else {
+		cursor->indirect = false;
+		cursor->pos = head;
+	}
+}
+
+static int virtqueue_detach_split(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+				  dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
+	int rc = -EAGAIN;
+
+	if (unlikely(cursor->done))
+		return -EINVAL;
+
+	if (!cursor->indirect) {
+		struct vring_desc_extra *extra;
+		unsigned int i;
+
+		i = cursor->pos;
+
+		extra = &vq->split.desc_extra[i];
+
+		if (vq->split.vring.desc[i].flags & nextflag) {
+			cursor->pos = extra->next;
+		} else {
+			extra->next = vq->free_head;
+			vq->free_head = cursor->head;
+			cursor->done = true;
+			rc = 0;
+		}
+
+		*addr = extra->addr;
+		*len = extra->len;
+		*dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+		vq->vq.num_free++;
+
+	} else {
+		struct vring_desc *indir_desc, *desc;
+		u16 flags;
+
+		indir_desc = vq->split.desc_state[cursor->head].indir_desc;
+		desc = &indir_desc[cursor->pos];
+
+		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
+		*addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
+		*len = virtio32_to_cpu(vq->vq.vdev, desc->len);
+		*dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+		if (++cursor->pos == cursor->num) {
+			kfree(indir_desc);
+			cursor->done = true;
+			return 0;
+		}
+	}
+
+	return rc;
+}
+
+static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 
 		kfree(indir_desc);
 		vq->split.desc_state[head].indir_desc = NULL;
-	} else if (ctx) {
-		*ctx = vq->split.desc_state[head].indir_desc;
 	}
 }
 
@@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue *vq)
 
 static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 					 unsigned int *len,
-					 void **ctx)
+					 void **ctx,
+					 struct virtqueue_detach_cursor *cursor)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	void *ret;
@@ -852,7 +938,15 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 
 	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->split.desc_state[i].data;
-	detach_buf_split(vq, i, ctx);
+
+	if (!vq->indirect && ctx)
+		*ctx = vq->split.desc_state[i].indir_desc;
+
+	if (vq->premapped)
+		detach_cursor_init_split(vq, cursor, i);
+	else
+		detach_buf_split(vq, i);
+
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
@@ -961,7 +1055,8 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
 	return true;
 }
 
-static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq,
+					       struct virtqueue_detach_cursor *cursor)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
@@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
-		detach_buf_split(vq, i, NULL);
+		if (vq->premapped)
+			detach_cursor_init_split(vq, cursor, i);
+		else
+			detach_buf_split(vq, i);
 		vq->split.avail_idx_shadow--;
 		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
 				vq->split.avail_idx_shadow);
@@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
-				 virtqueue_get_buf_ctx_split(_vq, len, ctx);
+				 virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
@@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
 	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
-				 virtqueue_detach_unused_buf_split(_vq);
+				 virtqueue_detach_unused_buf_split(_vq, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 1fc0e1023bd4..eb4a4e4329aa 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -38,6 +38,17 @@ struct virtqueue {
 	void *priv;
 };
 
+struct virtqueue_detach_cursor {
+	unsigned indirect:1;
+	unsigned done:1;
+	unsigned hole:14;
+
+	/* for split head */
+	unsigned head:16;
+	unsigned num:16;
+	unsigned pos:16;
+};
+
 int virtqueue_add_outbuf(struct virtqueue *vq,
 			 struct scatterlist sg[], unsigned int num,
 			 void *data,
-- 
2.32.0.3.g01195cf9f

* [PATCH vhost v10 06/10] virtio_ring: packed-detach: support return dma info to driver
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

In premapped mode, the driver needs to unmap the DMA addresses after
receiving a buffer. The virtio core records the DMA addresses, so the
driver needs a way to get that dma info back from the virtio core.

A straightforward approach is to pass an array to the virtio core when
calling virtqueue_get_buf(). However, that is not feasible when there are
multiple DMA addresses in the descriptor chain and the array size is
unknown.

To solve this problem, a helper is introduced. After calling
virtqueue_get_buf(), the driver can call the helper to retrieve one entry
of dma info. If the helper returns -EAGAIN, there are more DMA addresses
to be processed, and the driver should call the helper again. To keep
track of the current position in the chain, a cursor must be passed to
the helper; it is initialized by virtqueue_get_buf().

Some bookkeeping is done inside this helper, so it MUST be called in
premapped mode.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 105 ++++++++++++++++++++++++++++++++---
 include/linux/virtio.h       |   9 ++-
 2 files changed, 103 insertions(+), 11 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cdc4349f6066..cbc22daae7e1 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1695,8 +1695,85 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 	return needs_kick;
 }
 
+static void detach_cursor_init_packed(struct vring_virtqueue *vq,
+				      struct virtqueue_detach_cursor *cursor, u16 id)
+{
+	struct vring_desc_state_packed *state = NULL;
+	u32 len;
+
+	state = &vq->packed.desc_state[id];
+
+	/* Clear data ptr. */
+	state->data = NULL;
+
+	vq->packed.desc_extra[state->last].next = vq->free_head;
+	vq->free_head = id;
+	vq->vq.num_free += state->num;
+
+	/* init cursor */
+	cursor->curr = id;
+	cursor->done = 0;
+	cursor->pos = 0;
+
+	if (vq->packed.desc_extra[id].flags & VRING_DESC_F_INDIRECT) {
+		len = vq->packed.desc_extra[id].len;
+
+		cursor->num = len / sizeof(struct vring_packed_desc);
+		cursor->indirect = true;
+
+		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[id]);
+	} else {
+		cursor->num = state->num;
+		cursor->indirect = false;
+	}
+}
+
+static int virtqueue_detach_packed(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+				   dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (unlikely(cursor->done))
+		return -EINVAL;
+
+	if (!cursor->indirect) {
+		struct vring_desc_extra *extra;
+
+		extra = &vq->packed.desc_extra[cursor->curr];
+		cursor->curr = extra->next;
+
+		*addr = extra->addr;
+		*len = extra->len;
+		*dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+		if (++cursor->pos == cursor->num) {
+			cursor->done = true;
+			return 0;
+		}
+	} else {
+		struct vring_packed_desc *indir_desc, *desc;
+		u16 flags;
+
+		indir_desc = vq->packed.desc_state[cursor->curr].indir_desc;
+		desc = &indir_desc[cursor->pos];
+
+		flags = le16_to_cpu(desc->flags);
+		*addr = le64_to_cpu(desc->addr);
+		*len = le32_to_cpu(desc->len);
+		*dir = (flags & VRING_DESC_F_WRITE) ?  DMA_FROM_DEVICE : DMA_TO_DEVICE;
+
+		if (++cursor->pos == cursor->num) {
+			kfree(indir_desc);
+			cursor->done = true;
+			return 0;
+		}
+	}
+
+	return -EAGAIN;
+}
+
 static void detach_buf_packed(struct vring_virtqueue *vq,
-			      unsigned int id, void **ctx)
+			      unsigned int id)
 {
 	struct vring_desc_state_packed *state = NULL;
 	struct vring_packed_desc *desc;
@@ -1736,8 +1813,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		}
 		kfree(desc);
 		state->indir_desc = NULL;
-	} else if (ctx) {
-		*ctx = state->indir_desc;
 	}
 }
 
@@ -1768,7 +1843,8 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
 
 static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 					  unsigned int *len,
-					  void **ctx)
+					  void **ctx,
+					  struct virtqueue_detach_cursor *cursor)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	u16 last_used, id, last_used_idx;
@@ -1808,7 +1884,14 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 
 	/* detach_buf_packed clears data, so grab it now. */
 	ret = vq->packed.desc_state[id].data;
-	detach_buf_packed(vq, id, ctx);
+
+	if (!vq->indirect && ctx)
+		*ctx = vq->packed.desc_state[id].indir_desc;
+
+	if (vq->premapped)
+		detach_cursor_init_packed(vq, cursor, id);
+	else
+		detach_buf_packed(vq, id);
 
 	last_used += vq->packed.desc_state[id].num;
 	if (unlikely(last_used >= vq->packed.vring.num)) {
@@ -1960,7 +2043,8 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 	return true;
 }
 
-static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq,
+						struct virtqueue_detach_cursor *cursor)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
@@ -1973,7 +2057,10 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->packed.desc_state[i].data;
-		detach_buf_packed(vq, i, NULL);
+		if (vq->premapped)
+			detach_cursor_init_packed(vq, cursor, i);
+		else
+			detach_buf_packed(vq, i);
 		END_USE(vq);
 		return buf;
 	}
@@ -2458,7 +2545,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
+	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, NULL) :
 				 virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
@@ -2590,7 +2677,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
+	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, NULL) :
 				 virtqueue_detach_unused_buf_split(_vq, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index eb4a4e4329aa..7f137c7a9034 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -43,8 +43,13 @@ struct virtqueue_detach_cursor {
 	unsigned done:1;
 	unsigned hole:14;
 
-	/* for split head */
-	unsigned head:16;
+	union {
+		/* for split head */
+		unsigned head:16;
+
+		/* for packed id */
+		unsigned curr:16;
+	};
 	unsigned num:16;
 	unsigned pos:16;
 };
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
  2023-06-02  9:21 ` Xuan Zhuo
@ 2023-06-02  9:22   ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

This patch introduces three helpers for premapped mode:

* virtqueue_get_buf_premapped
* virtqueue_detach_unused_buf_premapped
* virtqueue_detach

The first two work like their non-premapped counterparts, but take a
cursor in addition. virtqueue_detach is then used to retrieve the DMA
info of the last buffer through that cursor.
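
As an illustration of how a driver is expected to consume the cursor,
here is a minimal sketch, assuming a premapped vq (the helper name
drv_unmap_all is invented for this example; patch 10 adds the equivalent
virtnet_generic_unmap):

static void drv_unmap_all(struct virtqueue *vq,
			  struct virtqueue_detach_cursor *cursor)
{
	enum dma_data_direction dir;
	dma_addr_t addr;
	u32 len;
	int err;

	/* Walk every DMA address recorded for the returned buffer. */
	do {
		err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
		if (!err || err == -EAGAIN)
			dma_unmap_page_attrs(virtqueue_dma_dev(vq),
					     addr, len, dir, 0);
	} while (err == -EAGAIN);
}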

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       | 10 +++++
 2 files changed, 93 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index cbc22daae7e1..6771b9661798 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	return virtqueue_get_buf_ctx(_vq, len, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf);
+
+/**
+ * virtqueue_get_buf_premapped - get the next used buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ * @ctx: extra context for the token
+ * @cursor: detach cursor
+ *
+ * If the device wrote data into the buffer, @len will be set to the
+ * amount written.  This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * This is used for the premapped vq. The cursor is passed by the driver
+ * and is later handed to virtqueue_detach. It is initialized by the
+ * virtio core internally.
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_*().
+ */
+void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
+				  void **ctx,
+				  struct virtqueue_detach_cursor *cursor)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
+				 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
+
+/**
+ * virtqueue_detach - get the dma info of the last buf
+ * @_vq: the struct virtqueue we're talking about.
+ * @cursor: detach cursor
+ * @addr: the dma address
+ * @len: the length of the dma mapping
+ * @dir: the direction of the dma mapping
+ *
+ * This is used for the premapped vq. The cursor is initialized by
+ * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
+ *
+ * Returns:
+ * -EAGAIN: more dma info remains; this function should be called again.
+ * -EINVAL: the detach is already done; this function should not be called.
+ * 0: done, no more dma info.
+ */
+int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
+				 virtqueue_detach_split(_vq, cursor, addr, len, dir);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach);
+
 /**
  * virtqueue_disable_cb - disable callbacks
  * @_vq: the struct virtqueue we're talking about.
@@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
+/**
+ * virtqueue_detach_unused_buf_premapped - detach first unused buffer
+ * @_vq: the struct virtqueue we're talking about.
+ * @cursor: detach cursor
+ *
+ * This is used for the premapped vq. The cursor is passed by the driver
+ * and is later handed to virtqueue_detach. It is initialized by the
+ * virtio core internally.
+ *
+ * Returns NULL or the "data" token handed to virtqueue_add_*().
+ * This is not valid on an active queue; it is useful for device
+ * shutdown or queue reset.
+ */
+void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
+					    struct virtqueue_detach_cursor *cursor)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
+				 virtqueue_detach_unused_buf_split(_vq, cursor);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
+
 static inline bool more_used(const struct vring_virtqueue *vq)
 {
 	return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7f137c7a9034..0a11c5b32fe5 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -3,6 +3,7 @@
 #define _LINUX_VIRTIO_H
 /* Everything a virtio driver needs to work with any particular virtio
  * implementation. */
+#include <linux/dma-mapping.h>
 #include <linux/types.h>
 #include <linux/scatterlist.h>
 #include <linux/spinlock.h>
@@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
 			    void **ctx);
 
+void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
+				  void **ctx,
+				  struct virtqueue_detach_cursor *cursor);
+
 void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
@@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
+void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
+					    struct virtqueue_detach_cursor *cursor);
 
 unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
 
@@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 		     void (*recycle)(struct virtqueue *vq, void *buf));
 
+int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
+		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH vhost v10 08/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-06-02  9:21 ` Xuan Zhuo
@ 2023-06-02  9:22   ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

Add virtqueue_dma_dev() to get the DMA device of a virtqueue, so that
the caller can do DMA operations in advance. The purpose is to keep
memory mapped across multiple add/get buf operations.
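
A hedged sketch of the intended usage follows; buf and len are assumed
to come from the driver, and the NULL check matters because
virtqueue_dma_dev() returns NULL when the core does not use the DMA API:

	struct device *dev = virtqueue_dma_dev(vq);
	dma_addr_t addr;

	if (!dev)
		return -EOPNOTSUPP;	/* core maps internally; nothing to premap */

	/* Map once; the mapping can stay live across many add/get cycles. */
	addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;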

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 6771b9661798..56444f872967 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2459,6 +2459,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev, which can be used with the DMA API.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (vq->use_dma_api)
+		return vring_dma_dev(vq);
+	else
+		return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 0a11c5b32fe5..b24f0a665390 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH vhost v10 09/10] virtio_ring: introduce virtqueue_add_sg()
  2023-06-02  9:21 ` Xuan Zhuo
@ 2023-06-02  9:22   ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

Introduce virtqueue_add_sg(), so that virtio-net can use a unified API
for both the rq and the sq.
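
A short usage sketch (the sq_vq, rq_vq, buf, skb and ctx names are
illustrative), showing how the single entry point covers both
directions:

	struct scatterlist sg;
	int err;

	sg_init_one(&sg, buf, len);

	/* out = true: the device reads the buffer (tx). */
	err = virtqueue_add_sg(sq_vq, &sg, 1, true, skb, NULL, GFP_ATOMIC);

	/* out = false: the device writes the buffer (rx). */
	err = virtqueue_add_sg(rq_vq, &sg, 1, false, buf, ctx, GFP_ATOMIC);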

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 23 +++++++++++++++++++++++
 include/linux/virtio.h       |  4 ++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 56444f872967..a00f049ea442 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2356,6 +2356,29 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 					out_sgs, in_sgs, data, ctx, gfp);
 }
 
+/**
+ * virtqueue_add_sg - expose buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: a scatterlist
+ * @num: the number of entries in @sg
+ * @out: whether the sg is readable by the other side
+ * @data: the token identifying the buffer.
+ * @ctx: extra context for the token
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_sg(struct virtqueue *vq, struct scatterlist *sg,
+		     unsigned int num, bool out, void *data,
+		     void *ctx, gfp_t gfp)
+{
+	return virtqueue_add(vq, &sg, num, (int)out, (int)!out, data, ctx, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_sg);
+
 /**
  * virtqueue_add_sgs - expose buffers to other end
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index b24f0a665390..1a4aa4382c53 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -55,6 +55,10 @@ struct virtqueue_detach_cursor {
 	unsigned pos:16;
 };
 
+int virtqueue_add_sg(struct virtqueue *vq, struct scatterlist *sg,
+		     unsigned int num, bool out, void *data,
+		     void *ctx, gfp_t gfp);
+
 int virtqueue_add_outbuf(struct virtqueue *vq,
 			 struct scatterlist sg[], unsigned int num,
 			 void *data,
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-02  9:21 ` Xuan Zhuo
@ 2023-06-02  9:22   ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-02  9:22 UTC (permalink / raw)
  To: virtualization
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

Introduce the module param "experiment_premapped" to make virtio-net do
the DMA mapping itself.

If it is true, the vqs of virtio-net are put into premapped mode: the
virtio core only handles sgs that already carry a dma_address, and the
driver must retrieve the DMA address of a buffer from the virtio core in
order to unmap it after getting the buffer back.

That will be useful when AF_XDP is enabled: AF_XDP tx and the kernel
packet xmit will share the tx queue, so the skb xmit path must also
support the premapped mode.
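
For context, a condensed sketch of the premapped add path this patch
implements (mirroring virtnet_add_sg() in the diff below; it assumes the
vq was already switched with virtqueue_set_premapped()):

	struct device *dev = virtqueue_dma_dev(vq);
	int err;

	/* In premapped mode the driver maps; the core only records addresses. */
	if (dma_map_sg_attrs(dev, sg, num, dir, 0) != num)
		return -ENOMEM;

	err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
	if (err < 0) {
		dma_unmap_sg_attrs(dev, sg, num, dir, 0);
		return -ENOMEM;
	}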

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
 1 file changed, 141 insertions(+), 22 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2396c28c0122..5898212fcb3c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -26,10 +26,11 @@
 static int napi_weight = NAPI_POLL_WEIGHT;
 module_param(napi_weight, int, 0444);
 
-static bool csum = true, gso = true, napi_tx = true;
+static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
 module_param(csum, bool, 0444);
 module_param(gso, bool, 0444);
 module_param(napi_tx, bool, 0644);
+module_param(experiment_premapped, bool, 0644);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
@@ -142,6 +143,9 @@ struct send_queue {
 
 	/* Record whether sq is in reset state. */
 	bool reset;
+
+	/* The vq is premapped mode. */
+	bool premapped;
 };
 
 /* Internal representation of a receive virtqueue */
@@ -174,6 +178,9 @@ struct receive_queue {
 	char name[16];
 
 	struct xdp_rxq_info xdp_rxq;
+
+	/* The vq is premapped mode. */
+	bool premapped;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	return skb;
 }
 
+static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
+{
+	enum dma_data_direction dir;
+	dma_addr_t addr;
+	u32 len;
+	int err;
+
+	do {
+		err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
+		if (!err || err == -EAGAIN)
+			dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
+
+	} while (err == -EAGAIN);
+
+	return err;
+}
+
+static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
+{
+	struct virtqueue_detach_cursor cursor;
+	void *buf;
+
+	if (!premapped)
+		return virtqueue_detach_unused_buf(vq);
+
+	buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
+	if (buf)
+		virtnet_generic_unmap(vq, &cursor);
+
+	return buf;
+}
+
+static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
+{
+	struct virtqueue_detach_cursor cursor;
+	void *buf;
+
+	if (!premapped)
+		return virtqueue_get_buf_ctx(vq, len, ctx);
+
+	buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
+	if (buf)
+		virtnet_generic_unmap(vq, &cursor);
+
+	return buf;
+}
+
+#define virtnet_rq_get_buf(rq, plen, pctx) \
+({ \
+	typeof(rq) _rq = (rq); \
+	virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
+})
+
+#define virtnet_sq_get_buf(sq, plen, pctx) \
+({ \
+	typeof(sq) _sq = (sq); \
+	virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
+})
+
+static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
+			  struct scatterlist *sg, unsigned int num, bool out,
+			  void *data, void *ctx, gfp_t gfp)
+{
+	enum dma_data_direction dir;
+	struct device *dev;
+	int err, ret;
+
+	if (!premapped)
+		return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
+
+	dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
+	dev = virtqueue_dma_dev(vq);
+
+	ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
+	if (ret != num)
+		goto err;
+
+	err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
+	if (err < 0)
+		goto err;
+
+	return 0;
+
+err:
+	dma_unmap_sg_attrs(dev, sg, num, dir, 0);
+	return -ENOMEM;
+}
+
+static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
+{
+	return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
+}
+
+static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
+			     void *ctx, gfp_t gfp)
+{
+	return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
+}
+
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
 	unsigned int len;
@@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 	unsigned int bytes = 0;
 	void *ptr;
 
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
 		if (likely(!is_xdp_frame(ptr))) {
 			struct sk_buff *skb = ptr;
 
@@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 			    skb_frag_size(frag), skb_frag_off(frag));
 	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
-				   xdp_to_ptr(xdpf), GFP_ATOMIC);
+	err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
 
@@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 	}
 
 	/* Free up any pending old buffers before queueing new ones. */
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
 		if (likely(is_xdp_frame(ptr))) {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
@@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 		void *buf;
 		int off;
 
-		buf = virtqueue_get_buf(rq->vq, &buflen);
+		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
 		if (unlikely(!buf))
 			goto err_buf;
 
@@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 		return -EINVAL;
 
 	while (--*num_buf > 0) {
-		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		buf = virtnet_rq_get_buf(rq, &len, &ctx);
 		if (unlikely(!buf)) {
 			pr_debug("%s: rx error: %d buffers out of %d missing\n",
 				 dev->name, *num_buf,
@@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	while (--num_buf) {
 		int num_skb_frags;
 
-		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		buf = virtnet_rq_get_buf(rq, &len, &ctx);
 		if (unlikely(!buf)) {
 			pr_debug("%s: rx error: %d buffers out of %d missing\n",
 				 dev->name, num_buf,
@@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 err_skb:
 	put_page(page);
 	while (num_buf-- > 1) {
-		buf = virtqueue_get_buf(rq->vq, &len);
+		buf = virtnet_rq_get_buf(rq, &len, NULL);
 		if (unlikely(!buf)) {
 			pr_debug("%s: rx error: %d buffers missing\n",
 				 dev->name, num_buf);
@@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	alloc_frag->offset += len;
 	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
 		    vi->hdr_len + GOOD_PACKET_LEN);
-	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
 	if (err < 0)
 		put_page(virt_to_head_page(buf));
 	return err;
@@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 
 	/* chain first in list head */
 	first->private = (unsigned long)list;
-	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
-				  first, gfp);
+	err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
+				first, NULL, gfp);
 	if (err < 0)
 		give_pages(rq, first);
 
@@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 
 	sg_init_one(rq->sg, buf, len);
 	ctx = mergeable_len_to_ctx(len + room, headroom);
-	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
 	if (err < 0)
 		put_page(virt_to_head_page(buf));
 
@@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		void *ctx;
 
 		while (stats.packets < budget &&
-		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
+		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
 			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
 			stats.packets++;
 		}
 	} else {
 		while (stats.packets < budget &&
-		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
 			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
 			stats.packets++;
 		}
@@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
 			return num_sg;
 		num_sg++;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+	return virtnet_add_outbuf(sq, num_sg, skb);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	int i;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		struct virtqueue *vq = vi->sq[i].vq;
-		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-			virtnet_sq_free_unused_buf(vq, buf);
+		struct send_queue *sq = &vi->sq[i];
+
+		while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
+			virtnet_sq_free_unused_buf(sq->vq, buf);
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		struct virtqueue *vq = vi->rq[i].vq;
-		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-			virtnet_rq_free_unused_buf(vq, buf);
+		struct receive_queue *rq = &vi->rq[i];
+
+		while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
+			virtnet_rq_free_unused_buf(rq->vq, buf);
 	}
 }
 
@@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 		vi->rq[i].vq = vqs[rxq2vq(i)];
 		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
 		vi->sq[i].vq = vqs[txq2vq(i)];
+
+		if (experiment_premapped) {
+			if (!virtqueue_set_premapped(vi->rq[i].vq))
+				vi->rq[i].premapped = true;
+			else
+				netdev_warn(vi->dev, "Failed to enable premapped mode for RXQ %d.\n", i);
+
+			if (!virtqueue_set_premapped(vi->sq[i].vq))
+				vi->sq[i].premapped = true;
+			else
+				netdev_warn(vi->dev, "Failed to enable premapped mode for TXQ %d.\n", i);
+		}
 	}
 
 	/* run here: ret == 0. */
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 06/10] virtio_ring: packed-detach: support return dma info to driver
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-02 11:40     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-02 11:40 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, Jun 02, 2023 at 05:22:02PM +0800, Xuan Zhuo wrote:
> Under the premapped mode, the driver needs to unmap the DMA address
> after receiving the buffer. The virtio core records the DMA address,
> so the driver needs a way to get the dma info from the virtio core.
> 
> A straightforward approach is to pass an array to the virtio core when
> calling virtqueue_get_buf(). However, it is not feasible when there are
> multiple DMA addresses in the descriptor chain, and the array size is
> unknown.
> 
> To solve this problem, a helper be introduced. After calling
> virtqueue_get_buf(), the driver can call the helper to
> retrieve a dma info. If the helper function returns -EAGAIN, it means
> that there are more DMA addresses to be processed, and the driver should
> call the helper function again.


Please, keep error codes for when an actual error occurs.
A common pattern would be:
	<0 - error
	0 - success, done
	>0 - success, more to do
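
Under that convention, a caller loop would look roughly like this (a
sketch of the suggested pattern, not code from this series; vq and
cursor are assumed from the surrounding driver context):

	enum dma_data_direction dir;
	dma_addr_t addr;
	u32 len;
	int ret;

	for (;;) {
		ret = virtqueue_detach(vq, cursor, &addr, &len, &dir);
		if (ret < 0)
			break;		/* real error */

		dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);

		if (ret == 0)
			break;		/* success, done */
		/* ret > 0: success, more DMA info to process */
	}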


> To keep track of the current position in
> the chain, a cursor must be passed to the helper function, which is
> initialized by virtqueue_get_buf().
> 
> Some processes are done inside this helper, so this helper MUST be
> called under the premapped mode.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 105 ++++++++++++++++++++++++++++++++---
>  include/linux/virtio.h       |   9 ++-
>  2 files changed, 103 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cdc4349f6066..cbc22daae7e1 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1695,8 +1695,85 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>  	return needs_kick;
>  }
>  
> +static void detach_cursor_init_packed(struct vring_virtqueue *vq,
> +				      struct virtqueue_detach_cursor *cursor, u16 id)
> +{
> +	struct vring_desc_state_packed *state = NULL;
> +	u32 len;
> +
> +	state = &vq->packed.desc_state[id];
> +
> +	/* Clear data ptr. */
> +	state->data = NULL;
> +
> +	vq->packed.desc_extra[state->last].next = vq->free_head;
> +	vq->free_head = id;
> +	vq->vq.num_free += state->num;
> +
> +	/* init cursor */
> +	cursor->curr = id;
> +	cursor->done = 0;
> +	cursor->pos = 0;
> +
> +	if (vq->packed.desc_extra[id].flags & VRING_DESC_F_INDIRECT) {
> +		len = vq->packed.desc_extra[id].len;
> +
> +		cursor->num = len / sizeof(struct vring_packed_desc);
> +		cursor->indirect = true;
> +
> +		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[id]);
> +	} else {
> +		cursor->num = state->num;
> +		cursor->indirect = false;
> +	}
> +}
> +
> +static int virtqueue_detach_packed(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +				   dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	if (unlikely(cursor->done))
> +		return -EINVAL;
> +
> +	if (!cursor->indirect) {
> +		struct vring_desc_extra *extra;
> +
> +		extra = &vq->packed.desc_extra[cursor->curr];
> +		cursor->curr = extra->next;
> +
> +		*addr = extra->addr;
> +		*len = extra->len;
> +		*dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +		if (++cursor->pos == cursor->num) {
> +			cursor->done = true;
> +			return 0;
> +		}
> +	} else {
> +		struct vring_packed_desc *indir_desc, *desc;
> +		u16 flags;
> +
> +		indir_desc = vq->packed.desc_state[cursor->curr].indir_desc;
> +		desc = &indir_desc[cursor->pos];
> +
> +		flags = le16_to_cpu(desc->flags);
> +		*addr = le64_to_cpu(desc->addr);
> +		*len = le32_to_cpu(desc->len);
> +		*dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +		if (++cursor->pos == cursor->num) {
> +			kfree(indir_desc);
> +			cursor->done = true;
> +			return 0;
> +		}
> +	}
> +
> +	return -EAGAIN;
> +}
> +
>  static void detach_buf_packed(struct vring_virtqueue *vq,
> -			      unsigned int id, void **ctx)
> +			      unsigned int id)
>  {
>  	struct vring_desc_state_packed *state = NULL;
>  	struct vring_packed_desc *desc;
> @@ -1736,8 +1813,6 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>  		}
>  		kfree(desc);
>  		state->indir_desc = NULL;
> -	} else if (ctx) {
> -		*ctx = state->indir_desc;
>  	}
>  }
>  
> @@ -1768,7 +1843,8 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  					  unsigned int *len,
> -					  void **ctx)
> +					  void **ctx,
> +					  struct virtqueue_detach_cursor *cursor)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  	u16 last_used, id, last_used_idx;
> @@ -1808,7 +1884,14 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  
>  	/* detach_buf_packed clears data, so grab it now. */
>  	ret = vq->packed.desc_state[id].data;
> -	detach_buf_packed(vq, id, ctx);
> +
> +	if (!vq->indirect && ctx)
> +		*ctx = vq->packed.desc_state[id].indir_desc;
> +
> +	if (vq->premapped)
> +		detach_cursor_init_packed(vq, cursor, id);
> +	else
> +		detach_buf_packed(vq, id);
>  
>  	last_used += vq->packed.desc_state[id].num;
>  	if (unlikely(last_used >= vq->packed.vring.num)) {
> @@ -1960,7 +2043,8 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
>  	return true;
>  }
>  
> -static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
> +static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq,
> +						struct virtqueue_detach_cursor *cursor)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  	unsigned int i;
> @@ -1973,7 +2057,10 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
>  			continue;
>  		/* detach_buf clears data, so grab it now. */
>  		buf = vq->packed.desc_state[i].data;
> -		detach_buf_packed(vq, i, NULL);
> +		if (vq->premapped)
> +			detach_cursor_init_packed(vq, cursor, i);
> +		else
> +			detach_buf_packed(vq, i);
>  		END_USE(vq);
>  		return buf;
>  	}
> @@ -2458,7 +2545,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> -	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, NULL) :
>  				 virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
> @@ -2590,7 +2677,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> -	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
> +	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, NULL) :
>  				 virtqueue_detach_unused_buf_split(_vq, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index eb4a4e4329aa..7f137c7a9034 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -43,8 +43,13 @@ struct virtqueue_detach_cursor {
>  	unsigned done:1;
>  	unsigned hole:14;
>  
> -	/* for split head */
> -	unsigned head:16;
> +	union {
> +		/* for split head */
> +		unsigned head:16;
> +
> +		/* for packed id */
> +		unsigned curr:16;
> +	};
>  	unsigned num:16;
>  	unsigned pos:16;
>  };
> -- 
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-02  9:21 ` Xuan Zhuo
                   ` (10 preceding siblings ...)
  (?)
@ 2023-06-03  6:29 ` Jakub Kicinski
  2023-06-05  1:58     ` Xuan Zhuo
  -1 siblings, 1 reply; 91+ messages in thread
From: Jakub Kicinski @ 2023-06-03  6:29 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> Thanks for the help from Christoph.

That said, you haven't CCed him on the series; isn't the general rule to
CC anyone who was involved in previous discussions?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-02  9:22   ` Xuan Zhuo
  (?)
@ 2023-06-03  6:31   ` Jakub Kicinski
  2023-06-05  2:10       ` Xuan Zhuo
  -1 siblings, 1 reply; 91+ messages in thread
From: Jakub Kicinski @ 2023-06-03  6:31 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri,  2 Jun 2023 17:22:06 +0800 Xuan Zhuo wrote:
>  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------

Ack for this going via the vhost tree, FWIW, but in this case you'll
potentially need to wait for the merge window to move forward with the
actual AF_XDP patches.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-04 13:45     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-04 13:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, Jun 02, 2023 at 05:22:03PM +0800, Xuan Zhuo wrote:
> This patch introduces three helpers for premapped mode.
> 
> * virtqueue_get_buf_premapped
> * virtqueue_detach_unused_buf_premapped
> 
> The above helpers work like their non-premapped counterparts, but a
> cursor is passed.
> 
> virtqueue_detach is used to get the dma info of the last buf by
>   cursor.

This isn't very clear from the description, but virtqueue_detach is
also introduced by this patch, as opposed to being used.
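
For reference, the driver-side flow this description implies might look
like the sketch below, following the -EAGAIN semantics as posted
(dma_dev and the dma_unmap_single() call are driver-side placeholders,
not code from the series):

	struct virtqueue_detach_cursor cursor;
	enum dma_data_direction dir;
	dma_addr_t addr;
	unsigned int len;
	u32 dlen;
	void *buf;
	int ret;

	buf = virtqueue_get_buf_premapped(vq, &len, NULL, &cursor);
	if (buf) {
		do {
			/* each call yields one DMA entry of the chain */
			ret = virtqueue_detach(vq, &cursor, &addr, &dlen, &dir);
			if (ret == -EINVAL)
				break;	/* cursor already exhausted */
			dma_unmap_single(dma_dev, addr, dlen, dir);
		} while (ret == -EAGAIN); /* 0: that was the last entry */
	}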


> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
>  include/linux/virtio.h       | 10 +++++
>  2 files changed, 93 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cbc22daae7e1..6771b9661798 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>  	return virtqueue_get_buf_ctx(_vq, len, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> +
> +/**
> + * virtqueue_get_buf_premapped - get the next used buffer
> + * @_vq: the struct virtqueue we're talking about.
> + * @len: the length written into the buffer
> + * @ctx: extra context for the token
> + * @cursor: detach cursor
> + *
> + * If the device wrote data into the buffer, @len will be set to the
> + * amount written.  This means you don't need to clear the buffer
> + * beforehand to ensure there's no data leakage in the case of short
> + * writes.
> + *
> + * Caller must ensure we don't call this with other virtqueue
> + * operations at the same time (except where noted).
> + *
> + * This is used for the premapped vq. The cursor is passed by the driver and
> + * is used for virtqueue_detach. It will be initialized by the virtio core
> + * internally.
> + *
> + * Returns NULL if there are no used buffers, or the "data" token
> + * handed to virtqueue_add_*().
> + */
> +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> +				  void **ctx,
> +				  struct virtqueue_detach_cursor *cursor)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
> +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
> +
> +/**
> + * virtqueue_detach - get the dma info of last buf

detach what from what then?
I am guessing this is not the only thing this function does?
sounds like a bad name for a function.

> + * @_vq: the struct virtqueue we're talking about.
> + * @cursor: detach cursor
> + * @addr: the dma address

what address?  it's the 1st time you mention an address ...

> + * @len: the length of the DMA buffer
> + * @dir: the direction of the DMA mapping
> + *
> + * This is used for the premapped vq. The cursor is initialized by
> + * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
> + *
> + * Returns:
> + * -EAGAIN: there are more dma info, this function should be called more.

here too, pls don't return -EAGAIN when it's not an actual error case.
something like "1" will do.
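
Spelled out against this helper's kernel-doc, the suggested convention
would read something like (illustrative wording only):

	 * Returns:
	 *  <0 - error (e.g. the cursor is already exhausted)
	 *   0 - success, this was the last DMA entry
	 *  >0 - success, call again for the next DMA entry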

> + * -EINVAL: the process is done, should not call this function
> + * 0: no more dma info
> + */
> +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
> +				 virtqueue_detach_split(_vq, cursor, addr, len, dir);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_detach);
> +
>  /**
>   * virtqueue_disable_cb - disable callbacks
>   * @_vq: the struct virtqueue we're talking about.
> @@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>  
> +/**
> + * virtqueue_detach_unused_buf_premapped - detach first unused buffer
> + * @_vq: the struct virtqueue we're talking about.
> + * @cursor: detach cursor
> + *
> + * This is used for the premapped vq. The cursor is passed by the driver and
> + * is used for virtqueue_detach. It will be initialized by the virtio core
> + * internally.
> + *
> + * Returns NULL or the "data" token handed to virtqueue_add_*().
> + * This is not valid on an active queue; it is useful for device
> + * shutdown or queue reset.
> + */
> +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> +					    struct virtqueue_detach_cursor *cursor)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
> +				 virtqueue_detach_unused_buf_split(_vq, cursor);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
> +
>  static inline bool more_used(const struct vring_virtqueue *vq)
>  {
>  	return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 7f137c7a9034..0a11c5b32fe5 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_VIRTIO_H
>  /* Everything a virtio driver needs to work with any particular virtio
>   * implementation. */
> +#include <linux/dma-mapping.h>
>  #include <linux/types.h>
>  #include <linux/scatterlist.h>
>  #include <linux/spinlock.h>
> @@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
>  void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
>  			    void **ctx);
>  
> +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> +				  void **ctx,
> +				  struct virtqueue_detach_cursor *cursor);
> +
>  void virtqueue_disable_cb(struct virtqueue *vq);
>  
>  bool virtqueue_enable_cb(struct virtqueue *vq);
> @@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>  
>  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
> +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> +					    struct virtqueue_detach_cursor *cursor);
>  
>  unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
>  
> @@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
>  int virtqueue_resize(struct virtqueue *vq, u32 num,
>  		     void (*recycle)(struct virtqueue *vq, void *buf));
>  
> +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
> +
>  /**
>   * struct virtio_device - representation of a device using virtio
>   * @index: unique position on the virtio bus
> -- 
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-03  6:29 ` [PATCH vhost v10 00/10] virtio core prepares for AF_XDP Jakub Kicinski
@ 2023-06-05  1:58     ` Xuan Zhuo
  0 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-05  1:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, bpf, Paolo Abeni,
	David S. Miller

On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > Thanks for the help from Christoph.
>
> That said you haven't CCed him on the series, isn't the general rule to
> CC anyone who was involved in previous discussions?


Sorry, I forgot to add cc after git format-patch.


Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
  2023-06-04 13:45     ` Michael S. Tsirkin
@ 2023-06-05  2:06       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-05  2:06 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Sun, 4 Jun 2023 09:45:14 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Jun 02, 2023 at 05:22:03PM +0800, Xuan Zhuo wrote:
> > This patch introduces three helpers for premapped mode.
> >
> > * virtqueue_get_buf_premapped
> > * virtqueue_detach_unused_buf_premapped
> >
> > The above helpers work like their non-premapped counterparts, but a
> > cursor is passed.
> >
> > virtqueue_detach is used to get the dma info of the last buf by
> >   cursor.
>
> This isn't very clear from the description but virtqueue_detach is
> also introduced by this patch as opposed to being used.
>
>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
> >  include/linux/virtio.h       | 10 +++++
> >  2 files changed, 93 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index cbc22daae7e1..6771b9661798 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> >  	return virtqueue_get_buf_ctx(_vq, len, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> > +
> > +/**
> > + * virtqueue_get_buf_premapped - get the next used buffer
> > + * @_vq: the struct virtqueue we're talking about.
> > + * @len: the length written into the buffer
> > + * @ctx: extra context for the token
> > + * @cursor: detach cursor
> > + *
> > + * If the device wrote data into the buffer, @len will be set to the
> > + * amount written.  This means you don't need to clear the buffer
> > + * beforehand to ensure there's no data leakage in the case of short
> > + * writes.
> > + *
> > + * Caller must ensure we don't call this with other virtqueue
> > + * operations at the same time (except where noted).
> > + *
> > + * This is used for the premapped vq. The cursor is passed by the driver and
> > + * is used for virtqueue_detach. It will be initialized by the virtio core
> > + * internally.
> > + *
> > + * Returns NULL if there are no used buffers, or the "data" token
> > + * handed to virtqueue_add_*().
> > + */
> > +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> > +				  void **ctx,
> > +				  struct virtqueue_detach_cursor *cursor)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
> > +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
> > +
> > +/**
> > + * virtqueue_detach - get the dma info of last buf
>
> detach what from what then?
> I am guessing this is not the only thing this function does?
> sounds like a bad name for a function.

Let me think of a good name

>
> > + * @_vq: the struct virtqueue we're talking about.
> > + * @cursor: detach cursor
> > + * @addr: the dma address
>
> what address?  it's the 1st time you mention an address ...

Will fix.


>
> > + * @len: the length of the DMA buffer
> > + * @dir: the direction of the DMA mapping
> > + *
> > + * This is used for the premapped vq. The cursor is initialized by
> > + * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
> > + *
> > + * Returns:
> > + * -EAGAIN: there are more dma info, this function should be called more.
>
> here too, pls don't return -EAGAIN when it's not an actual error case.
> something like "1" will do.

While I agree with you, -EAGAIN seems to be a commonly used convention. How about we
return EAGAIN instead of -EAGAIN?

Thanks.



>
> > + * -EINVAL: the process is done, should not call this function
> > + * 0: no more dma info
> > + */
> > +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
> > +				 virtqueue_detach_split(_vq, cursor, addr, len, dir);
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_detach);
> > +
> >  /**
> >   * virtqueue_disable_cb - disable callbacks
> >   * @_vq: the struct virtqueue we're talking about.
> > @@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> >
> > +/**
> > + * virtqueue_detach_unused_buf_premapped - detach first unused buffer
> > + * @_vq: the struct virtqueue we're talking about.
> > + * @cursor: detach cursor
> > + *
> > + * This is used for the premapped vq. The cursor is passed by the driver and
> > + * is used for virtqueue_detach. It will be initialized by the virtio core
> > + * internally.
> > + *
> > + * Returns NULL or the "data" token handed to virtqueue_add_*().
> > + * This is not valid on an active queue; it is useful for device
> > + * shutdown or queue reset.
> > + */
> > +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> > +					    struct virtqueue_detach_cursor *cursor)
> > +{
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
> > +				 virtqueue_detach_unused_buf_split(_vq, cursor);
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
> > +
> >  static inline bool more_used(const struct vring_virtqueue *vq)
> >  {
> >  	return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 7f137c7a9034..0a11c5b32fe5 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -3,6 +3,7 @@
> >  #define _LINUX_VIRTIO_H
> >  /* Everything a virtio driver needs to work with any particular virtio
> >   * implementation. */
> > +#include <linux/dma-mapping.h>
> >  #include <linux/types.h>
> >  #include <linux/scatterlist.h>
> >  #include <linux/spinlock.h>
> > @@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
> >  void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
> >  			    void **ctx);
> >
> > +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> > +				  void **ctx,
> > +				  struct virtqueue_detach_cursor *cursor);
> > +
> >  void virtqueue_disable_cb(struct virtqueue *vq);
> >
> >  bool virtqueue_enable_cb(struct virtqueue *vq);
> > @@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> >
> >  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
> > +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> > +					    struct virtqueue_detach_cursor *cursor);
> >
> >  unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
> >
> > @@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
> >  int virtqueue_resize(struct virtqueue *vq, u32 num,
> >  		     void (*recycle)(struct virtqueue *vq, void *buf));
> >
> > +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
> > +
> >  /**
> >   * struct virtio_device - representation of a device using virtio
> >   * @index: unique position on the virtio bus
> > --
> > 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-03  6:31   ` Jakub Kicinski
@ 2023-06-05  2:10       ` Xuan Zhuo
  0 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-05  2:10 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: virtualization, Michael S. Tsirkin, Jason Wang, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, 2 Jun 2023 23:31:52 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri,  2 Jun 2023 17:22:06 +0800 Xuan Zhuo wrote:
> >  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
>
> ack for this going via the vhost tree, FWIW, but you'll potentially
> need to wait for the merge window to move forward with the actual
> af xdp patches, in this case.


My current plan is to let virtio support premapped DMA first, and then implement
AF_XDP zerocopy support in virtio-net.

This will indeed involve two trees. But most of the implementation in this
patch set is virtio code, so I think it would be more appropriate to merge it
via the vhost tree. Do you have any better ideas?


Thanks.

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
  2023-06-05  2:06       ` Xuan Zhuo
@ 2023-06-05  5:38         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-05  5:38 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Mon, Jun 05, 2023 at 10:06:51AM +0800, Xuan Zhuo wrote:
> On Sun, 4 Jun 2023 09:45:14 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Fri, Jun 02, 2023 at 05:22:03PM +0800, Xuan Zhuo wrote:
> > > This patch introduces three helpers for premapped mode.
> > >
> > > * virtqueue_get_buf_premapped
> > > * virtqueue_detach_unused_buf_premapped
> > >
> > > The above helpers work like their non-premapped counterparts, but a
> > > cursor is passed.
> > >
> > > virtqueue_detach is used to get the dma info of the last buf by
> > >   cursor.
> >
> > This isn't very clear from the description but virtqueue_detach is
> > also introduced by this patch as opposed to being used.
> >
> >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
> > >  include/linux/virtio.h       | 10 +++++
> > >  2 files changed, 93 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index cbc22daae7e1..6771b9661798 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> > >  	return virtqueue_get_buf_ctx(_vq, len, NULL);
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> > > +
> > > +/**
> > > + * virtqueue_get_buf_premapped - get the next used buffer
> > > + * @_vq: the struct virtqueue we're talking about.
> > > + * @len: the length written into the buffer
> > > + * @ctx: extra context for the token
> > > + * @cursor: detach cursor
> > > + *
> > > + * If the device wrote data into the buffer, @len will be set to the
> > > + * amount written.  This means you don't need to clear the buffer
> > > + * beforehand to ensure there's no data leakage in the case of short
> > > + * writes.
> > > + *
> > > + * Caller must ensure we don't call this with other virtqueue
> > > + * operations at the same time (except where noted).
> > > + *
> > > + * This is used for the premapped vq. The cursor is passed by the driver and
> > > + * is used for virtqueue_detach. It will be initialized by the virtio core
> > > + * internally.
> > > + *
> > > + * Returns NULL if there are no used buffers, or the "data" token
> > > + * handed to virtqueue_add_*().
> > > + */
> > > +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> > > +				  void **ctx,
> > > +				  struct virtqueue_detach_cursor *cursor)
> > > +{
> > > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > > +
> > > +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
> > > +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
> > > +
> > > +/**
> > > + * virtqueue_detach - get the dma info of last buf
> >
> > detach what from what then?
> > I am guessing this is not the only thing this function does?
> > sounds like a bad name for a function.
> 
> Let me think of a good name
> 
> >
> > > + * @_vq: the struct virtqueue we're talking about.
> > > + * @cursor: detach cursor
> > > + * @addr: the dma address
> >
> > what address?  it's the 1st time you mention an address ...
> 
> Will fix.
> 
> 
> >
> > > + * @len: the length of the DMA buffer
> > > + * @dir: the direction of the DMA mapping
> > > + *
> > > + * This is used for the premapped vq. The cursor is initialized by
> > > + * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
> > > + *
> > > + * Returns:
> > > + * -EAGAIN: there are more dma info, this function should be called more.
> >
> > here too, pls don't return -EAGAIN when it's not an actual error case.
> > something like "1" will do.
> 
> While I agree with you, -EAGAIN seems to be a commonly used convention.

Where is it used like this? A typical use is e.g. in read(2):

      EAGAIN The file descriptor fd refers to a file other than a socket and has been marked nonblocking (O_NONBLOCK), and  the  read
              would block.  See open(2) for further details on the O_NONBLOCK flag.

A better analog here is read() filling up all of its buffer, in which
case it returns the number of bytes read.
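
The same shape in a userspace read() loop (consume() is a stand-in):

	char buf[4096];
	ssize_t n;

	while ((n = read(fd, buf, sizeof(buf))) > 0)
		consume(buf, n);	/* > 0: got n bytes, keep going */
	if (n < 0)
		perror("read");		/* < 0: a real error */
	/* n == 0: end of file, nothing more to do */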


> How about we
> return EAGAIN instead of -EAGAIN?
> 
> Thanks.
> 
> 
> 
> >
> > > + * -EINVAL: the process is done, should not call this function
> > > + * 0: no more dma info
> > > + */
> > > +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > > +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> > > +{
> > > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > > +
> > > +	return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
> > > +				 virtqueue_detach_split(_vq, cursor, addr, len, dir);
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtqueue_detach);
> > > +
> > >  /**
> > >   * virtqueue_disable_cb - disable callbacks
> > >   * @_vq: the struct virtqueue we're talking about.
> > > @@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> > >
> > > +/**
> > > + * virtqueue_detach_unused_buf_premapped - detach first unused buffer
> > > + * @_vq: the struct virtqueue we're talking about.
> > > + * @cursor: detach cursor
> > > + *
> > > + * This is used for the premapped vq. The cursor is passed by the driver and
> > > + * is used for virtqueue_detach. It will be initialized by the virtio core
> > > + * internally.
> > > + *
> > > + * Returns NULL or the "data" token handed to virtqueue_add_*().
> > > + * This is not valid on an active queue; it is useful for device
> > > + * shutdown or queue reset.
> > > + */
> > > +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> > > +					    struct virtqueue_detach_cursor *cursor)
> > > +{
> > > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > > +
> > > +	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
> > > +				 virtqueue_detach_unused_buf_split(_vq, cursor);
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
> > > +
> > >  static inline bool more_used(const struct vring_virtqueue *vq)
> > >  {
> > >  	return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > index 7f137c7a9034..0a11c5b32fe5 100644
> > > --- a/include/linux/virtio.h
> > > +++ b/include/linux/virtio.h
> > > @@ -3,6 +3,7 @@
> > >  #define _LINUX_VIRTIO_H
> > >  /* Everything a virtio driver needs to work with any particular virtio
> > >   * implementation. */
> > > +#include <linux/dma-mapping.h>
> > >  #include <linux/types.h>
> > >  #include <linux/scatterlist.h>
> > >  #include <linux/spinlock.h>
> > > @@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
> > >  void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
> > >  			    void **ctx);
> > >
> > > +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> > > +				  void **ctx,
> > > +				  struct virtqueue_detach_cursor *cursor);
> > > +
> > >  void virtqueue_disable_cb(struct virtqueue *vq);
> > >
> > >  bool virtqueue_enable_cb(struct virtqueue *vq);
> > > @@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
> > >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > >
> > >  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
> > > +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> > > +					    struct virtqueue_detach_cursor *cursor);
> > >
> > >  unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
> > >
> > > @@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
> > >  int virtqueue_resize(struct virtqueue *vq, u32 num,
> > >  		     void (*recycle)(struct virtqueue *vq, void *buf));
> > >
> > > +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > > +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
> > > +
> > >  /**
> > >   * struct virtio_device - representation of a device using virtio
> > >   * @index: unique position on the virtio bus
> > > --
> > > 2.32.0.3.g01195cf9f
> >


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-05  2:10       ` Xuan Zhuo
@ 2023-06-05  5:44         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-05  5:44 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Mon, Jun 05, 2023 at 10:10:44AM +0800, Xuan Zhuo wrote:
> On Fri, 2 Jun 2023 23:31:52 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Fri,  2 Jun 2023 17:22:06 +0800 Xuan Zhuo wrote:
> > >  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
> >
> > ack for this going via the vhost tree, FWIW, but you'll potentially
> > need to wait for the merge window to move forward with the actual
> > af xdp patches, in this case.
> 
> 
> My current plan is to let virtio support premapped DMA first, and then implement
> AF_XDP zerocopy support in virtio-net.
> 
> This will indeed involve two branches. But most of the implementation in this
> patch is virtio code, so I think it would be more appropriate to merge it via
> vhost. Do you have any good ideas?
> 
> 
> Thanks.

Are you still making changes to net core? DMA core? If it's only
virtio-net then I can probably merge all of it - just a couple of
bugfixes there so far, it shouldn't cause complex conflicts.

-- 
MST


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
  2023-06-05  5:38         ` Michael S. Tsirkin
@ 2023-06-06  2:01           ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-06  2:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Mon, 5 Jun 2023 01:38:48 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Jun 05, 2023 at 10:06:51AM +0800, Xuan Zhuo wrote:
> > On Sun, 4 Jun 2023 09:45:14 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Jun 02, 2023 at 05:22:03PM +0800, Xuan Zhuo wrote:
> > > > This patch introduces three helpers for premapped mode.
> > > >
> > > > * virtqueue_get_buf_premapped
> > > > * virtqueue_detach_unused_buf_premapped
> > > >
> > > > The above helpers work like the non-premapped funcs. But a cursor is
> > > > passed.
> > > >
> > > > virtqueue_detach is used to get the dma info of the last buf by
> > > >   cursor.
> > >
> > > This isn't very clear from the description but virtqueue_detach is
> > > also introduced by this patch as opposed to being used.
> > >
> > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
> > > >  include/linux/virtio.h       | 10 +++++
> > > >  2 files changed, 93 insertions(+)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index cbc22daae7e1..6771b9661798 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
> > > >  	return virtqueue_get_buf_ctx(_vq, len, NULL);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> > > > +
> > > > +/**
> > > > + * virtqueue_get_buf_premapped - get the next used buffer
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + * @len: the length written into the buffer
> > > > + * @ctx: extra context for the token
> > > > + * @cursor: detach cursor
> > > > + *
> > > > + * If the device wrote data into the buffer, @len will be set to the
> > > > + * amount written.  This means you don't need to clear the buffer
> > > > + * beforehand to ensure there's no data leakage in the case of short
> > > > + * writes.
> > > > + *
> > > > + * Caller must ensure we don't call this with other virtqueue
> > > > + * operations at the same time (except where noted).
> > > > + *
> > > > + * This is used for the premapped vq. The cursor is passed by the driver; it
> > > > + * is used for virtqueue_detach and will be initialized by the virtio core
> > > > + * internally.
> > > > + *
> > > > + * Returns NULL if there are no used buffers, or the "data" token
> > > > + * handed to virtqueue_add_*().
> > > > + */
> > > > +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> > > > +				  void **ctx,
> > > > +				  struct virtqueue_detach_cursor *cursor)
> > > > +{
> > > > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +
> > > > +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
> > > > +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
> > > > +
> > > > +/**
> > > > + * virtqueue_detach - get the dma info of last buf
> > >
> > > detach what from what then?
> > > I am guessing this is not the only thing this function does?
> > > sounds like a bad name for a function.
> >
> > Let me think of a good name
> >
> > >
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + * @cursor: detach cursor
> > > > + * @addr: the dma address
> > >
> > > what address?  it's the 1st time you mention an address ...
> >
> > Will fix.
> >
> >
> > >
> > > > + * @len: the length of the dma address
> > > > + * @dir: the direction of the dma address
> > > > + *
> > > > + * This is used for the premapped vq. The cursor is initialized by
> > > > + * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
> > > > + *
> > > > + * Returns:
> > > > + * -EAGAIN: there is more dma info; this function should be called again.
> > >
> > > here too, pls don't return -EAGAIN not in an error case.
> > > something like "1" will do.
> >
> > While I agree with you, -EAGAIN seems to be a commonly used method.
>
> Where is it used like this? A typical use is e.g. in read(2):
>
>       EAGAIN The file descriptor fd refers to a file other than a socket and has been marked nonblocking (O_NONBLOCK), and  the  read
>               would block.  See open(2) for further details on the O_NONBLOCK flag.
>
> a better analog here is read() filling up all of its buffer, in which
> case it returns the # of bytes read.


Rethinking this, I conflated some scenarios. I should return a value to
indicate there is more data; "1" might be a good choice.
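
As a sketch, the driver-side unmap loop (e.g. virtnet_generic_unmap() from
patch 10) would then key off the new return value, with 1 meaning "entry
returned, more to come" and 0 meaning "entry returned, done":

        do {
                err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
                if (err < 0)    /* e.g. -EINVAL: cursor already consumed */
                        break;
                dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
        } while (err == 1);     /* more dma info to fetch */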

Will fix.

Thanks.

>
>
> > How about we
> > return EAGAIN instead of -EAGAIN?
> >
> > Thanks.
> >
> >
> >
> > >
> > > > + * -EINVAL: the process is done; this function should not be called again
> > > > + * 0: no more dma info
> > > > + */
> > > > +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > > > +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> > > > +{
> > > > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +
> > > > +	return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
> > > > +				 virtqueue_detach_split(_vq, cursor, addr, len, dir);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtqueue_detach);
> > > > +
> > > >  /**
> > > >   * virtqueue_disable_cb - disable callbacks
> > > >   * @_vq: the struct virtqueue we're talking about.
> > > > @@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> > > >
> > > > +/**
> > > > + * virtqueue_detach_unused_buf_premapped - detach first unused buffer
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + * @cursor: detach cursor
> > > > + *
> > > > + * This is used for the premapped vq. The cursor is passed by the driver; it
> > > > + * is used for virtqueue_detach and will be initialized by the virtio core
> > > > + * internally.
> > > > + *
> > > > + * Returns NULL or the "data" token handed to virtqueue_add_*().
> > > > + * This is not valid on an active queue; it is useful for device
> > > > + * shutdown or the reset queue.
> > > > + */
> > > > +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> > > > +					    struct virtqueue_detach_cursor *cursor)
> > > > +{
> > > > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +
> > > > +	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
> > > > +				 virtqueue_detach_unused_buf_split(_vq, cursor);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
> > > > +
> > > >  static inline bool more_used(const struct vring_virtqueue *vq)
> > > >  {
> > > >  	return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > index 7f137c7a9034..0a11c5b32fe5 100644
> > > > --- a/include/linux/virtio.h
> > > > +++ b/include/linux/virtio.h
> > > > @@ -3,6 +3,7 @@
> > > >  #define _LINUX_VIRTIO_H
> > > >  /* Everything a virtio driver needs to work with any particular virtio
> > > >   * implementation. */
> > > > +#include <linux/dma-mapping.h>
> > > >  #include <linux/types.h>
> > > >  #include <linux/scatterlist.h>
> > > >  #include <linux/spinlock.h>
> > > > @@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
> > > >  void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
> > > >  			    void **ctx);
> > > >
> > > > +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> > > > +				  void **ctx,
> > > > +				  struct virtqueue_detach_cursor *cursor);
> > > > +
> > > >  void virtqueue_disable_cb(struct virtqueue *vq);
> > > >
> > > >  bool virtqueue_enable_cb(struct virtqueue *vq);
> > > > @@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
> > > >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > > >
> > > >  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
> > > > +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> > > > +					    struct virtqueue_detach_cursor *cursor);
> > > >
> > > >  unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
> > > >
> > > > @@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
> > > >  int virtqueue_resize(struct virtqueue *vq, u32 num,
> > > >  		     void (*recycle)(struct virtqueue *vq, void *buf));
> > > >
> > > > +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > > > +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
> > > > +
> > > >  /**
> > > >   * struct virtio_device - representation of a device using virtio
> > > >   * @index: unique position on the virtio bus
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-05  5:44         ` Michael S. Tsirkin
@ 2023-06-06  2:11           ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-06  2:11 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jakub Kicinski, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Mon, 5 Jun 2023 01:44:28 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Jun 05, 2023 at 10:10:44AM +0800, Xuan Zhuo wrote:
> > On Fri, 2 Jun 2023 23:31:52 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Fri,  2 Jun 2023 17:22:06 +0800 Xuan Zhuo wrote:
> > > >  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
> > >
> > > ack for this going via the vhost tree, FWIW, but you'll potentially
> > > need to wait for the merge window to move forward with the actual
> > > af xdp patches, in this case.
> >
> >
> > My current plan is to let virtio support premapped DMA first, and then implement
> > AF_XDP zerocopy support in virtio-net.
> >
> > This will indeed involve two branches. But most of the implementation in this
> > patch is virtio code, so I think it would be more appropriate to merge it via
> > vhost. Do you have any good ideas?
> >
> >
> > Thanks.
>
> Are you still making changes to net core? DMA core? If it's only
> virtio-net then I can probably merge all of it - just a couple of
> bugfixes there so far, it shouldn't cause complex conflicts.

Just one small change to net core, and nothing in the DMA core.

I will try to fix this problem.

Thanks.


>
> --
> MST
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-05  1:58     ` Xuan Zhuo
@ 2023-06-07 14:05       ` Christoph Hellwig
  -1 siblings, 0 replies; 91+ messages in thread
From: Christoph Hellwig @ 2023-06-07 14:05 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote:
> On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > > Thanks for the help from Christoph.
> >
> > That said you haven't CCed him on the series, isn't the general rule to
> > CC anyone who was involved in previous discussions?
> 
> 
> Sorry, I forgot to add the Cc list after running git format-patch.

So I've been looking for this series elsewhere, but its Cc list seems to
include neither lkml nor the iommu list, so I can't find it.  Can you please
repost it?

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-07 14:05       ` Christoph Hellwig
@ 2023-06-07 20:15         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-07 20:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, Jakub Kicinski, virtualization, Jason Wang,
	David S. Miller, Eric Dumazet, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, linux-kernel

On Wed, Jun 07, 2023 at 07:05:11AM -0700, Christoph Hellwig wrote:
> On Mon, Jun 05, 2023 at 09:58:21AM +0800, Xuan Zhuo wrote:
> > On Fri, 2 Jun 2023 23:29:02 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Fri,  2 Jun 2023 17:21:56 +0800 Xuan Zhuo wrote:
> > > > Thanks for the help from Christoph.
> > >
> > > That said, you haven't CCed him on the series; isn't the general rule to
> > > CC anyone who was involved in previous discussions?
> > 
> > 
> > Sorry, I forgot to add the Cc list after running git format-patch.
> 
> So I've been looking for this series elsewhere, but it seems to include
> neither lkml nor the iommu list, so I can't find it.  Can you please
> repost it?

I bounced it to lkml now - does this help?


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-02  9:21 ` Xuan Zhuo
@ 2023-06-21  6:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-21  6:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	virtualization

Hi Jason,

Do you plan to review this?

Thanks.


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-22 12:15     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-22 12:15 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, Jun 02, 2023 at 05:22:06PM +0800, Xuan Zhuo wrote:
> Introduce the module param "experiment_premapped" to enable the mode in
> which virtio-net does the DMA mapping itself.
> 
> If it is true, the vqs of virtio-net are put into premapped mode: they only
> handle sgs that already carry a dma_address, and the driver must obtain the
> DMA address of each buffer so it can unmap it after getting the buffer back
> from the virtio core.
> 
> That will be useful when AF_XDP is enabled: AF_XDP tx and the kernel packet
> xmit will share the tx queue, so the skb xmit path must support the
> premapped mode.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>


I put this in next, but I don't think this is going upstream
in its current form, certainly not with the experiment_premapped module
parameter that no one will know how to enable. If you want to experiment,
keep it in your private tree; experimenting on humans requires
ethics board approval and consent forms :)

Spreading the "premapped" boolean all over the place is also
far from pretty; I wonder why we can't specify it only when adding.
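
To illustrate that (a purely hypothetical sketch, there is no such API in
this series), the flag could be folded into the add call instead of being
per-vq state set via virtqueue_set_premapped():

        /* hypothetical per-add variant: the sg already carries dma addresses */
        int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
                                           struct scatterlist *sg,
                                           unsigned int num,
                                           void *data, gfp_t gfp);

The driver could then decide per buffer, and the "premapped" bools added to
struct send_queue and struct receive_queue below would go away.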

> ---
>  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
>  1 file changed, 141 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c0122..5898212fcb3c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,10 +26,11 @@
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
>  
> -static bool csum = true, gso = true, napi_tx = true;
> +static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
> +module_param(experiment_premapped, bool, 0644);
>  
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -142,6 +143,9 @@ struct send_queue {
>  
>  	/* Record whether sq is in reset state. */
>  	bool reset;
> +
> +	/* The vq is premapped mode. */
> +	bool premapped;
>  };
>  
>  /* Internal representation of a receive virtqueue */
> @@ -174,6 +178,9 @@ struct receive_queue {
>  	char name[16];
>  
>  	struct xdp_rxq_info xdp_rxq;
> +
> +	/* The vq is premapped mode. */
> +	bool premapped;
>  };
>  
>  /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	return skb;
>  }
>  
> +static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
> +{
> +	enum dma_data_direction dir;
> +	dma_addr_t addr;
> +	u32 len;
> +	int err;
> +
> +	do {
> +		err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
> +		if (!err || err == -EAGAIN)
> +			dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
> +
> +	} while (err == -EAGAIN);
> +
> +	return err;
> +}
> +
> +static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
> +{
> +	struct virtqueue_detach_cursor cursor;
> +	void *buf;
> +
> +	if (!premapped)
> +		return virtqueue_detach_unused_buf(vq);
> +
> +	buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
> +	if (buf)
> +		virtnet_generic_unmap(vq, &cursor);
> +
> +	return buf;
> +}
> +
> +static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
> +{
> +	struct virtqueue_detach_cursor cursor;
> +	void *buf;
> +
> +	if (!premapped)
> +		return virtqueue_get_buf_ctx(vq, len, ctx);
> +
> +	buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
> +	if (buf)
> +		virtnet_generic_unmap(vq, &cursor);
> +
> +	return buf;
> +}
> +
> +#define virtnet_rq_get_buf(rq, plen, pctx) \
> +({ \
> +	typeof(rq) _rq = (rq); \
> +	virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
> +})
> +
> +#define virtnet_sq_get_buf(sq, plen, pctx) \
> +({ \
> +	typeof(sq) _sq = (sq); \
> +	virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
> +})
> +
> +static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
> +			  struct scatterlist *sg, unsigned int num, bool out,
> +			  void *data, void *ctx, gfp_t gfp)
> +{
> +	enum dma_data_direction dir;
> +	struct device *dev;
> +	int err, ret;
> +
> +	if (!premapped)
> +		return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> +
> +	dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
> +	dev = virtqueue_dma_dev(vq);
> +
> +	ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
> +	if (ret != num)
> +		goto err;
> +
> +	err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> +	if (err < 0)
> +		goto err;
> +
> +	return 0;
> +
> +err:
> +	dma_unmap_sg_attrs(dev, sg, num, dir, 0);
> +	return -ENOMEM;
> +}
> +
> +static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
> +{
> +	return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
> +}
> +
> +static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
> +			     void *ctx, gfp_t gfp)
> +{
> +	return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
> +}
> +
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  {
>  	unsigned int len;
> @@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  	unsigned int bytes = 0;
>  	void *ptr;
>  
> -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>  		if (likely(!is_xdp_frame(ptr))) {
>  			struct sk_buff *skb = ptr;
>  
> @@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>  			    skb_frag_size(frag), skb_frag_off(frag));
>  	}
>  
> -	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -				   xdp_to_ptr(xdpf), GFP_ATOMIC);
> +	err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>  	if (unlikely(err))
>  		return -ENOSPC; /* Caller handle free/refcnt */
>  
> @@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>  	}
>  
>  	/* Free up any pending old buffers before queueing new ones. */
> -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>  		if (likely(is_xdp_frame(ptr))) {
>  			struct xdp_frame *frame = ptr_to_xdp(ptr);
>  
> @@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  		void *buf;
>  		int off;
>  
> -		buf = virtqueue_get_buf(rq->vq, &buflen);
> +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
>  		if (unlikely(!buf))
>  			goto err_buf;
>  
> @@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>  		return -EINVAL;
>  
>  	while (--*num_buf > 0) {
> -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
>  				 dev->name, *num_buf,
> @@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	while (--num_buf) {
>  		int num_skb_frags;
>  
> -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
>  				 dev->name, num_buf,
> @@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_skb:
>  	put_page(page);
>  	while (num_buf-- > 1) {
> -		buf = virtqueue_get_buf(rq->vq, &len);
> +		buf = virtnet_rq_get_buf(rq, &len, NULL);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers missing\n",
>  				 dev->name, num_buf);
> @@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	alloc_frag->offset += len;
>  	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
>  		    vi->hdr_len + GOOD_PACKET_LEN);
> -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
>  	return err;
> @@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>  
>  	/* chain first in list head */
>  	first->private = (unsigned long)list;
> -	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> -				  first, gfp);
> +	err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
> +				first, NULL, gfp);
>  	if (err < 0)
>  		give_pages(rq, first);
>  
> @@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  
>  	sg_init_one(rq->sg, buf, len);
>  	ctx = mergeable_len_to_ctx(len + room, headroom);
> -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
>  
> @@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		void *ctx;
>  
>  		while (stats.packets < budget &&
> -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
>  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
>  			stats.packets++;
>  		}
>  	} else {
>  		while (stats.packets < budget &&
> -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
>  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
>  			stats.packets++;
>  		}
> @@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  			return num_sg;
>  		num_sg++;
>  	}
> -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +	return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>  
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  	int i;
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> -		struct virtqueue *vq = vi->sq[i].vq;
> -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -			virtnet_sq_free_unused_buf(vq, buf);
> +		struct send_queue *sq = &vi->sq[i];
> +
> +		while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
> +			virtnet_sq_free_unused_buf(sq->vq, buf);
>  	}
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> -		struct virtqueue *vq = vi->rq[i].vq;
> -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -			virtnet_rq_free_unused_buf(vq, buf);
> +		struct receive_queue *rq = &vi->rq[i];
> +
> +		while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
> +			virtnet_rq_free_unused_buf(rq->vq, buf);
>  	}
>  }
>  
> @@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>  		vi->rq[i].vq = vqs[rxq2vq(i)];
>  		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
>  		vi->sq[i].vq = vqs[txq2vq(i)];
> +
> +		if (experiment_premapped) {
> +			if (!virtqueue_set_premapped(vi->rq[i].vq))
> +				vi->rq[i].premapped = true;
> +			else
> +				netdev_warn(vi->dev, "RXQ (%d) enable premapped failure.\n", i);
> +
> +			if (!virtqueue_set_premapped(vi->sq[i].vq))
> +				vi->sq[i].premapped = true;
> +			else
> +				netdev_warn(vi->dev, "TXQ (%d) enable premapped failure.\n", i);
> +		}
>  	}
>  
>  	/* run here: ret == 0. */
> -- 
> 2.32.0.3.g01195cf9f
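
For anyone who wants to try the experimental path as posted: assuming
virtio-net is built as a module, loading it with
"modprobe virtio_net experiment_premapped=1" turns the mode on. The parameter
is only consulted when the virtqueues are created in virtnet_find_vqs(), so
it has to be set before the device is probed.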


^ permalink raw reply	[flat|nested] 91+ messages in thread

>  
> @@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>  			    skb_frag_size(frag), skb_frag_off(frag));
>  	}
>  
> -	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -				   xdp_to_ptr(xdpf), GFP_ATOMIC);
> +	err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>  	if (unlikely(err))
>  		return -ENOSPC; /* Caller handle free/refcnt */
>  
> @@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>  	}
>  
>  	/* Free up any pending old buffers before queueing new ones. */
> -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>  		if (likely(is_xdp_frame(ptr))) {
>  			struct xdp_frame *frame = ptr_to_xdp(ptr);
>  
> @@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  		void *buf;
>  		int off;
>  
> -		buf = virtqueue_get_buf(rq->vq, &buflen);
> +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
>  		if (unlikely(!buf))
>  			goto err_buf;
>  
> @@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>  		return -EINVAL;
>  
>  	while (--*num_buf > 0) {
> -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
>  				 dev->name, *num_buf,
> @@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	while (--num_buf) {
>  		int num_skb_frags;
>  
> -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
>  				 dev->name, num_buf,
> @@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_skb:
>  	put_page(page);
>  	while (num_buf-- > 1) {
> -		buf = virtqueue_get_buf(rq->vq, &len);
> +		buf = virtnet_rq_get_buf(rq, &len, NULL);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers missing\n",
>  				 dev->name, num_buf);
> @@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	alloc_frag->offset += len;
>  	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
>  		    vi->hdr_len + GOOD_PACKET_LEN);
> -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
>  	return err;
> @@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>  
>  	/* chain first in list head */
>  	first->private = (unsigned long)list;
> -	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> -				  first, gfp);
> +	err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
> +				first, NULL, gfp);
>  	if (err < 0)
>  		give_pages(rq, first);
>  
> @@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  
>  	sg_init_one(rq->sg, buf, len);
>  	ctx = mergeable_len_to_ctx(len + room, headroom);
> -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
>  
> @@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		void *ctx;
>  
>  		while (stats.packets < budget &&
> -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
>  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
>  			stats.packets++;
>  		}
>  	} else {
>  		while (stats.packets < budget &&
> -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
>  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
>  			stats.packets++;
>  		}
> @@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>  			return num_sg;
>  		num_sg++;
>  	}
> -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +	return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>  
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  	int i;
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> -		struct virtqueue *vq = vi->sq[i].vq;
> -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -			virtnet_sq_free_unused_buf(vq, buf);
> +		struct send_queue *sq = &vi->sq[i];
> +
> +		while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
> +			virtnet_sq_free_unused_buf(sq->vq, buf);
>  	}
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> -		struct virtqueue *vq = vi->rq[i].vq;
> -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -			virtnet_rq_free_unused_buf(vq, buf);
> +		struct receive_queue *rq = &vi->rq[i];
> +
> +		while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
> +			virtnet_rq_free_unused_buf(rq->vq, buf);
>  	}
>  }
>  
> @@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>  		vi->rq[i].vq = vqs[rxq2vq(i)];
>  		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
>  		vi->sq[i].vq = vqs[txq2vq(i)];
> +
> +		if (experiment_premapped) {
> +			if (!virtqueue_set_premapped(vi->rq[i].vq))
> +				vi->rq[i].premapped = true;
> +			else
> +				netdev_warn(vi->dev, "RXQ (%d) failed to enable premapped mode.\n", i);
> +
> +			if (!virtqueue_set_premapped(vi->sq[i].vq))
> +				vi->sq[i].premapped = true;
> +			else
> +				netdev_warn(vi->dev, "TXQ (%d) failed to enable premapped mode.\n", i);
> +		}
>  	}
>  
>  	/* run here: ret == 0. */
> -- 
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-22 19:29     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-22 19:29 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, Jun 02, 2023 at 05:22:03PM +0800, Xuan Zhuo wrote:
> This patch introduces three helpers for premapped mode:
> 
> * virtqueue_get_buf_premapped
> * virtqueue_detach_unused_buf_premapped
> * virtqueue_detach
> 
> The first two work like their non-premapped counterparts, but a cursor
> is passed.
> 
> virtqueue_detach is used to get the DMA info of the last buf via that
> cursor.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 83 ++++++++++++++++++++++++++++++++++++
>  include/linux/virtio.h       | 10 +++++
>  2 files changed, 93 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index cbc22daae7e1..6771b9661798 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2555,6 +2555,66 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>  	return virtqueue_get_buf_ctx(_vq, len, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf);
> +
> +/**
> + * virtqueue_get_buf_premapped - get the next used buffer
> + * @_vq: the struct virtqueue we're talking about.
> + * @len: the length written into the buffer
> + * @ctx: extra context for the token
> + * @cursor: detach cursor
> + *
> + * If the device wrote data into the buffer, @len will be set to the
> + * amount written.  This means you don't need to clear the buffer
> + * beforehand to ensure there's no data leakage in the case of short
> + * writes.
> + *
> + * Caller must ensure we don't call this with other virtqueue
> + * operations at the same time (except where noted).
> + *
> + * This is used for the premapped vq. The cursor is passed by the driver
> + * and is used for virtqueue_detach. That will be initialized by virtio
> + * core internally.

initialized how?
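
For reference: in this series the initialization happens in
detach_cursor_init_split() (patch 05/10, quoted later in this thread).
Condensed, with the free-list bookkeeping omitted:

	struct vring_desc_extra *extra = &vq->split.desc_extra[head];

	cursor->head = head;
	cursor->done = 0;

	if (extra->flags & VRING_DESC_F_INDIRECT) {
		/* indirect: walk the indirect table entry by entry */
		cursor->num = extra->len / sizeof(struct vring_desc);
		cursor->indirect = true;
		cursor->pos = 0;
	} else {
		/* direct: walk the descriptor chain starting at head */
		cursor->indirect = false;
		cursor->pos = head;
	}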

> + *
> + * Returns NULL if there are no used buffers, or the "data" token
> + * handed to virtqueue_add_*().
> + */
> +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> +				  void **ctx,
> +				  struct virtqueue_detach_cursor *cursor)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx, cursor) :
> +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, cursor);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_get_buf_premapped);
> +
> +/**
> + * virtqueue_detach - get the dma info of last buf
> + * @_vq: the struct virtqueue we're talking about.
> + * @cursor: detach cursor
> + * @addr: the dma address
> + * @len: the length of the dma address
> + * @dir: the direction of the dma address
> + *
> + * This is used for the premapped vq. The cursor is initialized by
> + * virtqueue_get_buf_premapped or virtqueue_detach_unused_buf_premapped.
> + *
> + * Returns:
> + * -EAGAIN: there are more dma info, this function should be called more.

dma info is singular, so "there is".

I see you kept this idea of returning -EAGAIN on success.
Pls don't.


> + * -EINVAL: the process is done, should not call this function

here you really mean "a previous call returned 0"
we generally don't do defensive programming, so why do it here?

> + * 0: no more dma info

"no more" normally means "nothing was returned", not
"value returned and this was the last entry".


> + */
> +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_detach_packed(_vq, cursor, addr, len, dir) :
> +				 virtqueue_detach_split(_vq, cursor, addr, len, dir);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_detach);
> +
>  /**
>   * virtqueue_disable_cb - disable callbacks
>   * @_vq: the struct virtqueue we're talking about.
> @@ -2682,6 +2742,29 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>  
> +/**
> + * virtqueue_detach_unused_buf_premapped - detach first unused buffer
> + * @_vq: the struct virtqueue we're talking about.
> + * @cursor: detach cursor
> + *
> + * This is used for the premapped vq. The cursor is passed by the driver
> + * and is used for virtqueue_detach. That will be initialized by virtio
> + * core internally.
> + *
> + * Returns NULL or the "data" token handed to virtqueue_add_*().
> + * This is not valid on an active queue; it is useful for device
> + * shutdown or the reset queue.
> + */
> +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> +					    struct virtqueue_detach_cursor *cursor)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, cursor) :
> +				 virtqueue_detach_unused_buf_split(_vq, cursor);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_premapped);
> +
>  static inline bool more_used(const struct vring_virtqueue *vq)
>  {
>  	return vq->packed_ring ? more_used_packed(vq) : more_used_split(vq);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 7f137c7a9034..0a11c5b32fe5 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -3,6 +3,7 @@
>  #define _LINUX_VIRTIO_H
>  /* Everything a virtio driver needs to work with any particular virtio
>   * implementation. */
> +#include <linux/dma-mapping.h>
>  #include <linux/types.h>
>  #include <linux/scatterlist.h>
>  #include <linux/spinlock.h>
> @@ -88,6 +89,10 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
>  void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
>  			    void **ctx);
>  
> +void *virtqueue_get_buf_premapped(struct virtqueue *_vq, unsigned int *len,
> +				  void **ctx,
> +				  struct virtqueue_detach_cursor *cursor);
> +
>  void virtqueue_disable_cb(struct virtqueue *vq);
>  
>  bool virtqueue_enable_cb(struct virtqueue *vq);
> @@ -101,6 +106,8 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
>  
>  void *virtqueue_detach_unused_buf(struct virtqueue *vq);
> +void *virtqueue_detach_unused_buf_premapped(struct virtqueue *_vq,
> +					    struct virtqueue_detach_cursor *cursor);
>  
>  unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
>  
> @@ -114,6 +121,9 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
>  int virtqueue_resize(struct virtqueue *vq, u32 num,
>  		     void (*recycle)(struct virtqueue *vq, void *buf));
>  
> +int virtqueue_detach(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +		     dma_addr_t *addr, u32 *len, enum dma_data_direction *dir);
> +
>  /**
>   * struct virtio_device - representation of a device using virtio
>   * @index: unique position on the virtio bus
> -- 
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-22 19:36     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-22 19:36 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, Jun 02, 2023 at 05:22:01PM +0800, Xuan Zhuo wrote:
> Under the premapped mode, the driver needs to unmap the DMA address
> after receiving the buffer. The virtio core records the DMA address,
> so the driver needs a way to get the dma info from the virtio core.
> 
> A straightforward approach is to pass an array to the virtio core when
> calling virtqueue_get_buf(). However, it is not feasible when there are
> multiple DMA addresses in the descriptor chain, and the array size is
> unknown.
> 
> To solve this problem, a helper is introduced. After calling
> virtqueue_get_buf(), the driver can call the helper to
> retrieve the DMA info of one descriptor. If the helper function returns
> -EAGAIN, it means that there are more DMA addresses to be processed, and
> the driver should call the helper function again. To keep track of the
> current position in the chain, a cursor must be passed to the helper
> function, which is initialized by virtqueue_get_buf().
> 
> Some processing is done inside this helper, so this helper MUST be
> called under the premapped mode.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 118 ++++++++++++++++++++++++++++++++---
>  include/linux/virtio.h       |  11 ++++
>  2 files changed, 119 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dc109fbc05a5..cdc4349f6066 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>  	return needs_kick;
>  }
>  
> -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> -			     void **ctx)
> +static void detach_cursor_init_split(struct vring_virtqueue *vq,
> +				     struct virtqueue_detach_cursor *cursor, u16 head)
> +{
> +	struct vring_desc_extra *extra;
> +
> +	extra = &vq->split.desc_extra[head];
> +
> +	/* Clear data ptr. */
> +	vq->split.desc_state[head].data = NULL;
> +
> +	cursor->head = head;
> +	cursor->done = 0;
> +
> +	if (extra->flags & VRING_DESC_F_INDIRECT) {
> +		cursor->num = extra->len / sizeof(struct vring_desc);
> +		cursor->indirect = true;
> +		cursor->pos = 0;
> +
> +		vring_unmap_one_split(vq, head);
> +
> +		extra->next = vq->free_head;
> +
> +		vq->free_head = head;
> +
> +		/* Plus final descriptor */
> +		vq->vq.num_free++;
> +
> +	} else {
> +		cursor->indirect = false;
> +		cursor->pos = head;
> +	}
> +}
> +
> +static int virtqueue_detach_split(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +				  dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> +{

I don't get it. This is generic split vq code? Why is it unconditionally
wasting time with cursors etc? Poking at split.desc_extra when not
necessary is also not really nice, will cause lots of cache misses.

And it looks like we duplicated a bunch of logic?


> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> +	int rc = -EAGAIN;
> +
> +	if (unlikely(cursor->done))
> +		return -EINVAL;
> +
> +	if (!cursor->indirect) {
> +		struct vring_desc_extra *extra;
> +		unsigned int i;
> +
> +		i = cursor->pos;
> +
> +		extra = &vq->split.desc_extra[i];
> +
> +		if (vq->split.vring.desc[i].flags & nextflag) {
> +			cursor->pos = extra->next;
> +		} else {
> +			extra->next = vq->free_head;
> +			vq->free_head = cursor->head;
> +			cursor->done = true;
> +			rc = 0;
> +		}
> +
> +		*addr = extra->addr;
> +		*len = extra->len;
> +		*dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +		vq->vq.num_free++;
> +
> +	} else {
> +		struct vring_desc *indir_desc, *desc;
> +		u16 flags;
> +
> +		indir_desc = vq->split.desc_state[cursor->head].indir_desc;
> +		desc = &indir_desc[cursor->pos];
> +
> +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +		*addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> +		*len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> +		*dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +		if (++cursor->pos == cursor->num) {
> +			kfree(indir_desc);
> +			cursor->done = true;
> +			return 0;
> +		}
> +	}
> +
> +	return rc;
> +}
> +
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
>  {
>  	unsigned int i, j;
>  	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> @@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  
>  		kfree(indir_desc);
>  		vq->split.desc_state[head].indir_desc = NULL;
> -	} else if (ctx) {
> -		*ctx = vq->split.desc_state[head].indir_desc;
>  	}
>  }
>  
> @@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue *vq)
>  
>  static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>  					 unsigned int *len,
> -					 void **ctx)
> +					 void **ctx,
> +					 struct virtqueue_detach_cursor *cursor)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  	void *ret;
> @@ -852,7 +938,15 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>  
>  	/* detach_buf_split clears data, so grab it now. */
>  	ret = vq->split.desc_state[i].data;
> -	detach_buf_split(vq, i, ctx);
> +
> +	if (!vq->indirect && ctx)
> +		*ctx = vq->split.desc_state[i].indir_desc;
> +
> +	if (vq->premapped)
> +		detach_cursor_init_split(vq, cursor, i);
> +	else
> +		detach_buf_split(vq, i);
> +
>  	vq->last_used_idx++;
>  	/* If we expect an interrupt for the next entry, tell host
>  	 * by writing event index and flush out the write before
> @@ -961,7 +1055,8 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
>  	return true;
>  }
>  
> -static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq,
> +					       struct virtqueue_detach_cursor *cursor)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  	unsigned int i;
> @@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
>  			continue;
>  		/* detach_buf_split clears data, so grab it now. */
>  		buf = vq->split.desc_state[i].data;
> -		detach_buf_split(vq, i, NULL);
> +		if (vq->premapped)
> +			detach_cursor_init_split(vq, cursor, i);
> +		else
> +			detach_buf_split(vq, i);
>  		vq->split.avail_idx_shadow--;
>  		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>  				vq->split.avail_idx_shadow);
> @@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> -				 virtqueue_get_buf_ctx_split(_vq, len, ctx);
> +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>  
> @@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
>  	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
> -				 virtqueue_detach_unused_buf_split(_vq);
> +				 virtqueue_detach_unused_buf_split(_vq, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>  
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 1fc0e1023bd4..eb4a4e4329aa 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -38,6 +38,17 @@ struct virtqueue {
>  	void *priv;
>  };
>  
> +struct virtqueue_detach_cursor {
> +	unsigned indirect:1;
> +	unsigned done:1;
> +	unsigned hole:14;
> +
> +	/* for split head */
> +	unsigned head:16;
> +	unsigned num:16;
> +	unsigned pos:16;
> +};
> +

is cursor ever stored somewhere? If not don't use bitfields,
they cause many gcc versions to generate atrocious code.
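
A plain-field layout along these lines (hypothetical, not part of the
series) would be:

	struct virtqueue_detach_cursor {
		bool indirect;
		bool done;

		/* for split head */
		u16 head;
		u16 num;
		u16 pos;
	};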


>  int virtqueue_add_outbuf(struct virtqueue *vq,
>  			 struct scatterlist sg[], unsigned int num,
>  			 void *data,
> -- 
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-02  9:21 ` Xuan Zhuo
@ 2023-06-22 19:38   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-22 19:38 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Fri, Jun 02, 2023 at 05:21:56PM +0800, Xuan Zhuo wrote:
> ## About DMA APIs
> 
> Now, virtio may can not work with DMA APIs when virtio features do not have
> VIRTIO_F_ACCESS_PLATFORM.
> 
> 1. I tried to let DMA APIs return phy address by virtio-device. But DMA APIs just
>    work with the "real" devices.
> 2. I tried to let xsk support callballs to get phy address from virtio-net
>    driver as the dma address. But the maintainers of xsk may want to use dma-buf
>    to replace the DMA APIs. I think that may be a larger effort. We will wait
>    too long.
> 
> So rethinking this, firstly, we can support premapped-dma only for devices with
> VIRTIO_F_ACCESS_PLATFORM. In the case of af-xdp, if the users want to use it,
> they have to update the device to support VIRTIO_F_RING_RESET, and they can also
> enable the device's VIRTIO_F_ACCESS_PLATFORM feature.
> 
> Thanks for the help from Christoph.
> 
> =================
> 
> XDP socket(AF_XDP) is an excellent bypass kernel network framework. The zero
> copy feature of xsk (XDP socket) needs to be supported by the driver. The
> performance of zero copy is very good.
> 
> ENV: Qemu with vhost.
> 
>                    vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
> -----------------------------|---------------|------------------|------------
> xmit by sockperf:     90%    |   100%        |                  |  318967
> xmit by xsk:          100%   |   30%         |   33%            | 1192064
> recv by sockperf:     100%   |   68%         |   100%           |  692288
> recv by xsk:          100%   |   33%         |   43%            |  771670
> 
> Before achieving the function of Virtio-Net, we also have to let virtio core
> support these features:

So by itself, this series doesn't achieve that. But what effect does all
this overhead have on performance?

> 1. virtio core support premapped
> 2. virtio core support reset per-queue
> 3. introduce DMA APIs to virtio core
> 
> Please review.
> 
> Thanks.
> 
> v10:
>  1. support to set vq to premapped mode, then the vq just handles the premapped request.
>  2. virtio-net support to do dma mapping in advance
> 
> v9:
>  1. use flag to distinguish the premapped operations. no do judgment by sg.
> 
> v8:
>  1. vring_sg_address: check by sg_page(sg) not dma_address. Because 0 is a valid dma address
>  2. remove unused code from vring_map_one_sg()
> 
> v7:
>  1. virtqueue_dma_dev() return NULL when virtio is without DMA API.
> 
> v6:
>  1. change the size of the flags to u32.
> 
> v5:
>  1. fix for error handler
>  2. add flags to record internal dma mapping
> 
> v4:
>  1. rename map_inter to dma_map_internal
>  2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'
> 
> v3:
>  1. add map_inter to struct desc state to reocrd whether virtio core do dma map
> 
> v2:
>  1. based on sgs[0]->dma_address to judgment is premapped
>  2. based on extra.addr to judgment to do unmap for no-indirect desc
>  3. based on indir_desc to judgment to do unmap for indirect desc
>  4. rename virtqueue_get_dma_dev to virtqueue_dma_dev
> 
> v1:
>  1. expose dma device. NO introduce the api for dma and sync
>  2. split some commit for review.
> 
> 
> 
> 
> Xuan Zhuo (10):
>   virtio_ring: put mapping error check in vring_map_one_sg
>   virtio_ring: introduce virtqueue_set_premapped()
>   virtio_ring: split: support add premapped buf
>   virtio_ring: packed: support add premapped buf
>   virtio_ring: split-detach: support return dma info to driver
>   virtio_ring: packed-detach: support return dma info to driver
>   virtio_ring: introduce helpers for premapped
>   virtio_ring: introduce virtqueue_dma_dev()
>   virtio_ring: introduce virtqueue_add_sg()
>   virtio_net: support dma premapped
> 
>  drivers/net/virtio_net.c     | 163 ++++++++++--
>  drivers/virtio/virtio_ring.c | 493 +++++++++++++++++++++++++++++++----
>  include/linux/virtio.h       |  34 +++
>  3 files changed, 612 insertions(+), 78 deletions(-)
> 
> --
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver
  2023-06-22 19:36     ` Michael S. Tsirkin
@ 2023-06-25  2:10       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-25  2:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Thu, 22 Jun 2023 15:36:41 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Jun 02, 2023 at 05:22:01PM +0800, Xuan Zhuo wrote:
> > Under the premapped mode, the driver needs to unmap the DMA address
> > after receiving the buffer. The virtio core records the DMA address,
> > so the driver needs a way to get the dma info from the virtio core.
> >
> > A straightforward approach is to pass an array to the virtio core when
> > calling virtqueue_get_buf(). However, it is not feasible when there are
> > multiple DMA addresses in the descriptor chain, and the array size is
> > unknown.
> >
> > To solve this problem, a helper is introduced. After calling
> > virtqueue_get_buf(), the driver can call the helper to
> > retrieve the DMA info of one descriptor. If the helper function returns
> > -EAGAIN, it means that there are more DMA addresses to be processed, and
> > the driver should call the helper function again. To keep track of the
> > current position in the chain, a cursor must be passed to the helper
> > function, which is initialized by virtqueue_get_buf().
> >
> > Some processing is done inside this helper, so this helper MUST be
> > called under the premapped mode.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 118 ++++++++++++++++++++++++++++++++---
> >  include/linux/virtio.h       |  11 ++++
> >  2 files changed, 119 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dc109fbc05a5..cdc4349f6066 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> >  	return needs_kick;
> >  }
> >
> > -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > -			     void **ctx)
> > +static void detach_cursor_init_split(struct vring_virtqueue *vq,
> > +				     struct virtqueue_detach_cursor *cursor, u16 head)
> > +{
> > +	struct vring_desc_extra *extra;
> > +
> > +	extra = &vq->split.desc_extra[head];
> > +
> > +	/* Clear data ptr. */
> > +	vq->split.desc_state[head].data = NULL;
> > +
> > +	cursor->head = head;
> > +	cursor->done = 0;
> > +
> > +	if (extra->flags & VRING_DESC_F_INDIRECT) {
> > +		cursor->num = extra->len / sizeof(struct vring_desc);
> > +		cursor->indirect = true;
> > +		cursor->pos = 0;
> > +
> > +		vring_unmap_one_split(vq, head);
> > +
> > +		extra->next = vq->free_head;
> > +
> > +		vq->free_head = head;
> > +
> > +		/* Plus final descriptor */
> > +		vq->vq.num_free++;
> > +
> > +	} else {
> > +		cursor->indirect = false;
> > +		cursor->pos = head;
> > +	}
> > +}
> > +
> > +static int virtqueue_detach_split(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > +				  dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> > +{
>
> I don't get it. This is generic split vq code?

No. This is the API for the split vq when the address is mapped by the driver.

> Why is it unconditionally
> wasting time with cursors etc? Poking at split.desc_extra when not
> necessary is also not really nice, will cause lots of cache misses.

virtqueue_get_buf_ctx_split() is the generic code.

I just added the check of vq->premapped.

>
> And it looks like we duplicated a bunch of logic?

Yes.

detach_buf_split() is the original logic.
But now the driver needs to get the DMA info of every desc, so
I broke up the loop of detach_buf_split().
The logic is simple, though, so I think it is OK.

virtqueue_detach_split() returns the DMA info of every desc.
detach_cursor_init_split() initializes the cursor inside virtqueue_get_buf_ctx_split().
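
For a direct (non-indirect) chain of three descriptors, the broken-up
loop yields the following call sequence (a sketch of the behaviour of
virtqueue_detach_split() as quoted above):

	/* chain d0 -> d1 -> d2, head == d0 */
	virtqueue_detach(vq, &cursor, &addr, &len, &dir); /* -EAGAIN, yields d0 */
	virtqueue_detach(vq, &cursor, &addr, &len, &dir); /* -EAGAIN, yields d1 */
	/* final call yields d2, relinks the chain onto free_head,
	 * sets cursor->done and returns 0 */
	virtqueue_detach(vq, &cursor, &addr, &len, &dir);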

>
>
> > +	struct vring_virtqueue *vq = to_vvq(_vq);
> > +	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > +	int rc = -EAGAIN;
> > +
> > +	if (unlikely(cursor->done))
> > +		return -EINVAL;
> > +
> > +	if (!cursor->indirect) {
> > +		struct vring_desc_extra *extra;
> > +		unsigned int i;
> > +
> > +		i = cursor->pos;
> > +
> > +		extra = &vq->split.desc_extra[i];
> > +
> > +		if (vq->split.vring.desc[i].flags & nextflag) {
> > +			cursor->pos = extra->next;
> > +		} else {
> > +			extra->next = vq->free_head;
> > +			vq->free_head = cursor->head;
> > +			cursor->done = true;
> > +			rc = 0;
> > +		}
> > +
> > +		*addr = extra->addr;
> > +		*len = extra->len;
> > +		*dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> > +
> > +		vq->vq.num_free++;
> > +
> > +	} else {
> > +		struct vring_desc *indir_desc, *desc;
> > +		u16 flags;
> > +
> > +		indir_desc = vq->split.desc_state[cursor->head].indir_desc;
> > +		desc = &indir_desc[cursor->pos];
> > +
> > +		flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +		*addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> > +		*len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> > +		*dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> > +
> > +		if (++cursor->pos == cursor->num) {
> > +			kfree(indir_desc);
> > +			cursor->done = true;
> > +			return 0;
> > +		}
> > +	}
> > +
> > +	return rc;
> > +}
> > +
> > +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
> >  {
> >  	unsigned int i, j;
> >  	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > @@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >
> >  		kfree(indir_desc);
> >  		vq->split.desc_state[head].indir_desc = NULL;
> > -	} else if (ctx) {
> > -		*ctx = vq->split.desc_state[head].indir_desc;
> >  	}
> >  }
> >
> > @@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue *vq)
> >
> >  static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> >  					 unsigned int *len,
> > -					 void **ctx)
> > +					 void **ctx,
> > +					 struct virtqueue_detach_cursor *cursor)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >  	void *ret;
> > @@ -852,7 +938,15 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> >
> >  	/* detach_buf_split clears data, so grab it now. */
> >  	ret = vq->split.desc_state[i].data;
> > -	detach_buf_split(vq, i, ctx);
> > +
> > +	if (!vq->indirect && ctx)
> > +		*ctx = vq->split.desc_state[i].indir_desc;
> > +
> > +	if (vq->premapped)
> > +		detach_cursor_init_split(vq, cursor, i);
> > +	else
> > +		detach_buf_split(vq, i);
> > +
> >  	vq->last_used_idx++;
> >  	/* If we expect an interrupt for the next entry, tell host
> >  	 * by writing event index and flush out the write before
> > @@ -961,7 +1055,8 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
> >  	return true;
> >  }
> >
> > -static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> > +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq,
> > +					       struct virtqueue_detach_cursor *cursor)
> >  {
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >  	unsigned int i;
> > @@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> >  			continue;
> >  		/* detach_buf_split clears data, so grab it now. */
> >  		buf = vq->split.desc_state[i].data;
> > -		detach_buf_split(vq, i, NULL);
> > +		if (vq->premapped)
> > +			detach_cursor_init_split(vq, cursor, i);
> > +		else
> > +			detach_buf_split(vq, i);
> >  		vq->split.avail_idx_shadow--;
> >  		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> >  				vq->split.avail_idx_shadow);
> > @@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >
> >  	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> > -				 virtqueue_get_buf_ctx_split(_vq, len, ctx);
> > +				 virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
> >
> > @@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> >  	struct vring_virtqueue *vq = to_vvq(_vq);
> >
> >  	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
> > -				 virtqueue_detach_unused_buf_split(_vq);
> > +				 virtqueue_detach_unused_buf_split(_vq, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> >
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 1fc0e1023bd4..eb4a4e4329aa 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -38,6 +38,17 @@ struct virtqueue {
> >  	void *priv;
> >  };
> >
> > +struct virtqueue_detach_cursor {
> > +	unsigned indirect:1;
> > +	unsigned done:1;
> > +	unsigned hole:14;
> > +
> > +	/* for split head */
> > +	unsigned head:16;
> > +	unsigned num:16;
> > +	unsigned pos:16;
> > +};
> > +
>
> is cursor ever stored somewhere? If not don't use bitfields,
> they cause many gcc versions to generate atrocious code.

OK.
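
For example, dropping the bitfields could look like this (just a sketch
of the suggested layout, not the final code):

	/* Plain fields instead of bitfields: the cursor lives on the
	 * caller's stack and is never stored in the ring state, so
	 * there is nothing worth packing.
	 */
	struct virtqueue_detach_cursor {
		bool indirect;
		bool done;

		/* for split head */
		u16 head;
		u16 num;
		u16 pos;
	};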


Thanks.


>
>
> >  int virtqueue_add_outbuf(struct virtqueue *vq,
> >  			 struct scatterlist sg[], unsigned int num,
> >  			 void *data,
> > --
> > 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-22 12:15     ` Michael S. Tsirkin
@ 2023-06-25  2:43       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-25  2:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Thu, 22 Jun 2023 08:15:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Jun 02, 2023 at 05:22:06PM +0800, Xuan Zhuo wrote:
> > Introduce the module param "experiment_premapped" to enable the mode
> > in which virtio-net does the DMA mapping itself.
> >
> > If it is true, the vqs of virtio-net are in premapped mode: the virtio
> > core only handles sgs that carry a dma_address, and the driver must get
> > the DMA address of the buffer, in order to unmap it after getting the
> > buffer back from the virtio core.
> >
> > That will be useful when AF_XDP is enabled: AF_XDP tx and the kernel
> > packet xmit will share the tx queue, so the skb xmit path must support
> > the premapped mode.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
>
> I put this in next but I don't think this is going upstream
> in its current form,

I agree.

> certainly not with the experiment_premapped mod config
> that no one will know how to enable. If you want to experiment,
> keep it in your private tree, experimenting on humans requires
> an ethics board approval and consent forms :)

^_^

Maybe this patchset should not include this patch; it should go
together with the patch that actually uses the premapped mode.

>
> Spreading the "premapped" boolean all over the place is also
> far from pretty; I wonder why we can't specify it only when adding.

I guess you mean that we should just pass "premapped" through the virtio API when adding.

I will try.
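
If it helps the discussion, a per-add variant might look something like
this (purely a hypothetical sketch, the function name below is
invented):

	/* Hypothetical: state "premapped" per add call instead of as
	 * per-vq state. The caller guarantees that each sg already
	 * carries a valid sg->dma_address, so the core would skip its
	 * own mapping for this add.
	 */
	int virtqueue_add_outbuf_premapped(struct virtqueue *vq,
					   struct scatterlist sg[],
					   unsigned int num,
					   void *data, gfp_t gfp);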

Thanks.


>
> > ---
> >  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
> >  1 file changed, 141 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2396c28c0122..5898212fcb3c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -26,10 +26,11 @@
> >  static int napi_weight = NAPI_POLL_WEIGHT;
> >  module_param(napi_weight, int, 0444);
> >
> > -static bool csum = true, gso = true, napi_tx = true;
> > +static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
> >  module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> > +module_param(experiment_premapped, bool, 0644);
> >
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > @@ -142,6 +143,9 @@ struct send_queue {
> >
> >  	/* Record whether sq is in reset state. */
> >  	bool reset;
> > +
> > +	/* The vq is in premapped mode. */
> > +	bool premapped;
> >  };
> >
> >  /* Internal representation of a receive virtqueue */
> > @@ -174,6 +178,9 @@ struct receive_queue {
> >  	char name[16];
> >
> >  	struct xdp_rxq_info xdp_rxq;
> > +
> > +	/* The vq is in premapped mode. */
> > +	bool premapped;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >  	return skb;
> >  }
> >
> > +static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
> > +{
> > +	enum dma_data_direction dir;
> > +	dma_addr_t addr;
> > +	u32 len;
> > +	int err;
> > +
> > +	do {
> > +		err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
> > +		if (!err || err == -EAGAIN)
> > +			dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
> > +
> > +	} while (err == -EAGAIN);
> > +
> > +	return err;
> > +}
> > +
> > +static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
> > +{
> > +	struct virtqueue_detach_cursor cursor;
> > +	void *buf;
> > +
> > +	if (!premapped)
> > +		return virtqueue_detach_unused_buf(vq);
> > +
> > +	buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
> > +	if (buf)
> > +		virtnet_generic_unmap(vq, &cursor);
> > +
> > +	return buf;
> > +}
> > +
> > +static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
> > +{
> > +	struct virtqueue_detach_cursor cursor;
> > +	void *buf;
> > +
> > +	if (!premapped)
> > +		return virtqueue_get_buf_ctx(vq, len, ctx);
> > +
> > +	buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
> > +	if (buf)
> > +		virtnet_generic_unmap(vq, &cursor);
> > +
> > +	return buf;
> > +}
> > +
> > +#define virtnet_rq_get_buf(rq, plen, pctx) \
> > +({ \
> > +	typeof(rq) _rq = (rq); \
> > +	virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
> > +})
> > +
> > +#define virtnet_sq_get_buf(sq, plen, pctx) \
> > +({ \
> > +	typeof(sq) _sq = (sq); \
> > +	virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
> > +})
> > +
> > +static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
> > +			  struct scatterlist *sg, unsigned int num, bool out,
> > +			  void *data, void *ctx, gfp_t gfp)
> > +{
> > +	enum dma_data_direction dir;
> > +	struct device *dev;
> > +	int err, ret;
> > +
> > +	if (!premapped)
> > +		return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> > +
> > +	dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
> > +	dev = virtqueue_dma_dev(vq);
> > +
> > +	ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
> > +	if (ret != num)
> > +		goto err;
> > +
> > +	err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> > +	if (err < 0)
> > +		goto err;
> > +
> > +	return 0;
> > +
> > +err:
> > +	dma_unmap_sg_attrs(dev, sg, num, dir, 0);
> > +	return -ENOMEM;
> > +}
> > +
> > +static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
> > +{
> > +	return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
> > +}
> > +
> > +static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
> > +			     void *ctx, gfp_t gfp)
> > +{
> > +	return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
> > +}
> > +
> >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  {
> >  	unsigned int len;
> > @@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  	unsigned int bytes = 0;
> >  	void *ptr;
> >
> > -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
> >  		if (likely(!is_xdp_frame(ptr))) {
> >  			struct sk_buff *skb = ptr;
> >
> > @@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> >  			    skb_frag_size(frag), skb_frag_off(frag));
> >  	}
> >
> > -	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > -				   xdp_to_ptr(xdpf), GFP_ATOMIC);
> > +	err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> >  	if (unlikely(err))
> >  		return -ENOSPC; /* Caller handle free/refcnt */
> >
> > @@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> >  	}
> >
> >  	/* Free up any pending old buffers before queueing new ones. */
> > -	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +	while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
> >  		if (likely(is_xdp_frame(ptr))) {
> >  			struct xdp_frame *frame = ptr_to_xdp(ptr);
> >
> > @@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> >  		void *buf;
> >  		int off;
> >
> > -		buf = virtqueue_get_buf(rq->vq, &buflen);
> > +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> >  		if (unlikely(!buf))
> >  			goto err_buf;
> >
> > @@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> >  		return -EINVAL;
> >
> >  	while (--*num_buf > 0) {
> > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >  		if (unlikely(!buf)) {
> >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >  				 dev->name, *num_buf,
> > @@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  	while (--num_buf) {
> >  		int num_skb_frags;
> >
> > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >  		if (unlikely(!buf)) {
> >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >  				 dev->name, num_buf,
> > @@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  err_skb:
> >  	put_page(page);
> >  	while (num_buf-- > 1) {
> > -		buf = virtqueue_get_buf(rq->vq, &len);
> > +		buf = virtnet_rq_get_buf(rq, &len, NULL);
> >  		if (unlikely(!buf)) {
> >  			pr_debug("%s: rx error: %d buffers missing\n",
> >  				 dev->name, num_buf);
> > @@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  	alloc_frag->offset += len;
> >  	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> >  		    vi->hdr_len + GOOD_PACKET_LEN);
> > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
> >  	if (err < 0)
> >  		put_page(virt_to_head_page(buf));
> >  	return err;
> > @@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >
> >  	/* chain first in list head */
> >  	first->private = (unsigned long)list;
> > -	err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> > -				  first, gfp);
> > +	err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
> > +				first, NULL, gfp);
> >  	if (err < 0)
> >  		give_pages(rq, first);
> >
> > @@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >
> >  	sg_init_one(rq->sg, buf, len);
> >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +	err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
> >  	if (err < 0)
> >  		put_page(virt_to_head_page(buf));
> >
> > @@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >  		void *ctx;
> >
> >  		while (stats.packets < budget &&
> > -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> >  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> >  			stats.packets++;
> >  		}
> >  	} else {
> >  		while (stats.packets < budget &&
> > -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> >  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> >  			stats.packets++;
> >  		}
> > @@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >  			return num_sg;
> >  		num_sg++;
> >  	}
> > -	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > +	return virtnet_add_outbuf(sq, num_sg, skb);
> >  }
> >
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > @@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >  	int i;
> >
> >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > -		struct virtqueue *vq = vi->sq[i].vq;
> > -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -			virtnet_sq_free_unused_buf(vq, buf);
> > +		struct send_queue *sq = &vi->sq[i];
> > +
> > +		while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
> > +			virtnet_sq_free_unused_buf(sq->vq, buf);
> >  	}
> >
> >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > -		struct virtqueue *vq = vi->rq[i].vq;
> > -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -			virtnet_rq_free_unused_buf(vq, buf);
> > +		struct receive_queue *rq = &vi->rq[i];
> > +
> > +		while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
> > +			virtnet_rq_free_unused_buf(rq->vq, buf);
> >  	}
> >  }
> >
> > @@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >  		vi->rq[i].vq = vqs[rxq2vq(i)];
> >  		vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> >  		vi->sq[i].vq = vqs[txq2vq(i)];
> > +
> > +		if (experiment_premapped) {
> > +			if (!virtqueue_set_premapped(vi->rq[i].vq))
> > +				vi->rq[i].premapped = true;
> > +			else
> > +				netdev_warn(vi->dev, "RXQ (%d) enable premapped failure.\n", i);
> > +
> > +			if (!virtqueue_set_premapped(vi->sq[i].vq))
> > +				vi->sq[i].premapped = true;
> > +			else
> > +				netdev_warn(vi->dev, "TXQ (%d) enable premapped failure.\n", i);
> > +		}
> >  	}
> >
> >  	/* run here: ret == 0. */
> > --
> > 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 00/10] virtio core prepares for AF_XDP
  2023-06-21  6:42   ` Xuan Zhuo
@ 2023-06-25  7:19     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-25  7:19 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	virtualization

On Wed, Jun 21, 2023 at 2:43 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Hi Jason,
>
> Do you have plan to review this?

Just came back from vacation, will do this next week.

Thanks

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 01/10] virtio_ring: put mapping error check in vring_map_one_sg
  2023-06-02  9:21   ` Xuan Zhuo
@ 2023-06-27  8:03     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This patch puts the DMA address error check in vring_map_one_sg().
>
> The benefits of doing this:
>
> 1. reduces one check of vq->use_dma_api.
> 2. makes vring_map_one_sg() simpler: callers no longer need to call
>    vring_mapping_error() to check the return value, which simplifies the
>    subsequent code.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>  drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
>  1 file changed, 22 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c5310eaf8b46..72ed07a604d4 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>  }
>
>  /* Map one sg entry. */
> -static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
> -                                  struct scatterlist *sg,
> -                                  enum dma_data_direction direction)
> +static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> +                           enum dma_data_direction direction, dma_addr_t *addr)
>  {
>         if (!vq->use_dma_api) {
>                 /*
> @@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
>                  * depending on the direction.
>                  */
>                 kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
> -               return (dma_addr_t)sg_phys(sg);
> +               *addr = (dma_addr_t)sg_phys(sg);
> +               return 0;
>         }
>
>         /*
> @@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
>          * the way it expects (we don't guarantee that the scatterlist
>          * will exist for the lifetime of the mapping).
>          */
> -       return dma_map_page(vring_dma_dev(vq),
> +       *addr = dma_map_page(vring_dma_dev(vq),
>                             sg_page(sg), sg->offset, sg->length,
>                             direction);
> +
> +       if (dma_mapping_error(vring_dma_dev(vq), *addr))
> +               return -ENOMEM;
> +
> +       return 0;
>  }
>
>  static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
> @@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>
>         for (n = 0; n < out_sgs; n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
> -                       if (vring_mapping_error(vq, addr))
> +                       dma_addr_t addr;
> +
> +                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
>                                 goto unmap_release;
>
>                         prev = i;
> @@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         }
>         for (; n < (out_sgs + in_sgs); n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
> -                       if (vring_mapping_error(vq, addr))
> +                       dma_addr_t addr;
> +
> +                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
>                                 goto unmap_release;
>
>                         prev = i;
> @@ -1279,9 +1286,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>
>         for (n = 0; n < out_sgs + in_sgs; n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> -                                       DMA_TO_DEVICE : DMA_FROM_DEVICE);
> -                       if (vring_mapping_error(vq, addr))
> +                       if (vring_map_one_sg(vq, sg, n < out_sgs ?
> +                                            DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
>                                 goto unmap_release;
>
>                         desc[i].flags = cpu_to_le16(n < out_sgs ?
> @@ -1426,9 +1432,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>         c = 0;
>         for (n = 0; n < out_sgs + in_sgs; n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
> -                                       DMA_TO_DEVICE : DMA_FROM_DEVICE);
> -                       if (vring_mapping_error(vq, addr))
> +                       dma_addr_t addr;
> +
> +                       if (vring_map_one_sg(vq, sg, n < out_sgs ?
> +                                            DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
>                                 goto unmap_release;
>
>                         flags = cpu_to_le16(vq->packed.avail_used_flags |
> --
> 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-06-02  9:21   ` Xuan Zhuo
@ 2023-06-27  8:03     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> This helper allows the driver to switch the vq to premapped mode.
> In premapped mode, the virtio core does not do DMA mapping
> internally.
>
> This only works when use_dma_api is true. If use_dma_api is false,
> the DMA operations do not go through the DMA APIs, which is not the
> standard way in the Linux kernel.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
>  include/linux/virtio.h       |  2 ++
>  2 files changed, 42 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 72ed07a604d4..2afdfb9e3e30 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -172,6 +172,9 @@ struct vring_virtqueue {
>         /* Host publishes avail event idx */
>         bool event;
>
> +       /* Do DMA mapping by driver */
> +       bool premapped;
> +
>         /* Head of free buffer list. */
>         unsigned int free_head;
>         /* Number we've added since last sync. */
> @@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>         vq->packed_ring = true;
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
> +       vq->premapped = false;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  #endif
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
> +       vq->premapped = false;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_resize);
>
> +/**
> + * virtqueue_set_premapped - set the vring premapped mode
> + * @_vq: the struct virtqueue we're talking about.
> + *
> + * Enable the premapped mode of the vq.
> + *
> + * A vring in premapped mode does not do DMA internally, so the driver must
> + * do the DMA mapping in advance. The driver must pass the DMA address via
> + * the dma_address field of the scatterlist. When the driver gets a used
> + * buffer back from the vring, it has to unmap the DMA address, so the driver
> + * must call virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
> + *
> + * This must be called before adding any buf to vring.

And any old buffer should be detached?

> + * So this should be called immediately after init vq or vq reset.

Any way to detect and warn in this case? (not a must if it's too
expensive to do the check)
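
One cheap check might be enough (a sketch, not part of this patch): the
vq has no buffer outstanding iff every descriptor is free, so the
setter could compare num_free with the ring size:

	/* Sketch: refuse to enable premapped mode while any buffer is
	 * still in flight. All descriptors are free iff num_free
	 * equals the ring size.
	 */
	u32 num = vq->packed_ring ? vq->packed.vring.num :
				    vq->split.vring.num;

	if (num != vq->vq.num_free)
		return -EINVAL;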

> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error.
> + * 0: success.
> + * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> + */
> +int virtqueue_set_premapped(struct virtqueue *_vq)
> +{
> +       struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +       if (!vq->use_dma_api)
> +               return -EINVAL;
> +
> +       vq->premapped = true;

I guess there should be a way to disable it. Would it be useful for
the case when AF_XDP sockets were destroyed?

Thanks


> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
> +
>  /* Only available for split ring */
>  struct virtqueue *vring_new_virtqueue(unsigned int index,
>                                       unsigned int num,
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b93238db94e3..1fc0e1023bd4 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>
> +int virtqueue_set_premapped(struct virtqueue *_vq);
> +
>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> --
> 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

> +
>  /* Only available for split ring */
>  struct virtqueue *vring_new_virtqueue(unsigned int index,
>                                       unsigned int num,
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index b93238db94e3..1fc0e1023bd4 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>
> +int virtqueue_set_premapped(struct virtqueue *_vq);
> +
>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
  2023-06-02  9:21   ` Xuan Zhuo
@ 2023-06-27  8:03     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> If the vq is in premapped mode, use sg_dma_address() directly.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
>  1 file changed, 28 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 2afdfb9e3e30..18212c3e056b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>                         dma_addr_t addr;
>
> -                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> -                               goto unmap_release;
> +                       if (vq->premapped) {
> +                               addr = sg_dma_address(sg);
> +                       } else {
> +                               if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> +                                       goto unmap_release;
> +                       }

Btw, I wonder whether it would be simpler to implement the
vq->premapped check inside vring_map_one_sg(), since the
!use_dma_api check is done there as well.
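
Folding the check in might look something like this (a sketch based on
the vring_map_one_sg() signature from patch 01 of this series, not a
tested change):

static int vring_map_one_sg(const struct vring_virtqueue *vq,
			    struct scatterlist *sg,
			    enum dma_data_direction direction,
			    dma_addr_t *addr)
{
	/* Premapped buffers already carry a device address, so both
	 * the mapping and the mapping-error check are skipped.
	 */
	if (vq->premapped) {
		*addr = sg_dma_address(sg);
		return 0;
	}

	if (!vq->use_dma_api) {
		*addr = (dma_addr_t)sg_phys(sg);
		return 0;
	}

	*addr = dma_map_page(vring_dma_dev(vq), sg_page(sg),
			     sg->offset, sg->length, direction);

	if (dma_mapping_error(vring_dma_dev(vq), *addr))
		return -ENOMEM;

	return 0;
}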

>
>                         prev = i;
>                         /* Note that we trust indirect descriptor
> @@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>                         dma_addr_t addr;
>
> -                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> -                               goto unmap_release;
> +                       if (vq->premapped) {
> +                               addr = sg_dma_address(sg);
> +                       } else {
> +                               if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> +                                       goto unmap_release;
> +                       }
>
>                         prev = i;
>                         /* Note that we trust indirect descriptor
> @@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         return 0;
>
>  unmap_release:
> -       err_idx = i;
> +       if (!vq->premapped) {

Can vq->premapped be true here? The label is named "unmap_release",
which implies a "map" beforehand, which seems not to be the case for
premapping.

Thanks


> +               err_idx = i;
>
> -       if (indirect)
> -               i = 0;
> -       else
> -               i = head;
> -
> -       for (n = 0; n < total_sg; n++) {
> -               if (i == err_idx)
> -                       break;
> -               if (indirect) {
> -                       vring_unmap_one_split_indirect(vq, &desc[i]);
> -                       i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> -               } else
> -                       i = vring_unmap_one_split(vq, i);
> +               if (indirect)
> +                       i = 0;
> +               else
> +                       i = head;
> +
> +               for (n = 0; n < total_sg; n++) {
> +                       if (i == err_idx)
> +                               break;
> +                       if (indirect) {
> +                               vring_unmap_one_split_indirect(vq, &desc[i]);
> +                               i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> +                       } else
> +                               i = vring_unmap_one_split(vq, i);
> +               }
>         }
>
>         if (indirect)
> --
> 2.32.0.3.g01195cf9f
>


* Re: [PATCH vhost v10 04/10] virtio_ring: packed: support add premapped buf
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-27  8:03     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> If the vq is in premapped mode, use sg_dma_address() directly.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 36 ++++++++++++++++++++++++++----------
>  1 file changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 18212c3e056b..dc109fbc05a5 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -1299,9 +1299,13 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>
>         for (n = 0; n < out_sgs + in_sgs; n++) {
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> -                       if (vring_map_one_sg(vq, sg, n < out_sgs ?
> -                                            DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> -                               goto unmap_release;
> +                       if (vq->premapped) {
> +                               addr = sg_dma_address(sg);
> +                       } else {
> +                               if (vring_map_one_sg(vq, sg, n < out_sgs ?
> +                                                    DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> +                                       goto unmap_release;
> +                       }
>
>                         desc[i].flags = cpu_to_le16(n < out_sgs ?
>                                                 0 : VRING_DESC_F_WRITE);
> @@ -1369,10 +1373,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         return 0;
>
>  unmap_release:
> -       err_idx = i;
> +       if (!vq->premapped) {
> +               err_idx = i;
>
> -       for (i = 0; i < err_idx; i++)
> -               vring_unmap_desc_packed(vq, &desc[i]);
> +               for (i = 0; i < err_idx; i++)
> +                       vring_unmap_desc_packed(vq, &desc[i]);
> +       }
>
>         kfree(desc);
>
> @@ -1447,9 +1453,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
>                         dma_addr_t addr;
>
> -                       if (vring_map_one_sg(vq, sg, n < out_sgs ?
> -                                            DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> -                               goto unmap_release;
> +                       if (vq->premapped) {
> +                               addr = sg_dma_address(sg);
> +                       } else {
> +                               if (vring_map_one_sg(vq, sg, n < out_sgs ?
> +                                                    DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> +                                       goto unmap_release;
> +                       }
>
>                         flags = cpu_to_le16(vq->packed.avail_used_flags |
>                                     (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
> @@ -1512,11 +1522,17 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>         return 0;
>
>  unmap_release:
> +       vq->packed.avail_used_flags = avail_used_flags;
> +
> +       if (vq->premapped) {

Similar to the split path, I think we can't hit vq->premapped here.

Thanks


> +               END_USE(vq);
> +               return -EIO;
> +       }
> +
>         err_idx = i;
>         i = head;
>         curr = vq->free_head;
>
> -       vq->packed.avail_used_flags = avail_used_flags;
>
>         for (n = 0; n < total_sg; n++) {
>                 if (i == err_idx)
> --
> 2.32.0.3.g01195cf9f
>


* Re: [PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-27  8:03     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> In premapped mode, the driver needs to unmap the DMA address
> after receiving the buffer. The virtio core records the DMA address,
> so the driver needs a way to get the DMA info back from the virtio core.

On second thought, can we simply offload the tracking to the driver
itself? This is the way many other modern NIC drivers work.

In premapped mode, the DMA address is in fact supplied by the driver
itself, so it should have sufficient knowledge. And in some cases the
driver wants to optimize/merge/delay the unmapping, so the DMA
addresses returned by the virtio core are not even of interest in
those cases.
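
For illustration, driver-side tracking could be as simple as keeping
the addresses next to the driver's own completion token (a
hypothetical structure, not something from this series):

struct virtnet_premapped_token {
	void *data;				/* skb or xdp_frame */
	u16 num;				/* entries used below */
	dma_addr_t addr[MAX_SKB_FRAGS + 2];	/* one per sg entry */
	u32 len[MAX_SKB_FRAGS + 2];
};

The driver would fill addr[]/len[] while mapping the sgs, pass the
token as the "data" cookie, and unmap from the token when the buffer
comes back, so the virtio core would not need to report DMA info at
all.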

Thanks



>
> A straightforward approach is to pass an array to the virtio core when
> calling virtqueue_get_buf(). However, it is not feasible when there are
> multiple DMA addresses in the descriptor chain, and the array size is
> unknown.
>
> To solve this problem, a helper is introduced. After calling
> virtqueue_get_buf(), the driver can call the helper to
> retrieve one DMA info entry. If the helper function returns -EAGAIN, it means
> that there are more DMA addresses to be processed, and the driver should
> call the helper function again. To keep track of the current position in
> the chain, a cursor must be passed to the helper function, which is
> initialized by virtqueue_get_buf().
>
> Some processing is done inside this helper, so it MUST be
> called in premapped mode.
>
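
To make the contract concrete, the expected caller loop looks like
this (adapted from how patch 10 of this series consumes these helpers;
virtqueue_get_buf_premapped() and virtqueue_detach() are introduced
later in the series):

	struct virtqueue_detach_cursor cursor;
	enum dma_data_direction dir;
	dma_addr_t addr;
	void *buf, *ctx;
	u32 len;
	int err;

	buf = virtqueue_get_buf_premapped(vq, &len, &ctx, &cursor);

	/* Walk the descriptor chain; -EAGAIN means more entries follow. */
	do {
		err = virtqueue_detach(vq, &cursor, &addr, &len, &dir);
		if (!err || err == -EAGAIN)
			dma_unmap_page_attrs(virtqueue_dma_dev(vq),
					     addr, len, dir, 0);
	} while (err == -EAGAIN);
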
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 118 ++++++++++++++++++++++++++++++++---
>  include/linux/virtio.h       |  11 ++++
>  2 files changed, 119 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index dc109fbc05a5..cdc4349f6066 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>         return needs_kick;
>  }
>
> -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> -                            void **ctx)
> +static void detach_cursor_init_split(struct vring_virtqueue *vq,
> +                                    struct virtqueue_detach_cursor *cursor, u16 head)
> +{
> +       struct vring_desc_extra *extra;
> +
> +       extra = &vq->split.desc_extra[head];
> +
> +       /* Clear data ptr. */
> +       vq->split.desc_state[head].data = NULL;
> +
> +       cursor->head = head;
> +       cursor->done = 0;
> +
> +       if (extra->flags & VRING_DESC_F_INDIRECT) {
> +               cursor->num = extra->len / sizeof(struct vring_desc);
> +               cursor->indirect = true;
> +               cursor->pos = 0;
> +
> +               vring_unmap_one_split(vq, head);
> +
> +               extra->next = vq->free_head;
> +
> +               vq->free_head = head;
> +
> +               /* Plus final descriptor */
> +               vq->vq.num_free++;
> +
> +       } else {
> +               cursor->indirect = false;
> +               cursor->pos = head;
> +       }
> +}
> +
> +static int virtqueue_detach_split(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> +                                 dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> +{
> +       struct vring_virtqueue *vq = to_vvq(_vq);
> +       __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> +       int rc = -EAGAIN;
> +
> +       if (unlikely(cursor->done))
> +               return -EINVAL;
> +
> +       if (!cursor->indirect) {
> +               struct vring_desc_extra *extra;
> +               unsigned int i;
> +
> +               i = cursor->pos;
> +
> +               extra = &vq->split.desc_extra[i];
> +
> +               if (vq->split.vring.desc[i].flags & nextflag) {
> +                       cursor->pos = extra->next;
> +               } else {
> +                       extra->next = vq->free_head;
> +                       vq->free_head = cursor->head;
> +                       cursor->done = true;
> +                       rc = 0;
> +               }
> +
> +               *addr = extra->addr;
> +               *len = extra->len;
> +               *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +               vq->vq.num_free++;
> +
> +       } else {
> +               struct vring_desc *indir_desc, *desc;
> +               u16 flags;
> +
> +               indir_desc = vq->split.desc_state[cursor->head].indir_desc;
> +               desc = &indir_desc[cursor->pos];
> +
> +               flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> +               *addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> +               *len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> +               *dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> +
> +               if (++cursor->pos == cursor->num) {
> +                       kfree(indir_desc);
> +                       cursor->done = true;
> +                       return 0;
> +               }
> +       }
> +
> +       return rc;
> +}
> +
> +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
>  {
>         unsigned int i, j;
>         __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> @@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>
>                 kfree(indir_desc);
>                 vq->split.desc_state[head].indir_desc = NULL;
> -       } else if (ctx) {
> -               *ctx = vq->split.desc_state[head].indir_desc;
>         }
>  }
>
> @@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue *vq)
>
>  static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>                                          unsigned int *len,
> -                                        void **ctx)
> +                                        void **ctx,
> +                                        struct virtqueue_detach_cursor *cursor)
>  {
>         struct vring_virtqueue *vq = to_vvq(_vq);
>         void *ret;
> @@ -852,7 +938,15 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>
>         /* detach_buf_split clears data, so grab it now. */
>         ret = vq->split.desc_state[i].data;
> -       detach_buf_split(vq, i, ctx);
> +
> +       if (!vq->indirect && ctx)
> +               *ctx = vq->split.desc_state[i].indir_desc;
> +
> +       if (vq->premapped)
> +               detach_cursor_init_split(vq, cursor, i);
> +       else
> +               detach_buf_split(vq, i);
> +
>         vq->last_used_idx++;
>         /* If we expect an interrupt for the next entry, tell host
>          * by writing event index and flush out the write before
> @@ -961,7 +1055,8 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
>         return true;
>  }
>
> -static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq,
> +                                              struct virtqueue_detach_cursor *cursor)
>  {
>         struct vring_virtqueue *vq = to_vvq(_vq);
>         unsigned int i;
> @@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
>                         continue;
>                 /* detach_buf_split clears data, so grab it now. */
>                 buf = vq->split.desc_state[i].data;
> -               detach_buf_split(vq, i, NULL);
> +               if (vq->premapped)
> +                       detach_cursor_init_split(vq, cursor, i);
> +               else
> +                       detach_buf_split(vq, i);
>                 vq->split.avail_idx_shadow--;
>                 vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>                                 vq->split.avail_idx_shadow);
> @@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>         struct vring_virtqueue *vq = to_vvq(_vq);
>
>         return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> -                                virtqueue_get_buf_ctx_split(_vq, len, ctx);
> +                                virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>
> @@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
>         struct vring_virtqueue *vq = to_vvq(_vq);
>
>         return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
> -                                virtqueue_detach_unused_buf_split(_vq);
> +                                virtqueue_detach_unused_buf_split(_vq, NULL);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
>
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 1fc0e1023bd4..eb4a4e4329aa 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -38,6 +38,17 @@ struct virtqueue {
>         void *priv;
>  };
>
> +struct virtqueue_detach_cursor {
> +       unsigned indirect:1;
> +       unsigned done:1;
> +       unsigned hole:14;
> +
> +       /* for split head */
> +       unsigned head:16;
> +       unsigned num:16;
> +       unsigned pos:16;
> +};
> +
>  int virtqueue_add_outbuf(struct virtqueue *vq,
>                          struct scatterlist sg[], unsigned int num,
>                          void *data,
> --
> 2.32.0.3.g01195cf9f
>


* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-02  9:22   ` Xuan Zhuo
@ 2023-06-27  8:03     ` Jason Wang
  -1 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Introduce the module param "experiment_premapped" to enable the mode
> in which virtio-net does the DMA mapping itself.
>
> If that is true, the vqs of virtio-net are in premapped mode: they
> just handle sgs that already carry a dma_address, and the driver must
> get the DMA address of the buffer to unmap it after getting the buffer
> back from the virtio core.
>
> That will be useful when AF_XDP is enabled: AF_XDP tx and the kernel packet
> xmit will share the tx queue, so the skb xmit path must support the premapped
> mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
>  1 file changed, 141 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c0122..5898212fcb3c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,10 +26,11 @@
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
>
> -static bool csum = true, gso = true, napi_tx = true;
> +static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
> +module_param(experiment_premapped, bool, 0644);

Having a module parameter is sub-optimal. I think we can demonstrate a
real benefit:

In the case of mergeable rx buffers, if the mapping is done by the
virtio core, it needs to be done per buffer (< PAGE_SIZE).

But if it is done by virtio-net, we have a chance to map the buffer
per page, which can save a lot of mapping and unmapping. A lot of
other optimizations could be done on top as well.

If we manage to prove this, we don't need any experimental module
parameters at all.
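
To sketch the idea (illustrative pseudo-code with hypothetical local
names; a real refill path would also need reference counting on the
page):

	dma_addr_t dma;

	/* Map once per page_frag page... */
	dma = dma_map_page(virtqueue_dma_dev(rq->vq), alloc_frag->page,
			   0, PAGE_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(virtqueue_dma_dev(rq->vq), dma))
		return -ENOMEM;

	/* ...then every buffer carved from that page reuses the mapping. */
	sg_init_table(rq->sg, 1);
	rq->sg[0].length = len;
	sg_dma_address(rq->sg) = dma + alloc_frag->offset;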

Thanks


>
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -142,6 +143,9 @@ struct send_queue {
>
>         /* Record whether sq is in reset state. */
>         bool reset;
> +
> +       /* The vq is premapped mode. */
> +       bool premapped;
>  };
>
>  /* Internal representation of a receive virtqueue */
> @@ -174,6 +178,9 @@ struct receive_queue {
>         char name[16];
>
>         struct xdp_rxq_info xdp_rxq;
> +
> +       /* The vq is premapped mode. */
> +       bool premapped;
>  };
>
>  /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         return skb;
>  }
>
> +static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
> +{
> +       enum dma_data_direction dir;
> +       dma_addr_t addr;
> +       u32 len;
> +       int err;
> +
> +       do {
> +               err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
> +               if (!err || err == -EAGAIN)
> +                       dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
> +
> +       } while (err == -EAGAIN);
> +
> +       return err;
> +}
> +
> +static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
> +{
> +       struct virtqueue_detach_cursor cursor;
> +       void *buf;
> +
> +       if (!premapped)
> +               return virtqueue_detach_unused_buf(vq);
> +
> +       buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
> +       if (buf)
> +               virtnet_generic_unmap(vq, &cursor);
> +
> +       return buf;
> +}
> +
> +static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
> +{
> +       struct virtqueue_detach_cursor cursor;
> +       void *buf;
> +
> +       if (!premapped)
> +               return virtqueue_get_buf_ctx(vq, len, ctx);
> +
> +       buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
> +       if (buf)
> +               virtnet_generic_unmap(vq, &cursor);
> +
> +       return buf;
> +}
> +
> +#define virtnet_rq_get_buf(rq, plen, pctx) \
> +({ \
> +       typeof(rq) _rq = (rq); \
> +       virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
> +})
> +
> +#define virtnet_sq_get_buf(sq, plen, pctx) \
> +({ \
> +       typeof(sq) _sq = (sq); \
> +       virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
> +})
> +
> +static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
> +                         struct scatterlist *sg, unsigned int num, bool out,
> +                         void *data, void *ctx, gfp_t gfp)
> +{
> +       enum dma_data_direction dir;
> +       struct device *dev;
> +       int err, ret;
> +
> +       if (!premapped)
> +               return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> +
> +       dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
> +       dev = virtqueue_dma_dev(vq);
> +
> +       ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
> +       if (ret != num)
> +               goto err;
> +
> +       err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> +       if (err < 0)
> +               goto err;
> +
> +       return 0;
> +
> +err:
> +       dma_unmap_sg_attrs(dev, sg, num, dir, 0);
> +       return -ENOMEM;
> +}
> +
> +static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
> +{
> +       return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
> +}
> +
> +static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
> +                            void *ctx, gfp_t gfp)
> +{
> +       return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
> +}
> +
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  {
>         unsigned int len;
> @@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>         unsigned int bytes = 0;
>         void *ptr;
>
> -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +       while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>                 if (likely(!is_xdp_frame(ptr))) {
>                         struct sk_buff *skb = ptr;
>
> @@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>                             skb_frag_size(frag), skb_frag_off(frag));
>         }
>
> -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>         if (unlikely(err))
>                 return -ENOSPC; /* Caller handle free/refcnt */
>
> @@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>         }
>
>         /* Free up any pending old buffers before queueing new ones. */
> -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +       while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>                 if (likely(is_xdp_frame(ptr))) {
>                         struct xdp_frame *frame = ptr_to_xdp(ptr);
>
> @@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>                 void *buf;
>                 int off;
>
> -               buf = virtqueue_get_buf(rq->vq, &buflen);
> +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
>                 if (unlikely(!buf))
>                         goto err_buf;
>
> @@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>                 return -EINVAL;
>
>         while (--*num_buf > 0) {
> -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
>                                  dev->name, *num_buf,
> @@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         while (--num_buf) {
>                 int num_skb_frags;
>
> -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
>                                  dev->name, num_buf,
> @@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_skb:
>         put_page(page);
>         while (num_buf-- > 1) {
> -               buf = virtqueue_get_buf(rq->vq, &len);
> +               buf = virtnet_rq_get_buf(rq, &len, NULL);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers missing\n",
>                                  dev->name, num_buf);
> @@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>         alloc_frag->offset += len;
>         sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
>                     vi->hdr_len + GOOD_PACKET_LEN);
> -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +       err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>         if (err < 0)
>                 put_page(virt_to_head_page(buf));
>         return err;
> @@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>
>         /* chain first in list head */
>         first->private = (unsigned long)list;
> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> -                                 first, gfp);
> +       err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
> +                               first, NULL, gfp);
>         if (err < 0)
>                 give_pages(rq, first);
>
> @@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>
>         sg_init_one(rq->sg, buf, len);
>         ctx = mergeable_len_to_ctx(len + room, headroom);
> -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +       err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>         if (err < 0)
>                 put_page(virt_to_head_page(buf));
>
> @@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>                 void *ctx;
>
>                 while (stats.packets < budget &&
> -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
>                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
>                         stats.packets++;
>                 }
>         } else {
>                 while (stats.packets < budget &&
> -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
>                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
>                         stats.packets++;
>                 }
> @@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>                         return num_sg;
>                 num_sg++;
>         }
> -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +       return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
>         int i;
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               struct virtqueue *vq = vi->sq[i].vq;
> -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -                       virtnet_sq_free_unused_buf(vq, buf);
> +               struct send_queue *sq = &vi->sq[i];
> +
> +               while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
> +                       virtnet_sq_free_unused_buf(sq->vq, buf);
>         }
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               struct virtqueue *vq = vi->rq[i].vq;
> -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -                       virtnet_rq_free_unused_buf(vq, buf);
> +               struct receive_queue *rq = &vi->rq[i];
> +
> +               while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
> +                       virtnet_rq_free_unused_buf(rq->vq, buf);
>         }
>  }
>
> @@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>                 vi->rq[i].vq = vqs[rxq2vq(i)];
>                 vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
>                 vi->sq[i].vq = vqs[txq2vq(i)];
> +
> +               if (experiment_premapped) {
> +                       if (!virtqueue_set_premapped(vi->rq[i].vq))
> +                               vi->rq[i].premapped = true;
> +                       else
> +                               netdev_warn(vi->dev, "RXQ (%d) enable premapped failure.\n", i);
> +
> +                       if (!virtqueue_set_premapped(vi->sq[i].vq))
> +                               vi->sq[i].premapped = true;
> +                       else
> +                               netdev_warn(vi->dev, "TXQ (%d) enable premapped failure.\n", i);
> +               }
>         }
>
>         /* run here: ret == 0. */
> --
> 2.32.0.3.g01195cf9f
>


* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
@ 2023-06-27  8:03     ` Jason Wang
  0 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-27  8:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Introduce the module param "experiment_premapped" to enable the function
> that the virtio-net do dma mapping.
>
> If that is true, the vq of virtio-net is under the premapped mode.
> It just handle the sg with dma_address. And the driver must get the dma
> address of the buffer to unmap after get the buffer from virtio core.
>
> That will be useful when AF_XDP is enable, AF_XDP tx and the kernel packet
> xmit will share the tx queue, so the skb xmit must support the premapped
> mode.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
>  1 file changed, 141 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2396c28c0122..5898212fcb3c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -26,10 +26,11 @@
>  static int napi_weight = NAPI_POLL_WEIGHT;
>  module_param(napi_weight, int, 0444);
>
> -static bool csum = true, gso = true, napi_tx = true;
> +static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
>  module_param(csum, bool, 0444);
>  module_param(gso, bool, 0444);
>  module_param(napi_tx, bool, 0644);
> +module_param(experiment_premapped, bool, 0644);

Having a module parameter is sub-optimal. I think we can demonstrate a
real benefit instead:

In the case of mergeable rx buffers, if the mapping is done by the
virtio core, it needs to be done per buffer (< PAGE_SIZE).

But if it is done by virtio-net, we have a chance to map the buffer per
page, which can save a lot of mapping and unmapping operations. A lot
of other optimizations could be done on top as well.

If we manage to prove this, we don't need any experimental module
parameters at all.
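
For illustration, a rough sketch of the per-page idea (the struct and
helper names here are hypothetical, not from this series):

#include <linux/dma-mapping.h>

/* Map a whole page once, then carve per-buffer chunks out of the
 * single mapping; no per-buffer dma_map_*() call is needed. */
struct rx_page {
	struct page *page;
	dma_addr_t dma;		/* one mapping for the whole page */
	unsigned int offset;	/* next free byte inside the page */
};

static int rx_page_map(struct device *dev, struct rx_page *rp)
{
	rp->dma = dma_map_page(dev, rp->page, 0, PAGE_SIZE,
			       DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, rp->dma))
		return -ENOMEM;
	rp->offset = 0;
	return 0;
}

/* Each buffer posted to the rq simply offsets into the mapped page. */
static dma_addr_t rx_page_alloc_buf(struct rx_page *rp, unsigned int len)
{
	dma_addr_t addr = rp->dma + rp->offset;

	rp->offset += len;
	return addr;
}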

Thanks


>
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> @@ -142,6 +143,9 @@ struct send_queue {
>
>         /* Record whether sq is in reset state. */
>         bool reset;
> +
> +       /* The vq is premapped mode. */
> +       bool premapped;
>  };
>
>  /* Internal representation of a receive virtqueue */
> @@ -174,6 +178,9 @@ struct receive_queue {
>         char name[16];
>
>         struct xdp_rxq_info xdp_rxq;
> +
> +       /* The vq is premapped mode. */
> +       bool premapped;
>  };
>
>  /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         return skb;
>  }
>
> +static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
> +{
> +       enum dma_data_direction dir;
> +       dma_addr_t addr;
> +       u32 len;
> +       int err;
> +
> +       do {
> +               err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
> +               if (!err || err == -EAGAIN)
> +                       dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
> +
> +       } while (err == -EAGAIN);
> +
> +       return err;
> +}
> +
> +static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
> +{
> +       struct virtqueue_detach_cursor cursor;
> +       void *buf;
> +
> +       if (!premapped)
> +               return virtqueue_detach_unused_buf(vq);
> +
> +       buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
> +       if (buf)
> +               virtnet_generic_unmap(vq, &cursor);
> +
> +       return buf;
> +}
> +
> +static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
> +{
> +       struct virtqueue_detach_cursor cursor;
> +       void *buf;
> +
> +       if (!premapped)
> +               return virtqueue_get_buf_ctx(vq, len, ctx);
> +
> +       buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
> +       if (buf)
> +               virtnet_generic_unmap(vq, &cursor);
> +
> +       return buf;
> +}
> +
> +#define virtnet_rq_get_buf(rq, plen, pctx) \
> +({ \
> +       typeof(rq) _rq = (rq); \
> +       virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
> +})
> +
> +#define virtnet_sq_get_buf(sq, plen, pctx) \
> +({ \
> +       typeof(sq) _sq = (sq); \
> +       virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
> +})
> +
> +static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
> +                         struct scatterlist *sg, unsigned int num, bool out,
> +                         void *data, void *ctx, gfp_t gfp)
> +{
> +       enum dma_data_direction dir;
> +       struct device *dev;
> +       int err, ret;
> +
> +       if (!premapped)
> +               return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> +
> +       dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
> +       dev = virtqueue_dma_dev(vq);
> +
> +       ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
> +       if (ret != num)
> +               goto err;
> +
> +       err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> +       if (err < 0)
> +               goto err;
> +
> +       return 0;
> +
> +err:
> +       dma_unmap_sg_attrs(dev, sg, num, dir, 0);
> +       return -ENOMEM;
> +}
> +
> +static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
> +{
> +       return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
> +}
> +
> +static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
> +                            void *ctx, gfp_t gfp)
> +{
> +       return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
> +}
> +
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  {
>         unsigned int len;
> @@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>         unsigned int bytes = 0;
>         void *ptr;
>
> -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +       while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>                 if (likely(!is_xdp_frame(ptr))) {
>                         struct sk_buff *skb = ptr;
>
> @@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>                             skb_frag_size(frag), skb_frag_off(frag));
>         }
>
> -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>         if (unlikely(err))
>                 return -ENOSPC; /* Caller handle free/refcnt */
>
> @@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>         }
>
>         /* Free up any pending old buffers before queueing new ones. */
> -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +       while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
>                 if (likely(is_xdp_frame(ptr))) {
>                         struct xdp_frame *frame = ptr_to_xdp(ptr);
>
> @@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>                 void *buf;
>                 int off;
>
> -               buf = virtqueue_get_buf(rq->vq, &buflen);
> +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
>                 if (unlikely(!buf))
>                         goto err_buf;
>
> @@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>                 return -EINVAL;
>
>         while (--*num_buf > 0) {
> -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
>                                  dev->name, *num_buf,
> @@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         while (--num_buf) {
>                 int num_skb_frags;
>
> -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
>                                  dev->name, num_buf,
> @@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_skb:
>         put_page(page);
>         while (num_buf-- > 1) {
> -               buf = virtqueue_get_buf(rq->vq, &len);
> +               buf = virtnet_rq_get_buf(rq, &len, NULL);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers missing\n",
>                                  dev->name, num_buf);
> @@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>         alloc_frag->offset += len;
>         sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
>                     vi->hdr_len + GOOD_PACKET_LEN);
> -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +       err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>         if (err < 0)
>                 put_page(virt_to_head_page(buf));
>         return err;
> @@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>
>         /* chain first in list head */
>         first->private = (unsigned long)list;
> -       err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> -                                 first, gfp);
> +       err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
> +                               first, NULL, gfp);
>         if (err < 0)
>                 give_pages(rq, first);
>
> @@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>
>         sg_init_one(rq->sg, buf, len);
>         ctx = mergeable_len_to_ctx(len + room, headroom);
> -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +       err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
>         if (err < 0)
>                 put_page(virt_to_head_page(buf));
>
> @@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>                 void *ctx;
>
>                 while (stats.packets < budget &&
> -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
>                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
>                         stats.packets++;
>                 }
>         } else {
>                 while (stats.packets < budget &&
> -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
>                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
>                         stats.packets++;
>                 }
> @@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
>                         return num_sg;
>                 num_sg++;
>         }
> -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +       return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
>         int i;
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               struct virtqueue *vq = vi->sq[i].vq;
> -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -                       virtnet_sq_free_unused_buf(vq, buf);
> +               struct send_queue *sq = &vi->sq[i];
> +
> +               while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
> +                       virtnet_sq_free_unused_buf(sq->vq, buf);
>         }
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               struct virtqueue *vq = vi->rq[i].vq;
> -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -                       virtnet_rq_free_unused_buf(vq, buf);
> +               struct receive_queue *rq = &vi->rq[i];
> +
> +               while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
> +                       virtnet_rq_free_unused_buf(rq->vq, buf);
>         }
>  }
>
> @@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>                 vi->rq[i].vq = vqs[rxq2vq(i)];
>                 vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
>                 vi->sq[i].vq = vqs[txq2vq(i)];
> +
> +               if (experiment_premapped) {
> +                       if (!virtqueue_set_premapped(vi->rq[i].vq))
> +                               vi->rq[i].premapped = true;
> +                       else
> +                               netdev_warn(vi->dev, "RXQ (%d) enable premapped failure.\n", i);
> +
> +                       if (!virtqueue_set_premapped(vi->sq[i].vq))
> +                               vi->sq[i].premapped = true;
> +                       else
> +                               netdev_warn(vi->dev, "TXQ (%d) enable premapped failure.\n", i);
> +               }
>         }
>
>         /* run here: ret == 0. */
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-06-27  8:03     ` Jason Wang
@ 2023-06-27  8:50       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-27  8:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 27 Jun 2023 16:03:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > This helper allows the driver to change the dma mode to premapped mode.
> > Under the premapped mode, the virtio core does not do the dma mapping
> > internally.
> >
> > This only works when use_dma_api is true. If use_dma_api is false,
> > the dma operations do not go through the DMA APIs, which is not the
> > standard way in the linux kernel.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
> >  include/linux/virtio.h       |  2 ++
> >  2 files changed, 42 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 72ed07a604d4..2afdfb9e3e30 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -172,6 +172,9 @@ struct vring_virtqueue {
> >         /* Host publishes avail event idx */
> >         bool event;
> >
> > +       /* Do DMA mapping by driver */
> > +       bool premapped;
> > +
> >         /* Head of free buffer list. */
> >         unsigned int free_head;
> >         /* Number we've added since last sync. */
> > @@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >         vq->packed_ring = true;
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> > +       vq->premapped = false;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >  #endif
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> > +       vq->premapped = false;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_resize);
> >
> > +/**
> > + * virtqueue_set_premapped - set the vring premapped mode
> > + * @_vq: the struct virtqueue we're talking about.
> > + *
> > + * Enable the premapped mode of the vq.
> > + *
> > + * The vring in premapped mode does not do dma internally, so the driver must
> > + * do dma mapping in advance. The driver must pass the dma_address through
> > + * dma_address of scatterlist. When the driver gets a used buffer from
> > + * the vring, it has to unmap the dma address. So the driver must call
> > + * virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
> > + *
> > + * This must be called before adding any buf to vring.
>
> And any old buffer should be detached?

I mean before adding any buf, so there are no old buffers.


>
> > + * So this should be called immediately after init vq or vq reset.
>
> Any way to detect and warn in this case? (not a must if it's too
> expensive to do the check)


I can try to check whether the queue is empty.
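
For example, a minimal sketch of such a check inside
virtqueue_set_premapped(), using the split ring fields (the exact
placement is my assumption, not posted code):

	/* Only allow switching modes while every descriptor is still
	 * free, i.e. nothing has been added to the vq yet. */
	if (vq->vq.num_free != vq->split.vring.num)
		return -EINVAL;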


>
> > + *
> > + * Caller must ensure we don't call this with other virtqueue operations
> > + * at the same time (except where noted).
> > + *
> > + * Returns zero or a negative error.
> > + * 0: success.
> > + * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > + */
> > +int virtqueue_set_premapped(struct virtqueue *_vq)
> > +{
> > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > +
> > +       if (!vq->use_dma_api)
> > +               return -EINVAL;
> > +
> > +       vq->premapped = true;
>
> I guess there should be a way to disable it. Would it be useful for
> the case when AF_XDP sockets were destroyed?

Yes.

When we reset the queue, vq->premapped will be set to 0.

This helper is called after find_vqs() or a vq reset.
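
To make the contract in the doc comment above concrete, here is a
sketch of adding one premapped tx buffer (a sketch only; error
unwinding is omitted):

	struct scatterlist sg;
	dma_addr_t addr;
	int err;

	/* The driver does the mapping itself... */
	addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, addr))
		return -ENOMEM;

	/* ...and passes the dma address through the sg. */
	sg_init_one(&sg, buf, len);
	sg_dma_address(&sg) = addr;

	err = virtqueue_add_outbuf(vq, &sg, 1, buf, GFP_ATOMIC);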

Thanks.



>
> Thanks
>
>
> > +
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
> > +
> >  /* Only available for split ring */
> >  struct virtqueue *vring_new_virtqueue(unsigned int index,
> >                                       unsigned int num,
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index b93238db94e3..1fc0e1023bd4 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >
> >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >
> > +int virtqueue_set_premapped(struct virtqueue *_vq);
> > +
> >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >
> >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
  2023-06-27  8:03     ` Jason Wang
@ 2023-06-27  9:01       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-27  9:01 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 27 Jun 2023 16:03:26 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > If the vq is in the premapped mode, use the sg_dma_address() directly.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
> >  1 file changed, 28 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 2afdfb9e3e30..18212c3e056b 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> >                         dma_addr_t addr;
> >
> > -                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > -                               goto unmap_release;
> > +                       if (vq->premapped) {
> > +                               addr = sg_dma_address(sg);
> > +                       } else {
> > +                               if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > +                                       goto unmap_release;
> > +                       }
>
> Btw, I wonder whether or not it would be simple to implement the
> vq->premapped check inside vring_map_one_sg(), assuming the
> !use_dma_api check is done there as well.


Yes.

That will be simpler for the caller.

But we will have things like:

int func(bool premapped)
{
	if (!premapped)
		return 0;

	/* ... the real mapping work ... */
	return 0;
}

I like this way, but you didn't like it in the last version.
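
For reference, a rough sketch of that suggestion, folding the check
into vring_map_one_sg() (a hypothetical reshuffle, not the posted
patch):

static int vring_map_one_sg(const struct vring_virtqueue *vq,
			    struct scatterlist *sg,
			    enum dma_data_direction direction,
			    dma_addr_t *addr)
{
	if (vq->premapped) {
		/* The driver already mapped it; just read the address. */
		*addr = sg_dma_address(sg);
		return 0;
	}

	if (!vq->use_dma_api) {
		*addr = (dma_addr_t)sg_phys(sg);
		return 0;
	}

	*addr = dma_map_page(vring_dma_dev(vq), sg_page(sg), sg->offset,
			     sg->length, direction);
	if (dma_mapping_error(vring_dma_dev(vq), *addr))
		return -ENOMEM;

	return 0;
}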

>
> >
> >                         prev = i;
> >                         /* Note that we trust indirect descriptor
> > @@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> >                         dma_addr_t addr;
> >
> > -                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > -                               goto unmap_release;
> > +                       if (vq->premapped) {
> > +                               addr = sg_dma_address(sg);
> > +                       } else {
> > +                               if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > +                                       goto unmap_release;
> > +                       }
> >
> >                         prev = i;
> >                         /* Note that we trust indirect descriptor
> > @@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >         return 0;
> >
> >  unmap_release:
> > -       err_idx = i;
> > +       if (!vq->premapped) {
>
> Can vq->premapped be true here? The label is named "unmap_release",
> which implies a "map" beforehand, which seems not to be the case for
> premapping.

I see.

Rethinking this, there is a better way.
I will fix it in the next version.


Thanks.


>
> Thanks
>
>
> > +               err_idx = i;
> >
> > -       if (indirect)
> > -               i = 0;
> > -       else
> > -               i = head;
> > -
> > -       for (n = 0; n < total_sg; n++) {
> > -               if (i == err_idx)
> > -                       break;
> > -               if (indirect) {
> > -                       vring_unmap_one_split_indirect(vq, &desc[i]);
> > -                       i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > -               } else
> > -                       i = vring_unmap_one_split(vq, i);
> > +               if (indirect)
> > +                       i = 0;
> > +               else
> > +                       i = head;
> > +
> > +               for (n = 0; n < total_sg; n++) {
> > +                       if (i == err_idx)
> > +                               break;
> > +                       if (indirect) {
> > +                               vring_unmap_one_split_indirect(vq, &desc[i]);
> > +                               i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > +                       } else
> > +                               i = vring_unmap_one_split(vq, i);
> > +               }
> >         }
> >
> >         if (indirect)
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 04/10] virtio_ring: packed: support add premapped buf
  2023-06-27  8:03     ` Jason Wang
@ 2023-06-27  9:05       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-27  9:05 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 27 Jun 2023 16:03:29 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > If the vq is in the premapped mode, use the sg_dma_address() directly.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 36 ++++++++++++++++++++++++++----------
> >  1 file changed, 26 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 18212c3e056b..dc109fbc05a5 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -1299,9 +1299,13 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >
> >         for (n = 0; n < out_sgs + in_sgs; n++) {
> >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > -                       if (vring_map_one_sg(vq, sg, n < out_sgs ?
> > -                                            DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> > -                               goto unmap_release;
> > +                       if (vq->premapped) {
> > +                               addr = sg_dma_address(sg);
> > +                       } else {
> > +                               if (vring_map_one_sg(vq, sg, n < out_sgs ?
> > +                                                    DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> > +                                       goto unmap_release;
> > +                       }
> >
> >                         desc[i].flags = cpu_to_le16(n < out_sgs ?
> >                                                 0 : VRING_DESC_F_WRITE);
> > @@ -1369,10 +1373,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >         return 0;
> >
> >  unmap_release:
> > -       err_idx = i;
> > +       if (!vq->premapped) {
> > +               err_idx = i;
> >
> > -       for (i = 0; i < err_idx; i++)
> > -               vring_unmap_desc_packed(vq, &desc[i]);
> > +               for (i = 0; i < err_idx; i++)
> > +                       vring_unmap_desc_packed(vq, &desc[i]);
> > +       }
> >
> >         kfree(desc);
> >
> > @@ -1447,9 +1453,13 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> >                         dma_addr_t addr;
> >
> > -                       if (vring_map_one_sg(vq, sg, n < out_sgs ?
> > -                                            DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> > -                               goto unmap_release;
> > +                       if (vq->premapped) {
> > +                               addr = sg_dma_address(sg);
> > +                       } else {
> > +                               if (vring_map_one_sg(vq, sg, n < out_sgs ?
> > +                                                    DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
> > +                                       goto unmap_release;
> > +                       }
> >
> >                         flags = cpu_to_le16(vq->packed.avail_used_flags |
> >                                     (++c == total_sg ? 0 : VRING_DESC_F_NEXT) |
> > @@ -1512,11 +1522,17 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >         return 0;
> >
> >  unmap_release:
> > +       vq->packed.avail_used_flags = avail_used_flags;
> > +
> > +       if (vq->premapped) {
>
> Similar to the split path, I think we can't hit vq->premapped here.


Yes, similar to the above reply, we can have a better way.

But we can hit vq->premapped when we fail to do the dma mapping for the
indirect desc array.

Thanks.
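
(The failure here is the mapping of the indirect descriptor table
itself, which the core still performs even in premapped mode; roughly,
as in virtio_ring.c:

	addr = vring_map_single(vq, desc,
				total_sg * sizeof(struct vring_packed_desc),
				DMA_TO_DEVICE);
	if (vring_mapping_error(vq, addr))
		goto unmap_release;	/* reachable with vq->premapped set */
)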




>
> Thanks
>
>
> > +               END_USE(vq);
> > +               return -EIO;
> > +       }
> > +
> >         err_idx = i;
> >         i = head;
> >         curr = vq->free_head;
> >
> > -       vq->packed.avail_used_flags = avail_used_flags;
> >
> >         for (n = 0; n < total_sg; n++) {
> >                 if (i == err_idx)
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver
  2023-06-27  8:03     ` Jason Wang
@ 2023-06-27  9:21       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-27  9:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 27 Jun 2023 16:03:31 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Under the premapped mode, the driver needs to unmap the DMA address
> > after receiving the buffer. The virtio core records the DMA address,
> > so the driver needs a way to get the dma info from the virtio core.
>
> A second thought: can we simply offload the tracking to the driver
> itself? This looks like the way many other modern NIC drivers do it.
>
> In premapped mode, the DMA address is in fact provided by the driver
> itself, so it should have sufficient knowledge. And in some cases the
> driver wants to optimize/merge/delay the unmapping, so the DMA
> addresses returned by the virtio core are not even of interest in
> those cases.

Will fix.

Thanks.
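
For illustration, driver-side tracking could look roughly like this
(a hypothetical per-buffer struct, not code from this series):

#include <linux/dma-mapping.h>
#include <linux/slab.h>

/* The driver remembers the mapping it created, keyed by the token it
 * passes as 'data' to virtqueue_add_*(), so nothing needs to be read
 * back from the virtio core on completion. */
struct tx_token {
	void *buf;		/* what was actually submitted */
	dma_addr_t dma;		/* mapping created by the driver */
	u32 len;
};

static void tx_complete(struct device *dev, struct tx_token *tok)
{
	/* The driver owns the mapping, so it can unmap (or batch or
	 * delay the unmap) without asking the virtio core. */
	dma_unmap_single(dev, tok->dma, tok->len, DMA_TO_DEVICE);
	kfree(tok);
}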


>
> Thanks
>
>
>
> >
> > A straightforward approach is to pass an array to the virtio core when
> > calling virtqueue_get_buf(). However, it is not feasible when there are
> > multiple DMA addresses in the descriptor chain, and the array size is
> > unknown.
> >
> > To solve this problem, a helper is introduced. After calling
> > virtqueue_get_buf(), the driver can call the helper to
> > retrieve one dma info entry. If the helper function returns -EAGAIN,
> > it means that there are more DMA addresses to be processed, and the
> > driver should call the helper function again. To keep track of the
> > current position in the chain, a cursor must be passed to the helper
> > function, which is initialized by virtqueue_get_buf().
> >
> > Some processing is done inside this helper, so it MUST be
> > called under the premapped mode.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 118 ++++++++++++++++++++++++++++++++---
> >  include/linux/virtio.h       |  11 ++++
> >  2 files changed, 119 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index dc109fbc05a5..cdc4349f6066 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -754,8 +754,95 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
> >         return needs_kick;
> >  }
> >
> > -static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > -                            void **ctx)
> > +static void detach_cursor_init_split(struct vring_virtqueue *vq,
> > +                                    struct virtqueue_detach_cursor *cursor, u16 head)
> > +{
> > +       struct vring_desc_extra *extra;
> > +
> > +       extra = &vq->split.desc_extra[head];
> > +
> > +       /* Clear data ptr. */
> > +       vq->split.desc_state[head].data = NULL;
> > +
> > +       cursor->head = head;
> > +       cursor->done = 0;
> > +
> > +       if (extra->flags & VRING_DESC_F_INDIRECT) {
> > +               cursor->num = extra->len / sizeof(struct vring_desc);
> > +               cursor->indirect = true;
> > +               cursor->pos = 0;
> > +
> > +               vring_unmap_one_split(vq, head);
> > +
> > +               extra->next = vq->free_head;
> > +
> > +               vq->free_head = head;
> > +
> > +               /* Plus final descriptor */
> > +               vq->vq.num_free++;
> > +
> > +       } else {
> > +               cursor->indirect = false;
> > +               cursor->pos = head;
> > +       }
> > +}
> > +
> > +static int virtqueue_detach_split(struct virtqueue *_vq, struct virtqueue_detach_cursor *cursor,
> > +                                 dma_addr_t *addr, u32 *len, enum dma_data_direction *dir)
> > +{
> > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > +       __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > +       int rc = -EAGAIN;
> > +
> > +       if (unlikely(cursor->done))
> > +               return -EINVAL;
> > +
> > +       if (!cursor->indirect) {
> > +               struct vring_desc_extra *extra;
> > +               unsigned int i;
> > +
> > +               i = cursor->pos;
> > +
> > +               extra = &vq->split.desc_extra[i];
> > +
> > +               if (vq->split.vring.desc[i].flags & nextflag) {
> > +                       cursor->pos = extra->next;
> > +               } else {
> > +                       extra->next = vq->free_head;
> > +                       vq->free_head = cursor->head;
> > +                       cursor->done = true;
> > +                       rc = 0;
> > +               }
> > +
> > +               *addr = extra->addr;
> > +               *len = extra->len;
> > +               *dir = (extra->flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> > +
> > +               vq->vq.num_free++;
> > +
> > +       } else {
> > +               struct vring_desc *indir_desc, *desc;
> > +               u16 flags;
> > +
> > +               indir_desc = vq->split.desc_state[cursor->head].indir_desc;
> > +               desc = &indir_desc[cursor->pos];
> > +
> > +               flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > +               *addr = virtio64_to_cpu(vq->vq.vdev, desc->addr);
> > +               *len = virtio32_to_cpu(vq->vq.vdev, desc->len);
> > +               *dir = (flags & VRING_DESC_F_WRITE) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
> > +
> > +               if (++cursor->pos == cursor->num) {
> > +                       kfree(indir_desc);
> > +                       cursor->done = true;
> > +                       return 0;
> > +               }
> > +       }
> > +
> > +       return rc;
> > +}
> > +
> > +static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head)
> >  {
> >         unsigned int i, j;
> >         __virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> > @@ -799,8 +886,6 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >
> >                 kfree(indir_desc);
> >                 vq->split.desc_state[head].indir_desc = NULL;
> > -       } else if (ctx) {
> > -               *ctx = vq->split.desc_state[head].indir_desc;
> >         }
> >  }
> >
> > @@ -812,7 +897,8 @@ static bool more_used_split(const struct vring_virtqueue *vq)
> >
> >  static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> >                                          unsigned int *len,
> > -                                        void **ctx)
> > +                                        void **ctx,
> > +                                        struct virtqueue_detach_cursor *cursor)
> >  {
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >         void *ret;
> > @@ -852,7 +938,15 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
> >
> >         /* detach_buf_split clears data, so grab it now. */
> >         ret = vq->split.desc_state[i].data;
> > -       detach_buf_split(vq, i, ctx);
> > +
> > +       if (!vq->indirect && ctx)
> > +               *ctx = vq->split.desc_state[i].indir_desc;
> > +
> > +       if (vq->premapped)
> > +               detach_cursor_init_split(vq, cursor, i);
> > +       else
> > +               detach_buf_split(vq, i);
> > +
> >         vq->last_used_idx++;
> >         /* If we expect an interrupt for the next entry, tell host
> >          * by writing event index and flush out the write before
> > @@ -961,7 +1055,8 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
> >         return true;
> >  }
> >
> > -static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> > +static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq,
> > +                                              struct virtqueue_detach_cursor *cursor)
> >  {
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >         unsigned int i;
> > @@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> >                         continue;
> >                 /* detach_buf_split clears data, so grab it now. */
> >                 buf = vq->split.desc_state[i].data;
> > -               detach_buf_split(vq, i, NULL);
> > +               if (vq->premapped)
> > +                       detach_cursor_init_split(vq, cursor, i);
> > +               else
> > +                       detach_buf_split(vq, i);
> >                 vq->split.avail_idx_shadow--;
> >                 vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> >                                 vq->split.avail_idx_shadow);
> > @@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >
> >         return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> > -                                virtqueue_get_buf_ctx_split(_vq, len, ctx);
> > +                                virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
> >
> > @@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >
> >         return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
> > -                                virtqueue_detach_unused_buf_split(_vq);
> > +                                virtqueue_detach_unused_buf_split(_vq, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> >
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 1fc0e1023bd4..eb4a4e4329aa 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -38,6 +38,17 @@ struct virtqueue {
> >         void *priv;
> >  };
> >
> > +struct virtqueue_detach_cursor {
> > +       unsigned indirect:1;
> > +       unsigned done:1;
> > +       unsigned hole:14;
> > +
> > +       /* for split head */
> > +       unsigned head:16;
> > +       unsigned num:16;
> > +       unsigned pos:16;
> > +};
> > +
> >  int virtqueue_add_outbuf(struct virtqueue *vq,
> >                          struct scatterlist sg[], unsigned int num,
> >                          void *data,
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

> > +                                              struct virtqueue_detach_cursor *cursor)
> >  {
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >         unsigned int i;
> > @@ -974,7 +1069,10 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
> >                         continue;
> >                 /* detach_buf_split clears data, so grab it now. */
> >                 buf = vq->split.desc_state[i].data;
> > -               detach_buf_split(vq, i, NULL);
> > +               if (vq->premapped)
> > +                       detach_cursor_init_split(vq, cursor, i);
> > +               else
> > +                       detach_buf_split(vq, i);
> >                 vq->split.avail_idx_shadow--;
> >                 vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
> >                                 vq->split.avail_idx_shadow);
> > @@ -2361,7 +2459,7 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >
> >         return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> > -                                virtqueue_get_buf_ctx_split(_vq, len, ctx);
> > +                                virtqueue_get_buf_ctx_split(_vq, len, ctx, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
> >
> > @@ -2493,7 +2591,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> >         struct vring_virtqueue *vq = to_vvq(_vq);
> >
> >         return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
> > -                                virtqueue_detach_unused_buf_split(_vq);
> > +                                virtqueue_detach_unused_buf_split(_vq, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
> >
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index 1fc0e1023bd4..eb4a4e4329aa 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -38,6 +38,17 @@ struct virtqueue {
> >         void *priv;
> >  };
> >
> > +struct virtqueue_detach_cursor {
> > +       unsigned indirect:1;
> > +       unsigned done:1;
> > +       unsigned hole:14;
> > +
> > +       /* for split head */
> > +       unsigned head:16;
> > +       unsigned num:16;
> > +       unsigned pos:16;
> > +};
> > +
> >  int virtqueue_add_outbuf(struct virtqueue *vq,
> >                          struct scatterlist sg[], unsigned int num,
> >                          void *data,
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 10/10] virtio_net: support dma premapped
  2023-06-27  8:03     ` Jason Wang
@ 2023-06-27  9:23       ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-27  9:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 27 Jun 2023 16:03:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Introduce the module param "experiment_premapped" to enable the mode in
> > which virtio-net does the dma mapping itself.
> >
> > If it is true, the vqs of virtio-net are put into premapped mode: the
> > virtio core just handles sgs that carry a dma_address, and the driver
> > must obtain the dma address of a buffer so it can unmap it after getting
> > the buffer back from the virtio core.
> >
> > That will be useful when AF_XDP is enabled: AF_XDP tx and the kernel
> > packet xmit will share the tx queue, so the skb xmit path must also
> > support premapped mode.
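
(Usage note, not part of the patch: given the 0644 permission below, the mode
can be enabled with "modprobe virtio_net experiment_premapped=1" or toggled
via /sys/module/virtio_net/parameters/experiment_premapped; the flag is
consulted when the queues are created in virtnet_find_vqs().)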
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 163 +++++++++++++++++++++++++++++++++------
> >  1 file changed, 141 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 2396c28c0122..5898212fcb3c 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -26,10 +26,11 @@
> >  static int napi_weight = NAPI_POLL_WEIGHT;
> >  module_param(napi_weight, int, 0444);
> >
> > -static bool csum = true, gso = true, napi_tx = true;
> > +static bool csum = true, gso = true, napi_tx = true, experiment_premapped;
> >  module_param(csum, bool, 0444);
> >  module_param(gso, bool, 0444);
> >  module_param(napi_tx, bool, 0644);
> > +module_param(experiment_premapped, bool, 0644);
>
> Having a module parameter is sub-optimal. I think we can demonstrate
> real benefit:
>
> In the case of a mergeable rx buffer, if the mapping is done by the
> virtio core, it needs to be done per buffer (< PAGE_SIZE).
>
> But if it is done by virtio-net, we have a chance to map the buffer
> per page, which can save a lot of mapping and unmapping. A lot of
> other optimizations could be done on top as well.
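
A rough sketch of that per-page scheme, with entirely hypothetical names
(nothing below is from this series): the page is mapped once, buffers are
carved out of it, and the unmap happens only when the last buffer comes back.

        /* hypothetical per-page mapping state */
        struct rq_page {
                struct page *page;
                dma_addr_t dma;         /* one mapping for the whole page */
                unsigned int off;       /* next free offset               */
                unsigned int refs;      /* buffers still outstanding      */
        };

        /* carve one rx buffer out of an already-mapped page */
        static dma_addr_t rq_get_buf(struct rq_page *p, unsigned int len)
        {
                dma_addr_t addr;

                if (p->off + len > PAGE_SIZE)
                        return DMA_MAPPING_ERROR; /* caller maps a fresh page */

                addr = p->dma + p->off;         /* no new dma_map_page() here */
                p->off += len;
                p->refs++;
                return addr;
        }

        /* completion: only sync; unmap once the last buffer is back */
        static void rq_put_buf(struct device *dev, struct rq_page *p,
                               dma_addr_t addr, unsigned int len)
        {
                dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);
                if (--p->refs == 0)
                        dma_unmap_page(dev, p->dma, PAGE_SIZE, DMA_FROM_DEVICE);
        }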


Good point.

Thanks


>
> If we manage to prove this, we don't need any experimental module
> parameters at all.
>
> Thanks
>
>
> >
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > @@ -142,6 +143,9 @@ struct send_queue {
> >
> >         /* Record whether sq is in reset state. */
> >         bool reset;
> > +
> > +       /* The vq is premapped mode. */
> > +       bool premapped;
> >  };
> >
> >  /* Internal representation of a receive virtqueue */
> > @@ -174,6 +178,9 @@ struct receive_queue {
> >         char name[16];
> >
> >         struct xdp_rxq_info xdp_rxq;
> > +
> > +       /* The vq is premapped mode. */
> > +       bool premapped;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -546,6 +553,105 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >         return skb;
> >  }
> >
> > +static int virtnet_generic_unmap(struct virtqueue *vq, struct virtqueue_detach_cursor *cursor)
> > +{
> > +       enum dma_data_direction dir;
> > +       dma_addr_t addr;
> > +       u32 len;
> > +       int err;
> > +
> > +       do {
> > +               err = virtqueue_detach(vq, cursor, &addr, &len, &dir);
> > +               if (!err || err == -EAGAIN)
> > +                       dma_unmap_page_attrs(virtqueue_dma_dev(vq), addr, len, dir, 0);
> > +
> > +       } while (err == -EAGAIN);
> > +
> > +       return err;
> > +}
> > +
> > +static void *virtnet_detach_unused_buf(struct virtqueue *vq, bool premapped)
> > +{
> > +       struct virtqueue_detach_cursor cursor;
> > +       void *buf;
> > +
> > +       if (!premapped)
> > +               return virtqueue_detach_unused_buf(vq);
> > +
> > +       buf = virtqueue_detach_unused_buf_premapped(vq, &cursor);
> > +       if (buf)
> > +               virtnet_generic_unmap(vq, &cursor);
> > +
> > +       return buf;
> > +}
> > +
> > +static void *virtnet_get_buf_ctx(struct virtqueue *vq, bool premapped, u32 *len, void **ctx)
> > +{
> > +       struct virtqueue_detach_cursor cursor;
> > +       void *buf;
> > +
> > +       if (!premapped)
> > +               return virtqueue_get_buf_ctx(vq, len, ctx);
> > +
> > +       buf = virtqueue_get_buf_premapped(vq, len, ctx, &cursor);
> > +       if (buf)
> > +               virtnet_generic_unmap(vq, &cursor);
> > +
> > +       return buf;
> > +}
> > +
> > +#define virtnet_rq_get_buf(rq, plen, pctx) \
> > +({ \
> > +       typeof(rq) _rq = (rq); \
> > +       virtnet_get_buf_ctx(_rq->vq, _rq->premapped, plen, pctx); \
> > +})
> > +
> > +#define virtnet_sq_get_buf(sq, plen, pctx) \
> > +({ \
> > +       typeof(sq) _sq = (sq); \
> > +       virtnet_get_buf_ctx(_sq->vq, _sq->premapped, plen, pctx); \
> > +})
> > +
> > +static int virtnet_add_sg(struct virtqueue *vq, bool premapped,
> > +                         struct scatterlist *sg, unsigned int num, bool out,
> > +                         void *data, void *ctx, gfp_t gfp)
> > +{
> > +       enum dma_data_direction dir;
> > +       struct device *dev;
> > +       int err, ret;
> > +
> > +       if (!premapped)
> > +               return virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> > +
> > +       dir = out ? DMA_TO_DEVICE : DMA_FROM_DEVICE;
> > +       dev = virtqueue_dma_dev(vq);
> > +
> > +       ret = dma_map_sg_attrs(dev, sg, num, dir, 0);
> > +       if (ret != num)
> > +               goto err;
> > +
> > +       err = virtqueue_add_sg(vq, sg, num, out, data, ctx, gfp);
> > +       if (err < 0)
> > +               goto err;
> > +
> > +       return 0;
> > +
> > +err:
> > +       dma_unmap_sg_attrs(dev, sg, num, dir, 0);
> > +       return -ENOMEM;
> > +}
> > +
> > +static int virtnet_add_outbuf(struct send_queue *sq, unsigned int num, void *data)
> > +{
> > +       return virtnet_add_sg(sq->vq, sq->premapped, sq->sg, num, true, data, NULL, GFP_ATOMIC);
> > +}
> > +
> > +static int virtnet_add_inbuf(struct receive_queue *rq, unsigned int num, void *data,
> > +                            void *ctx, gfp_t gfp)
> > +{
> > +       return virtnet_add_sg(rq->vq, rq->premapped, rq->sg, num, false, data, ctx, gfp);
> > +}
> > +
> >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  {
> >         unsigned int len;
> > @@ -553,7 +659,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >         unsigned int bytes = 0;
> >         void *ptr;
> >
> > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +       while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
> >                 if (likely(!is_xdp_frame(ptr))) {
> >                         struct sk_buff *skb = ptr;
> >
> > @@ -667,8 +773,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> >                             skb_frag_size(frag), skb_frag_off(frag));
> >         }
> >
> > -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> > +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> >         if (unlikely(err))
> >                 return -ENOSPC; /* Caller handle free/refcnt */
> >
> > @@ -744,7 +849,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
> >         }
> >
> >         /* Free up any pending old buffers before queueing new ones. */
> > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +       while ((ptr = virtnet_sq_get_buf(sq, &len, NULL)) != NULL) {
> >                 if (likely(is_xdp_frame(ptr))) {
> >                         struct xdp_frame *frame = ptr_to_xdp(ptr);
> >
> > @@ -828,7 +933,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> >                 void *buf;
> >                 int off;
> >
> > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> >                 if (unlikely(!buf))
> >                         goto err_buf;
> >
> > @@ -1119,7 +1224,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> >                 return -EINVAL;
> >
> >         while (--*num_buf > 0) {
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, *num_buf,
> > @@ -1344,7 +1449,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >         while (--num_buf) {
> >                 int num_skb_frags;
> >
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, num_buf,
> > @@ -1407,7 +1512,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  err_skb:
> >         put_page(page);
> >         while (num_buf-- > 1) {
> > -               buf = virtqueue_get_buf(rq->vq, &len);
> > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers missing\n",
> >                                  dev->name, num_buf);
> > @@ -1534,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         alloc_frag->offset += len;
> >         sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> >                     vi->hdr_len + GOOD_PACKET_LEN);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +       err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
> >         if (err < 0)
> >                 put_page(virt_to_head_page(buf));
> >         return err;
> > @@ -1581,8 +1686,8 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >
> >         /* chain first in list head */
> >         first->private = (unsigned long)list;
> > -       err = virtqueue_add_inbuf(rq->vq, rq->sg, vi->big_packets_num_skbfrags + 2,
> > -                                 first, gfp);
> > +       err = virtnet_add_inbuf(rq, vi->big_packets_num_skbfrags + 2,
> > +                               first, NULL, gfp);
> >         if (err < 0)
> >                 give_pages(rq, first);
> >
> > @@ -1645,7 +1750,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >
> >         sg_init_one(rq->sg, buf, len);
> >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +       err = virtnet_add_inbuf(rq, 1, buf, ctx, gfp);
> >         if (err < 0)
> >                 put_page(virt_to_head_page(buf));
> >
> > @@ -1768,13 +1873,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >                 void *ctx;
> >
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> >         } else {
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> > @@ -1984,7 +2089,7 @@ static int xmit_skb(struct send_queue *sq, struct sk_buff *skb)
> >                         return num_sg;
> >                 num_sg++;
> >         }
> > -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > +       return virtnet_add_outbuf(sq, num_sg, skb);
> >  }
> >
> >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > @@ -3552,15 +3657,17 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >         int i;
> >
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               struct virtqueue *vq = vi->sq[i].vq;
> > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -                       virtnet_sq_free_unused_buf(vq, buf);
> > +               struct send_queue *sq = &vi->sq[i];
> > +
> > +               while ((buf = virtnet_detach_unused_buf(sq->vq, sq->premapped)) != NULL)
> > +                       virtnet_sq_free_unused_buf(sq->vq, buf);
> >         }
> >
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               struct virtqueue *vq = vi->rq[i].vq;
> > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -                       virtnet_rq_free_unused_buf(vq, buf);
> > +               struct receive_queue *rq = &vi->rq[i];
> > +
> > +               while ((buf = virtnet_detach_unused_buf(rq->vq, rq->premapped)) != NULL)
> > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> >         }
> >  }
> >
> > @@ -3658,6 +3765,18 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >                 vi->rq[i].vq = vqs[rxq2vq(i)];
> >                 vi->rq[i].min_buf_len = mergeable_min_buf_len(vi, vi->rq[i].vq);
> >                 vi->sq[i].vq = vqs[txq2vq(i)];
> > +
> > +               if (experiment_premapped) {
> > +                       if (!virtqueue_set_premapped(vi->rq[i].vq))
> > +                               vi->rq[i].premapped = true;
> > +                       else
> > +                               netdev_warn(vi->dev, "RXQ (%d): failed to enable premapped mode.\n", i);
> > +
> > +                       if (!virtqueue_set_premapped(vi->sq[i].vq))
> > +                               vi->sq[i].premapped = true;
> > +                       else
> > +                               netdev_warn(vi->dev, "TXQ (%d): failed to enable premapped mode.\n", i);
> > +               }
> >         }
> >
> >         /* run here: ret == 0. */
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread


* Re: [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-06-27  8:50       ` Xuan Zhuo
@ 2023-06-27 14:56         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 91+ messages in thread
From: Michael S. Tsirkin @ 2023-06-27 14:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Tue, Jun 27, 2023 at 04:50:01PM +0800, Xuan Zhuo wrote:
> On Tue, 27 Jun 2023 16:03:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > This helper allows the driver to change the dma mode to premapped mode.
> > > Under premapped mode, the virtio core does not do dma mapping
> > > internally.
> > >
> > > This only works when use_dma_api is true. If use_dma_api is false,
> > > the dma operations do not go through the DMA APIs, which is not the
> > > standard way in the linux kernel.
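
Concretely, the add path under that contract would look roughly like the
sketch below (the dma_address convention follows the doc comment added by
this patch; error paths shortened):

        struct scatterlist sg;
        dma_addr_t addr;
        int err;

        addr = dma_map_single(virtqueue_dma_dev(vq), buf, size, DMA_TO_DEVICE);
        if (dma_mapping_error(virtqueue_dma_dev(vq), addr))
                return -ENOMEM;

        sg_init_table(&sg, 1);
        sg.dma_address = addr;  /* premapped: the core takes this as-is */
        sg.length = size;

        err = virtqueue_add_outbuf(vq, &sg, 1, buf, GFP_ATOMIC);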
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
> > >  include/linux/virtio.h       |  2 ++
> > >  2 files changed, 42 insertions(+)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 72ed07a604d4..2afdfb9e3e30 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -172,6 +172,9 @@ struct vring_virtqueue {
> > >         /* Host publishes avail event idx */
> > >         bool event;
> > >
> > > +       /* Do DMA mapping by driver */
> > > +       bool premapped;
> > > +
> > >         /* Head of free buffer list. */
> > >         unsigned int free_head;
> > >         /* Number we've added since last sync. */
> > > @@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > >         vq->packed_ring = true;
> > >         vq->dma_dev = dma_dev;
> > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > +       vq->premapped = false;
> > >
> > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > >                 !context;
> > > @@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > >  #endif
> > >         vq->dma_dev = dma_dev;
> > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > +       vq->premapped = false;
> > >
> > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > >                 !context;
> > > @@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtqueue_resize);
> > >
> > > +/**
> > > + * virtqueue_set_premapped - set the vring premapped mode
> > > + * @_vq: the struct virtqueue we're talking about.
> > > + *
> > > + * Enable the premapped mode of the vq.
> > > + *
> > > + * The vring in premapped mode does not do dma internally, so the driver
> > > + * must do the dma mapping in advance. The driver must pass the dma address
> > > + * via the dma_address field of the scatterlist. When the driver gets a used
> > > + * buffer from the vring, it has to unmap that dma address. So the driver must
> > > + * call virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
> > > + *
> > > + * This must be called before adding any buf to vring.
> >
> > And any old buffer should be detached?
> 
> I mean that this is before adding any buf, so there are no old buffers.
> 

Oh. So put this in the same sentence:

	This function must be called immediately after creating the vq,
	or after vq reset, and before adding any buffers to it.


> >
> > > + * So this should be called immediately after init vq or vq reset.

Do you really need to call this again after each reset?


> > Any way to detect and warn in this case? (not a must if it's too
> > expensive to do the check)
> 
> 
> I can try to check whether the queue is empty.
> 
> 
> >
> > > + *
> > > + * Caller must ensure we don't call this with other virtqueue operations
> > > + * at the same time (except where noted).
> > > + *
> > > + * Returns zero or a negative error.
> > > + * 0: success.
> > > + * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > > + */
> > > +int virtqueue_set_premapped(struct virtqueue *_vq)
> > > +{
> > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > +
> > > +       if (!vq->use_dma_api)
> > > +               return -EINVAL;
> > > +
> > > +       vq->premapped = true;
> >
> > I guess there should be a way to disable it. Would it be useful for
> > the case when AF_XDP sockets were destroyed?
> 
> Yes.
> 
> When we reset the queue, vq->premapped will be set to 0.
> 
> This is called after find_vqs or a vq reset.
> 
> Thanks.
> 
> 
> 
> >
> > Thanks
> >
> >
> > > +
> > > +       return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
> > > +
> > >  /* Only available for split ring */
> > >  struct virtqueue *vring_new_virtqueue(unsigned int index,
> > >                                       unsigned int num,
> > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > index b93238db94e3..1fc0e1023bd4 100644
> > > --- a/include/linux/virtio.h
> > > +++ b/include/linux/virtio.h
> > > @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> > >
> > >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> > >
> > > +int virtqueue_set_premapped(struct virtqueue *_vq);
> > > +
> > >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> > >
> > >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >


^ permalink raw reply	[flat|nested] 91+ messages in thread


* Re: [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-06-27 14:56         ` Michael S. Tsirkin
@ 2023-06-28  1:34           ` Xuan Zhuo
  -1 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-28  1:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Tue, 27 Jun 2023 10:56:54 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jun 27, 2023 at 04:50:01PM +0800, Xuan Zhuo wrote:
> > On Tue, 27 Jun 2023 16:03:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > This helper allows the driver to change the dma mode to premapped mode.
> > > > Under premapped mode, the virtio core does not do dma mapping
> > > > internally.
> > > >
> > > > This only works when use_dma_api is true. If use_dma_api is false,
> > > > the dma operations do not go through the DMA APIs, which is not the
> > > > standard way in the linux kernel.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
> > > >  include/linux/virtio.h       |  2 ++
> > > >  2 files changed, 42 insertions(+)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 72ed07a604d4..2afdfb9e3e30 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -172,6 +172,9 @@ struct vring_virtqueue {
> > > >         /* Host publishes avail event idx */
> > > >         bool event;
> > > >
> > > > +       /* Do DMA mapping by driver */
> > > > +       bool premapped;
> > > > +
> > > >         /* Head of free buffer list. */
> > > >         unsigned int free_head;
> > > >         /* Number we've added since last sync. */
> > > > @@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >         vq->packed_ring = true;
> > > >         vq->dma_dev = dma_dev;
> > > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > > +       vq->premapped = false;
> > > >
> > > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > > >                 !context;
> > > > @@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > >  #endif
> > > >         vq->dma_dev = dma_dev;
> > > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > > +       vq->premapped = false;
> > > >
> > > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > > >                 !context;
> > > > @@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > >
> > > > +/**
> > > > + * virtqueue_set_premapped - set the vring premapped mode
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + *
> > > > + * Enable the premapped mode of the vq.
> > > > + *
> > > > + * The vring in premapped mode does not do dma internally, so the driver
> > > > + * must do the dma mapping in advance. The driver must pass the dma address
> > > > + * via the dma_address field of the scatterlist. When the driver gets a used
> > > > + * buffer from the vring, it has to unmap that dma address. So the driver must
> > > > + * call virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
> > > > + *
> > > > + * This must be called before adding any buf to vring.
> > >
> > > And any old buffer should be detached?
> >
> > I mean that this is before adding any buf, so there are no old buffers.
> >
>
> Oh. So put this in the same sentence:
>
> 	This function must be called immediately after creating the vq,
> 	or after vq reset, and before adding any buffers to it.


OK, thanks.

>
>
> > >
> > > > + * So this should be called immediately after init vq or vq reset.
>
> Do you really need to call this again after each reset?

YES


Thanks.
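
So on the driver side the reset path would roughly be the following sketch
(virtqueue_resize() with a recycle callback stands in for the per-queue
reset; error handling shortened):

        /* the per-queue reset clears vq->premapped ... */
        err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_buf);
        if (err)
                return err;

        /* ... so it must be re-enabled while the ring is still empty */
        if (!virtqueue_set_premapped(rq->vq))
                rq->premapped = true;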


>
>
> > > Any way to detect and warn in this case? (not a must if it's too
> > > expensive to do the check)
> >
> >
> > I can try to check whether the queue is empty.
> >
> >
> > >
> > > > + *
> > > > + * Caller must ensure we don't call this with other virtqueue operations
> > > > + * at the same time (except where noted).
> > > > + *
> > > > + * Returns zero or a negative error.
> > > > + * 0: success.
> > > > + * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > > > + */
> > > > +int virtqueue_set_premapped(struct virtqueue *_vq)
> > > > +{
> > > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +
> > > > +       if (!vq->use_dma_api)
> > > > +               return -EINVAL;
> > > > +
> > > > +       vq->premapped = true;
> > >
> > > I guess there should be a way to disable it. Would it be useful for
> > > the case when AF_XDP sockets were destroyed?
> >
> > Yes.
> >
> > When we reset the queue, vq->premapped will be set to 0.
> >
> > This is called after find_vqs or a vq reset.
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > >
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
> > > > +
> > > >  /* Only available for split ring */
> > > >  struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > >                                       unsigned int num,
> > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > index b93238db94e3..1fc0e1023bd4 100644
> > > > --- a/include/linux/virtio.h
> > > > +++ b/include/linux/virtio.h
> > > > @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> > > >
> > > >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> > > >
> > > > +int virtqueue_set_premapped(struct virtqueue *_vq);
> > > > +
> > > >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> > > >
> > > >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped()
@ 2023-06-28  1:34           ` Xuan Zhuo
  0 siblings, 0 replies; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-28  1:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Tue, 27 Jun 2023 10:56:54 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jun 27, 2023 at 04:50:01PM +0800, Xuan Zhuo wrote:
> > On Tue, 27 Jun 2023 16:03:23 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > This helper allows the driver change the dma mode to premapped mode.
> > > > Under the premapped mode, the virtio core do not do dma mapping
> > > > internally.
> > > >
> > > > This just work when the use_dma_api is true. If the use_dma_api is false,
> > > > the dma options is not through the DMA APIs, that is not the standard
> > > > way of the linux kernel.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 40 ++++++++++++++++++++++++++++++++++++
> > > >  include/linux/virtio.h       |  2 ++
> > > >  2 files changed, 42 insertions(+)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 72ed07a604d4..2afdfb9e3e30 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -172,6 +172,9 @@ struct vring_virtqueue {
> > > >         /* Host publishes avail event idx */
> > > >         bool event;
> > > >
> > > > +       /* Do DMA mapping by driver */
> > > > +       bool premapped;
> > > > +
> > > >         /* Head of free buffer list. */
> > > >         unsigned int free_head;
> > > >         /* Number we've added since last sync. */
> > > > @@ -2059,6 +2062,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >         vq->packed_ring = true;
> > > >         vq->dma_dev = dma_dev;
> > > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > > +       vq->premapped = false;
> > > >
> > > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > > >                 !context;
> > > > @@ -2548,6 +2552,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > >  #endif
> > > >         vq->dma_dev = dma_dev;
> > > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > > +       vq->premapped = false;
> > > >
> > > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > > >                 !context;
> > > > @@ -2691,6 +2696,41 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_resize);
> > > >
> > > > +/**
> > > > + * virtqueue_set_premapped - set the vring premapped mode
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + *
> > > > + * Enable the premapped mode of the vq.
> > > > + *
> > > > + * A vring in premapped mode does not do dma internally, so the driver must
> > > > + * do the dma mapping in advance. The driver must pass the dma address
> > > > + * through the dma_address field of the scatterlist. When the driver gets a
> > > > + * used buffer back from the vring, it has to unmap the dma address itself,
> > > > + * so it must call virtqueue_get_buf_premapped()/virtqueue_detach_unused_buf_premapped().
> > > > + *
> > > > + * This must be called before adding any buf to vring.
> > >
> > > And any old buffer should be detached?
> >
> > I mean before adding any buf, so there are no old buffers.
> >
>
> Oh. So put this in the same sentence:
>
> 	This function must be called immediately after creating the vq,
> 	or after vq reset, and before adding any buffers to it.


OK, thanks.

>
>
> > >
> > > > + * So this should be called immediately after vq init or vq reset.
>
> Do you really need to call this again after each reset?

YES


Thanks.


>
>
> > > Any way to detect and warn in this case? (not a must if it's too
> > > expensive to do the check)
> >
> >
> > I can try to check whether the queue is empty.
> >
> >
> > >
> > > > + *
> > > > + * Caller must ensure we don't call this with other virtqueue operations
> > > > + * at the same time (except where noted).
> > > > + *
> > > > + * Returns zero or a negative error.
> > > > + * 0: success.
> > > > + * -EINVAL: vring does not use the dma api, so we cannot enable premapped mode.
> > > > + */
> > > > +int virtqueue_set_premapped(struct virtqueue *_vq)
> > > > +{
> > > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +
> > > > +       if (!vq->use_dma_api)
> > > > +               return -EINVAL;
> > > > +
> > > > +       vq->premapped = true;
> > >
> > > I guess there should be a way to disable it. Would it be useful for
> > > the case when AF_XDP sockets were destroyed?
> >
> > Yes.
> >
> > When we reset the queue, vq->premapped will be set to 0.
> >
> > This is called after find_vqs or a vq reset.
> >
> > Thanks.
> >
> >
> >
> > >
> > > Thanks
> > >
> > >
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
> > > > +
> > > >  /* Only available for split ring */
> > > >  struct virtqueue *vring_new_virtqueue(unsigned int index,
> > > >                                       unsigned int num,
> > > > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > > > index b93238db94e3..1fc0e1023bd4 100644
> > > > --- a/include/linux/virtio.h
> > > > +++ b/include/linux/virtio.h
> > > > @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> > > >
> > > >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> > > >
> > > > +int virtqueue_set_premapped(struct virtqueue *_vq);
> > > > +
> > > >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> > > >
> > > >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
>
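
To make the contract above concrete, a minimal sketch of the driver side
follows. This is illustrative only: buf/len and the error handling are
hypothetical, virtqueue_dma_dev() is the helper added in patch 08 of this
series, and the used-buffer side relies on the premapped getters named in
the doc comment.

	struct device *dma_dev = virtqueue_dma_dev(vq);
	struct scatterlist sg;
	dma_addr_t addr;
	int err;

	/* right after vq creation (or vq reset), before any buf is added */
	err = virtqueue_set_premapped(vq);
	if (err)
		return err;			/* -EINVAL: no DMA API */

	/* the driver, not the core, maps the buffer */
	addr = dma_map_single(dma_dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dma_dev, addr))
		return -ENOMEM;

	/* the core reads sg_dma_address() instead of mapping the page */
	sg_init_table(&sg, 1);
	sg.length = len;
	sg_dma_address(&sg) = addr;

	err = virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);

	/* on completion, recover the dma info via the premapped getters
	 * (e.g. virtqueue_get_buf_premapped()) and dma_unmap_single().
	 */

And for the detect-and-warn idea discussed above, one hypothetical shape of
the check inside virtqueue_set_premapped() (where vq is the struct
vring_virtqueue): num_free equals the ring size only while no buffers are
outstanding.

	u32 num = vq->packed_ring ? vq->packed.vring.num :
				    vq->split.vring.num;

	if (vq->vq.num_free != num)
		return -EBUSY;	/* hypothetical: refuse unless idle */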


* Re: [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
  2023-06-27  9:01       ` Xuan Zhuo
  (?)
@ 2023-06-28  4:07       ` Jason Wang
  2023-06-28  6:00         ` Xuan Zhuo
  -1 siblings, 1 reply; 91+ messages in thread
From: Jason Wang @ 2023-06-28  4:07 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, Jun 27, 2023 at 5:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 27 Jun 2023 16:03:26 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > If the vq is in premapped mode, use sg_dma_address() directly.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
> > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 2afdfb9e3e30..18212c3e056b 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > >                         dma_addr_t addr;
> > >
> > > -                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > -                               goto unmap_release;
> > > +                       if (vq->premapped) {
> > > +                               addr = sg_dma_address(sg);
> > > +                       } else {
> > > +                               if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > +                                       goto unmap_release;
> > > +                       }
> >
> > Btw, I wonder whether or not it would be simple to implement the
> > vq->premapped check inside vring_map_one_sg(), assuming the
> > !use_dma_api check is done there as well.
>
>
> YES,
>
> That will be simpler for the caller.
>
> But we will have things like:
>
> void func(bool premapped)
> {
> 	if (!premapped)
> 		return;
> }
>
> I like this way, but you didn't like it in the last version.

I see :)

So I think it depends on the error handling path; we should choose a
way that lets us easily deal with errors.

For example, it seems the current approach is better since it doesn't
need to change unmap_release.

Thanks

>
> >
> > >
> > >                         prev = i;
> > >                         /* Note that we trust indirect descriptor
> > > @@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > >                         dma_addr_t addr;
> > >
> > > -                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > -                               goto unmap_release;
> > > +                       if (vq->premapped) {
> > > +                               addr = sg_dma_address(sg);
> > > +                       } else {
> > > +                               if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > +                                       goto unmap_release;
> > > +                       }
> > >
> > >                         prev = i;
> > >                         /* Note that we trust indirect descriptor
> > > @@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >         return 0;
> > >
> > >  unmap_release:
> > > -       err_idx = i;
> > > +       if (!vq->premapped) {
> >
> > Can vq->premapped be true here? The label is named "unmap_release",
> > which implies a "map" beforehand, which seems not to be the case for
> > premapping.
>
> I see.
>
> Rethinking about this, there is a better way.
> I will fix it in the next version.
>
>
> Thanks.
>
>
> >
> > Thanks
> >
> >
> > > +               err_idx = i;
> > >
> > > -       if (indirect)
> > > -               i = 0;
> > > -       else
> > > -               i = head;
> > > -
> > > -       for (n = 0; n < total_sg; n++) {
> > > -               if (i == err_idx)
> > > -                       break;
> > > -               if (indirect) {
> > > -                       vring_unmap_one_split_indirect(vq, &desc[i]);
> > > -                       i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > -               } else
> > > -                       i = vring_unmap_one_split(vq, i);
> > > +               if (indirect)
> > > +                       i = 0;
> > > +               else
> > > +                       i = head;
> > > +
> > > +               for (n = 0; n < total_sg; n++) {
> > > +                       if (i == err_idx)
> > > +                               break;
> > > +                       if (indirect) {
> > > +                               vring_unmap_one_split_indirect(vq, &desc[i]);
> > > +                               i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > +                       } else
> > > +                               i = vring_unmap_one_split(vq, i);
> > > +               }
> > >         }
> > >
> > >         if (indirect)
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>
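
For reference, the refactor floated above might look roughly like this. A
sketch only, written against the v10 shape of vring_map_one_sg() (which,
after patch 01, returns an error and hands the address back through a
pointer); it is not code from the series.

	static int vring_map_one_sg(const struct vring_virtqueue *vq,
				    struct scatterlist *sg,
				    enum dma_data_direction direction,
				    dma_addr_t *addr)
	{
		if (vq->premapped) {
			/* driver mapped it already: just read it back */
			*addr = sg_dma_address(sg);
			return 0;
		}

		if (!vq->use_dma_api) {
			/* no DMA API: the physical address is used as-is */
			*addr = (dma_addr_t)sg_phys(sg);
			return 0;
		}

		*addr = dma_map_page(vring_dma_dev(vq), sg_page(sg),
				     sg->offset, sg->length, direction);
		if (dma_mapping_error(vring_dma_dev(vq), *addr))
			return -ENOMEM;

		return 0;
	}

virtqueue_add_split()/_packed() would then keep a single call per sg, at
the cost of the early-return pattern debated above.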



* Re: [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
  2023-06-28  4:07       ` Jason Wang
@ 2023-06-28  6:00         ` Xuan Zhuo
  2023-06-28  6:51           ` Jason Wang
  0 siblings, 1 reply; 91+ messages in thread
From: Xuan Zhuo @ 2023-06-28  6:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Wed, 28 Jun 2023 12:07:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jun 27, 2023 at 5:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 27 Jun 2023 16:03:26 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > If the vq is in premapped mode, use sg_dma_address() directly.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
> > > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 2afdfb9e3e30..18212c3e056b 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > >                         dma_addr_t addr;
> > > >
> > > > -                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > -                               goto unmap_release;
> > > > +                       if (vq->premapped) {
> > > > +                               addr = sg_dma_address(sg);
> > > > +                       } else {
> > > > +                               if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > +                                       goto unmap_release;
> > > > +                       }
> > >
> > > Btw, I wonder whether or not it would be simple to implement the
> > > vq->premapped check inside vring_map_one_sg(), assuming the
> > > !use_dma_api check is done there as well.
> >
> >
> > YES,
> >
> > That will be simpler for the caller.
> >
> > But we will have things like:
> >
> > void func(bool premapped)
> > {
> > 	if (!premapped)
> > 		return;
> > }
> >
> > I like this way, but you didn't like it in the last version.
>
> I see :)
>
> So I think it depends on the error handling path; we should choose a
> way that lets us easily deal with errors.
>
> For example, it seems the current approach is better since it doesn't
> need to change unmap_release.

NO,

The unmap_release path is the same for both ways.

Thanks.


>
> Thanks
>
> >
> > >
> > > >
> > > >                         prev = i;
> > > >                         /* Note that we trust indirect descriptor
> > > > @@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > >                         dma_addr_t addr;
> > > >
> > > > -                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > -                               goto unmap_release;
> > > > +                       if (vq->premapped) {
> > > > +                               addr = sg_dma_address(sg);
> > > > +                       } else {
> > > > +                               if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > +                                       goto unmap_release;
> > > > +                       }
> > > >
> > > >                         prev = i;
> > > >                         /* Note that we trust indirect descriptor
> > > > @@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >         return 0;
> > > >
> > > >  unmap_release:
> > > > -       err_idx = i;
> > > > +       if (!vq->premapped) {
> > >
> > > Can vq->premapped be true here? The label is named "unmap_release",
> > > which implies a "map" beforehand, which seems not to be the case for
> > > premapping.
> >
> > I see.
> >
> > Rethinking about this, there is a better way.
> > I will fix it in the next version.
> >
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > >
> > > > +               err_idx = i;
> > > >
> > > > -       if (indirect)
> > > > -               i = 0;
> > > > -       else
> > > > -               i = head;
> > > > -
> > > > -       for (n = 0; n < total_sg; n++) {
> > > > -               if (i == err_idx)
> > > > -                       break;
> > > > -               if (indirect) {
> > > > -                       vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > -                       i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > -               } else
> > > > -                       i = vring_unmap_one_split(vq, i);
> > > > +               if (indirect)
> > > > +                       i = 0;
> > > > +               else
> > > > +                       i = head;
> > > > +
> > > > +               for (n = 0; n < total_sg; n++) {
> > > > +                       if (i == err_idx)
> > > > +                               break;
> > > > +                       if (indirect) {
> > > > +                               vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > +                               i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > +                       } else
> > > > +                               i = vring_unmap_one_split(vq, i);
> > > > +               }
> > > >         }
> > > >
> > > >         if (indirect)
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>
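
Spelling out "the same for both ways": with the premapped check inside the
helper, the add path collapses back to a single call, while unmap_release
keeps exactly the !vq->premapped guard from the hunk above. A sketch under
that assumption:

	for (sg = sgs[n]; sg; sg = sg_next(sg)) {
		dma_addr_t addr;

		/* returns the premapped address, or maps the sg */
		if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
			goto unmap_release;
		/* ... */
	}

Either way, premapped entries never reach the core's unmap helpers, since
the !vq->premapped guard skips the whole unwind loop, so the error path
needs no further change.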


* Re: [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf
  2023-06-28  6:00         ` Xuan Zhuo
@ 2023-06-28  6:51           ` Jason Wang
  0 siblings, 0 replies; 91+ messages in thread
From: Jason Wang @ 2023-06-28  6:51 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Wed, Jun 28, 2023 at 2:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 28 Jun 2023 12:07:10 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jun 27, 2023 at 5:05 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 27 Jun 2023 16:03:26 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Fri, Jun 2, 2023 at 5:22 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > If the vq is in premapped mode, use sg_dma_address() directly.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/virtio/virtio_ring.c | 46 ++++++++++++++++++++++--------------
> > > > >  1 file changed, 28 insertions(+), 18 deletions(-)
> > > > >
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 2afdfb9e3e30..18212c3e056b 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -598,8 +598,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > >                         dma_addr_t addr;
> > > > >
> > > > > -                       if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > -                               goto unmap_release;
> > > > > +                       if (vq->premapped) {
> > > > > +                               addr = sg_dma_address(sg);
> > > > > +                       } else {
> > > > > +                               if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
> > > > > +                                       goto unmap_release;
> > > > > +                       }
> > > >
> > > > Btw, I wonder whether or not it would be simple to implement the
> > > > vq->premapped check inside vring_map_one_sg(), assuming the
> > > > !use_dma_api check is done there as well.
> > >
> > >
> > > YES,
> > >
> > > That will be simpler for the caller.
> > >
> > > But we will have things like:
> > >
> > > void func(bool premapped)
> > > {
> > > 	if (!premapped)
> > > 		return;
> > > }
> > >
> > > I like this way, but you didn't like it in the last version.
> >
> > I see :)
> >
> > So I think it depends on the error handling path; we should choose a
> > way that lets us easily deal with errors.
> >
> > For example, it seems the current approach is better since it doesn't
> > need to change unmap_release.
>
> NO,
>
> The unmap_release path is the same for both ways.
>
> Thanks.

OK, so either way is fine for me.

Thanks

>
>
> >
> > Thanks
> >
> > >
> > > >
> > > > >
> > > > >                         prev = i;
> > > > >                         /* Note that we trust indirect descriptor
> > > > > @@ -614,8 +618,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > >                 for (sg = sgs[n]; sg; sg = sg_next(sg)) {
> > > > >                         dma_addr_t addr;
> > > > >
> > > > > -                       if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > -                               goto unmap_release;
> > > > > +                       if (vq->premapped) {
> > > > > +                               addr = sg_dma_address(sg);
> > > > > +                       } else {
> > > > > +                               if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
> > > > > +                                       goto unmap_release;
> > > > > +                       }
> > > > >
> > > > >                         prev = i;
> > > > >                         /* Note that we trust indirect descriptor
> > > > > @@ -689,21 +697,23 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > > >         return 0;
> > > > >
> > > > >  unmap_release:
> > > > > -       err_idx = i;
> > > > > +       if (!vq->premapped) {
> > > >
> > > > Can vq->premapped be true here? The label is named "unmap_release",
> > > > which implies a "map" beforehand, which seems not to be the case for
> > > > premapping.
> > >
> > > I see.
> > >
> > > Rethinking about this, there is a better way.
> > > I will fix it in the next version.
> > >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > > > +               err_idx = i;
> > > > >
> > > > > -       if (indirect)
> > > > > -               i = 0;
> > > > > -       else
> > > > > -               i = head;
> > > > > -
> > > > > -       for (n = 0; n < total_sg; n++) {
> > > > > -               if (i == err_idx)
> > > > > -                       break;
> > > > > -               if (indirect) {
> > > > > -                       vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > -                       i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > -               } else
> > > > > -                       i = vring_unmap_one_split(vq, i);
> > > > > +               if (indirect)
> > > > > +                       i = 0;
> > > > > +               else
> > > > > +                       i = head;
> > > > > +
> > > > > +               for (n = 0; n < total_sg; n++) {
> > > > > +                       if (i == err_idx)
> > > > > +                               break;
> > > > > +                       if (indirect) {
> > > > > +                               vring_unmap_one_split_indirect(vq, &desc[i]);
> > > > > +                               i = virtio16_to_cpu(_vq->vdev, desc[i].next);
> > > > > +                       } else
> > > > > +                               i = vring_unmap_one_split(vq, i);
> > > > > +               }
> > > > >         }
> > > > >
> > > > >         if (indirect)
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > >
> > >
> >
>



end of thread

Thread overview: 91+ messages
2023-06-02  9:21 [PATCH vhost v10 00/10] virtio core prepares for AF_XDP Xuan Zhuo
2023-06-02  9:21 ` Xuan Zhuo
2023-06-02  9:21 ` [PATCH vhost v10 01/10] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
2023-06-02  9:21   ` Xuan Zhuo
2023-06-27  8:03   ` Jason Wang
2023-06-27  8:03     ` Jason Wang
2023-06-02  9:21 ` [PATCH vhost v10 02/10] virtio_ring: introduce virtqueue_set_premapped() Xuan Zhuo
2023-06-02  9:21   ` Xuan Zhuo
2023-06-27  8:03   ` Jason Wang
2023-06-27  8:03     ` Jason Wang
2023-06-27  8:50     ` Xuan Zhuo
2023-06-27  8:50       ` Xuan Zhuo
2023-06-27 14:56       ` Michael S. Tsirkin
2023-06-27 14:56         ` Michael S. Tsirkin
2023-06-28  1:34         ` Xuan Zhuo
2023-06-28  1:34           ` Xuan Zhuo
2023-06-02  9:21 ` [PATCH vhost v10 03/10] virtio_ring: split: support add premapped buf Xuan Zhuo
2023-06-02  9:21   ` Xuan Zhuo
2023-06-27  8:03   ` Jason Wang
2023-06-27  8:03     ` Jason Wang
2023-06-27  9:01     ` Xuan Zhuo
2023-06-27  9:01       ` Xuan Zhuo
2023-06-28  4:07       ` Jason Wang
2023-06-28  6:00         ` Xuan Zhuo
2023-06-28  6:51           ` Jason Wang
2023-06-02  9:22 ` [PATCH vhost v10 04/10] virtio_ring: packed: " Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-27  8:03   ` Jason Wang
2023-06-27  8:03     ` Jason Wang
2023-06-27  9:05     ` Xuan Zhuo
2023-06-27  9:05       ` Xuan Zhuo
2023-06-02  9:22 ` [PATCH vhost v10 05/10] virtio_ring: split-detach: support return dma info to driver Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-22 19:36   ` Michael S. Tsirkin
2023-06-22 19:36     ` Michael S. Tsirkin
2023-06-25  2:10     ` Xuan Zhuo
2023-06-25  2:10       ` Xuan Zhuo
2023-06-27  8:03   ` Jason Wang
2023-06-27  8:03     ` Jason Wang
2023-06-27  9:21     ` Xuan Zhuo
2023-06-27  9:21       ` Xuan Zhuo
2023-06-02  9:22 ` [PATCH vhost v10 06/10] virtio_ring: packed-detach: " Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-02 11:40   ` Michael S. Tsirkin
2023-06-02 11:40     ` Michael S. Tsirkin
2023-06-02  9:22 ` [PATCH vhost v10 07/10] virtio_ring: introduce helpers for premapped Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-04 13:45   ` Michael S. Tsirkin
2023-06-04 13:45     ` Michael S. Tsirkin
2023-06-05  2:06     ` Xuan Zhuo
2023-06-05  2:06       ` Xuan Zhuo
2023-06-05  5:38       ` Michael S. Tsirkin
2023-06-05  5:38         ` Michael S. Tsirkin
2023-06-06  2:01         ` Xuan Zhuo
2023-06-06  2:01           ` Xuan Zhuo
2023-06-22 19:29   ` Michael S. Tsirkin
2023-06-22 19:29     ` Michael S. Tsirkin
2023-06-02  9:22 ` [PATCH vhost v10 08/10] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-02  9:22 ` [PATCH vhost v10 09/10] virtio_ring: introduce virtqueue_add_sg() Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-02  9:22 ` [PATCH vhost v10 10/10] virtio_net: support dma premapped Xuan Zhuo
2023-06-02  9:22   ` Xuan Zhuo
2023-06-03  6:31   ` Jakub Kicinski
2023-06-05  2:10     ` Xuan Zhuo
2023-06-05  2:10       ` Xuan Zhuo
2023-06-05  5:44       ` Michael S. Tsirkin
2023-06-05  5:44         ` Michael S. Tsirkin
2023-06-06  2:11         ` Xuan Zhuo
2023-06-06  2:11           ` Xuan Zhuo
2023-06-22 12:15   ` Michael S. Tsirkin
2023-06-22 12:15     ` Michael S. Tsirkin
2023-06-25  2:43     ` Xuan Zhuo
2023-06-25  2:43       ` Xuan Zhuo
2023-06-27  8:03   ` Jason Wang
2023-06-27  8:03     ` Jason Wang
2023-06-27  9:23     ` Xuan Zhuo
2023-06-27  9:23       ` Xuan Zhuo
2023-06-03  6:29 ` [PATCH vhost v10 00/10] virtio core prepares for AF_XDP Jakub Kicinski
2023-06-05  1:58   ` Xuan Zhuo
2023-06-05  1:58     ` Xuan Zhuo
2023-06-07 14:05     ` Christoph Hellwig
2023-06-07 14:05       ` Christoph Hellwig
2023-06-07 20:15       ` Michael S. Tsirkin
2023-06-07 20:15         ` Michael S. Tsirkin
2023-06-21  6:42 ` Xuan Zhuo
2023-06-21  6:42   ` Xuan Zhuo
2023-06-25  7:19   ` Jason Wang
2023-06-25  7:19     ` Jason Wang
2023-06-22 19:38 ` Michael S. Tsirkin
2023-06-22 19:38   ` Michael S. Tsirkin
