* [PATCH vhost v11 00/10] virtio core prepares for AF_XDP
@ 2023-07-10  3:42 ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

## About DMA APIs

Currently, virtio may not be able to work with the DMA APIs when the virtio
features do not include VIRTIO_F_ACCESS_PLATFORM.

1. I tried to let the DMA APIs return the physical address for the virtio
   device. But the DMA APIs only work with "real" devices.
2. I tried to let xsk support callbacks to get the physical address from the
   virtio-net driver as the dma address. But the xsk maintainers may want to
   replace the DMA APIs with dma-buf. I think that would be a larger effort,
   and we would have to wait too long.

So, rethinking this, we can first support premapped DMA only for devices with
VIRTIO_F_ACCESS_PLATFORM. In the case of AF_XDP, users who want to use it
already have to update the device to support VIRTIO_F_RING_RESET, so they can
also enable the device's VIRTIO_F_ACCESS_PLATFORM feature.

Thanks to Christoph for the help.

=================

XDP socket (AF_XDP) is an excellent kernel-bypass networking framework. Its
zero-copy feature needs to be supported by the driver, and the zero-copy
performance is very good.

ENV: Qemu with vhost.

                   vhost cpu | Guest APP CPU |Guest Softirq CPU | PPS
-----------------------------|---------------|------------------|------------
xmit by sockperf:     90%    |   100%        |                  |  318967
xmit by xsk:          100%   |   30%         |   33%            | 1192064
recv by sockperf:     100%   |   68%         |   100%           |  692288
recv by xsk:          100%   |   33%         |   43%            |  771670

Before implementing this function in virtio-net, we also have to make the
virtio core support these features:

1. virtio core supports premapped DMA
2. virtio core supports per-queue reset

=================

After introducing premapping, I added an example to virtio-net: virtio-net can
merge dma mappings through this feature (see the sketch below). @Jason
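
As a rough sketch of the intended driver-side flow (illustrative only, not
code from this series; vq, page and buf are placeholders and declarations are
omitted), a driver using the premapped mode would do roughly:

    /* Enable premapped mode right after creating the vq and before
     * adding any buffer.
     */
    if (virtqueue_set_premapped(vq))
            return;                      /* fall back to the normal mode */

    dev = virtqueue_dma_dev(vq);         /* device to use with the DMA APIs */

    /* The driver maps the buffer itself ... */
    addr = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_FROM_DEVICE);
    if (dma_mapping_error(dev, addr))
            return;

    /* ... and passes the dma address through the scatterlist. */
    sg_init_table(&sg, 1);
    sg.length = PAGE_SIZE;
    sg_dma_address(&sg) = addr;

    virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);

    /* After virtqueue_get_buf() returns this buffer, the driver unmaps it
     * itself -- or keeps the mapping around for reuse, which is what the
     * virtio-net patch at the end of this series does.
     */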


Please review.

Thanks.

v11:
 1. virtio-net merges dma operations based on the premapped feature
 2. a better way to handle the map error with premapped

v10:
 1. support setting the vq to premapped mode; then the vq only handles premapped requests
 2. virtio-net supports doing the dma mapping in advance

v9:
 1. use a flag to distinguish the premapped operations; do not judge by sg

v8:
 1. vring_sg_address: check by sg_page(sg), not dma_address, because 0 is a valid dma address
 2. remove unused code from vring_map_one_sg()

v7:
 1. virtqueue_dma_dev() returns NULL when virtio is without the DMA API

v6:
 1. change the size of the flags to u32

v5:
 1. fix the error handler
 2. add flags to record the internal dma mapping

v4:
 1. rename map_inter to dma_map_internal
 2. fix: Excess function parameter 'vq' description in 'virtqueue_dma_dev'

v3:
 1. add map_inter to struct desc state to record whether the virtio core does the dma map

v2:
 1. judge premapped based on sgs[0]->dma_address
 2. judge whether to unmap a non-indirect desc based on extra.addr
 3. judge whether to unmap an indirect desc based on indir_desc
 4. rename virtqueue_get_dma_dev to virtqueue_dma_dev

v1:
 1. expose the dma device; do NOT introduce APIs for dma and sync
 2. split some commits for review





Xuan Zhuo (10):
  virtio_ring: check use_dma_api before unmap desc for indirect
  virtio_ring: put mapping error check in vring_map_one_sg
  virtio_ring: introduce virtqueue_set_premapped()
  virtio_ring: support add premapped buf
  virtio_ring: introduce virtqueue_dma_dev()
  virtio_ring: skip unmap for premapped
  virtio_ring: correct the expression of the description of
    virtqueue_resize()
  virtio_ring: separate the logic of reset/enable from virtqueue_resize
  virtio_ring: introduce virtqueue_reset()
  virtio_net: merge dma operation for one page

 drivers/net/virtio_net.c     | 283 +++++++++++++++++++++++++++++++++--
 drivers/virtio/virtio_ring.c | 257 ++++++++++++++++++++++++-------
 include/linux/virtio.h       |   6 +
 3 files changed, 478 insertions(+), 68 deletions(-)

--
2.32.0.3.g01195cf9f


* [PATCH vhost v11 01/10] virtio_ring: check use_dma_api before unmap desc for indirect
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Inside detach_buf_split(), if use_dma_api is false,
vring_unmap_one_split_indirect() will be called many times, but actually
nothing is done. So this patch checks use_dma_api first.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index c5310eaf8b46..f8754f1d64d3 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -774,8 +774,10 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		for (j = 0; j < len / sizeof(struct vring_desc); j++)
-			vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+		if (vq->use_dma_api) {
+			for (j = 0; j < len / sizeof(struct vring_desc); j++)
+				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+		}
 
 		kfree(indir_desc);
 		vq->split.desc_state[head].indir_desc = NULL;
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 02/10] virtio_ring: put mapping error check in vring_map_one_sg
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

This patch puts the dma addr error check in vring_map_one_sg().

The benefits of doing this:

1. reduces one check of vq->use_dma_api.
2. makes vring_map_one_sg() simpler: callers no longer call
   vring_mapping_error() to check the return value, which simplifies the
   subsequent code.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 37 +++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f8754f1d64d3..87d7ceeecdbd 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -355,9 +355,8 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 }
 
 /* Map one sg entry. */
-static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
-				   struct scatterlist *sg,
-				   enum dma_data_direction direction)
+static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
+			    enum dma_data_direction direction, dma_addr_t *addr)
 {
 	if (!vq->use_dma_api) {
 		/*
@@ -366,7 +365,8 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 		 * depending on the direction.
 		 */
 		kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
-		return (dma_addr_t)sg_phys(sg);
+		*addr = (dma_addr_t)sg_phys(sg);
+		return 0;
 	}
 
 	/*
@@ -374,9 +374,14 @@ static dma_addr_t vring_map_one_sg(const struct vring_virtqueue *vq,
 	 * the way it expects (we don't guarantee that the scatterlist
 	 * will exist for the lifetime of the mapping).
 	 */
-	return dma_map_page(vring_dma_dev(vq),
+	*addr = dma_map_page(vring_dma_dev(vq),
 			    sg_page(sg), sg->offset, sg->length,
 			    direction);
+
+	if (dma_mapping_error(vring_dma_dev(vq), *addr))
+		return -ENOMEM;
+
+	return 0;
 }
 
 static dma_addr_t vring_map_single(const struct vring_virtqueue *vq,
@@ -588,8 +593,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 
 	for (n = 0; n < out_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_TO_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, DMA_TO_DEVICE, &addr))
 				goto unmap_release;
 
 			prev = i;
@@ -603,8 +609,9 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 	for (; n < (out_sgs + in_sgs); n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, DMA_FROM_DEVICE, &addr))
 				goto unmap_release;
 
 			prev = i;
@@ -1281,9 +1288,8 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-					DMA_TO_DEVICE : DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
 				goto unmap_release;
 
 			desc[i].flags = cpu_to_le16(n < out_sgs ?
@@ -1428,9 +1434,10 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	c = 0;
 	for (n = 0; n < out_sgs + in_sgs; n++) {
 		for (sg = sgs[n]; sg; sg = sg_next(sg)) {
-			dma_addr_t addr = vring_map_one_sg(vq, sg, n < out_sgs ?
-					DMA_TO_DEVICE : DMA_FROM_DEVICE);
-			if (vring_mapping_error(vq, addr))
+			dma_addr_t addr;
+
+			if (vring_map_one_sg(vq, sg, n < out_sgs ?
+					     DMA_TO_DEVICE : DMA_FROM_DEVICE, &addr))
 				goto unmap_release;
 
 			flags = cpu_to_le16(vq->packed.avail_used_flags |
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

This helper allows the driver to switch the dma mode to the premapped mode.
In the premapped mode, the virtio core does not do the dma mapping
internally.

This only works when use_dma_api is true. If use_dma_api is false, the dma
operations do not go through the DMA APIs, which is not the standard way of
the Linux kernel.
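
For illustration only (not part of this patch; vq and vdev are the driver's
virtqueue and virtio device), the expected call site is right after queue
setup and before any buffer is posted:

    err = virtqueue_set_premapped(vq);
    if (err)
            /* e.g. use_dma_api is false: stay in the normal mode, where
             * the virtio core keeps doing the dma mapping itself.
             */
            dev_info(&vdev->dev, "premapped mode not available\n");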

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 45 ++++++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 47 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 87d7ceeecdbd..5ace4539344c 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -172,6 +172,9 @@ struct vring_virtqueue {
 	/* Host publishes avail event idx */
 	bool event;
 
+	/* Do DMA mapping by driver */
+	bool premapped;
+
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -2061,6 +2064,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->packed_ring = true;
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
+	vq->premapped = false;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2550,6 +2554,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 #endif
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
+	vq->premapped = false;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2693,6 +2698,46 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
+/**
+ * virtqueue_set_premapped - set the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Enable the premapped mode of the vq.
+ *
+ * The vring in premapped mode does not do dma internally, so the driver must
+ * do dma mapping in advance. The driver must pass the dma_address through
+ * dma_address of scatterlist. When the driver got a used buffer from
+ * the vring, it has to unmap the dma address.
+ *
+ * This function must be called immediately after creating the vq, or after vq
+ * reset, and before adding any buffers to it.
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
+ */
+int virtqueue_set_premapped(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	u32 num;
+
+	num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
+
+	if (num != vq->vq.num_free)
+		return -EINVAL;
+
+	if (!vq->use_dma_api)
+		return -EINVAL;
+
+	vq->premapped = true;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index de6041deee37..2efd07b79ecf 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
+int virtqueue_set_premapped(struct virtqueue *_vq);
+
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 04/10] virtio_ring: support add premapped buf
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

If the vq is in premapped mode, use sg_dma_address() directly.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 5ace4539344c..d471dee3f4f7 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -361,6 +361,11 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
 			    enum dma_data_direction direction, dma_addr_t *addr)
 {
+	if (vq->premapped) {
+		*addr = sg_dma_address(sg);
+		return 0;
+	}
+
 	if (!vq->use_dma_api) {
 		/*
 		 * If DMA is not used, KMSAN doesn't know that the scatterlist
@@ -639,8 +644,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		dma_addr_t addr = vring_map_single(
 			vq, desc, total_sg * sizeof(struct vring_desc),
 			DMA_TO_DEVICE);
-		if (vring_mapping_error(vq, addr))
+		if (vring_mapping_error(vq, addr)) {
+			if (vq->premapped)
+				goto free_indirect;
+
 			goto unmap_release;
+		}
 
 		virtqueue_add_desc_split(_vq, vq->split.vring.desc,
 					 head, addr,
@@ -706,6 +715,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 			i = vring_unmap_one_split(vq, i);
 	}
 
+free_indirect:
 	if (indirect)
 		kfree(desc);
 
@@ -1307,8 +1317,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	addr = vring_map_single(vq, desc,
 			total_sg * sizeof(struct vring_packed_desc),
 			DMA_TO_DEVICE);
-	if (vring_mapping_error(vq, addr))
+	if (vring_mapping_error(vq, addr)) {
+		if (vq->premapped)
+			goto free_desc;
+
 		goto unmap_release;
+	}
 
 	vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
 	vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
@@ -1366,6 +1380,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	for (i = 0; i < err_idx; i++)
 		vring_unmap_desc_packed(vq, &desc[i]);
 
+free_desc:
 	kfree(desc);
 
 	END_USE(vq);
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Add virtqueue_dma_dev() to get the DMA device for virtio. Then the
caller can do dma operations in advance. The purpose is to keep memory
mapped across multiple add/get buf operations.
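
A minimal sketch of a caller (illustrative only, not taken from this series;
buf and len are placeholders), where a NULL return means the DMA API is not
in use and physical addresses are used directly:

    struct device *dev = virtqueue_dma_dev(vq);
    dma_addr_t addr;

    if (!dev)
            return -EINVAL;          /* or fall back to a non-premapped path */

    addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
    if (dma_mapping_error(dev, addr))
            return -ENOMEM;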

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 19 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index d471dee3f4f7..1fb2c6dca9ea 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
+/**
+ * virtqueue_dma_dev - get the dma dev
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns the dma dev. That can been used for dma api.
+ */
+struct device *virtqueue_dma_dev(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	if (vq->use_dma_api)
+		return vring_dma_dev(vq);
+	else
+		return NULL;
+}
+EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
+
 /**
  * virtqueue_kick_prepare - first half of split virtqueue_kick call.
  * @_vq: the struct virtqueue
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2efd07b79ecf..35d175121cc6 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
 		      void *data,
 		      gfp_t gfp);
 
+struct device *virtqueue_dma_dev(struct virtqueue *vq);
+
 bool virtqueue_kick(struct virtqueue *vq);
 
 bool virtqueue_kick_prepare(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Now we add a case where we skip the dma unmap: when vq->premapped is true.

We can't just rely on use_dma_api to determine whether to skip the dma
operation. For convenience, I introduced "do_unmap". By default, it is the
same as use_dma_api. If the driver configured the vq as premapped, then
do_unmap is false.

So as long as do_unmap is false, we skip the dma unmap operation for the
addr of a desc.
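
In other words (a summary of the rule, not a literal hunk from this patch),
the flag is kept equivalent to:

    vq->do_unmap = vq->use_dma_api && !vq->premapped;

Indirect descriptor tables are still mapped by the virtio core itself, so
unmapping them keeps following use_dma_api alone.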

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 1fb2c6dca9ea..10ee3b7ce571 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -175,6 +175,11 @@ struct vring_virtqueue {
 	/* Do DMA mapping by driver */
 	bool premapped;
 
+	/* Do unmap or not for desc. Just when premapped is False and
+	 * use_dma_api is true, this is true.
+	 */
+	bool do_unmap;
+
 	/* Head of free buffer list. */
 	unsigned int free_head;
 	/* Number we've added since last sync. */
@@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->use_dma_api)
+	if (!vq->do_unmap)
 		return;
 
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 	struct vring_desc_extra *extra = vq->split.desc_extra;
 	u16 flags;
 
-	if (!vq->use_dma_api)
-		goto out;
-
 	flags = extra[i].flags;
 
 	if (flags & VRING_DESC_F_INDIRECT) {
+		if (!vq->use_dma_api)
+			goto out;
+
 		dma_unmap_single(vring_dma_dev(vq),
 				 extra[i].addr,
 				 extra[i].len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
+		if (!vq->do_unmap)
+			goto out;
+
 		dma_unmap_page(vring_dma_dev(vq),
 			       extra[i].addr,
 			       extra[i].len,
@@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 	}
 	/* Last one doesn't continue. */
 	desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
-	if (!indirect && vq->use_dma_api)
+	if (!indirect && vq->do_unmap)
 		vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
 			~VRING_DESC_F_NEXT;
 
@@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		if (vq->use_dma_api) {
+		if (vq->do_unmap) {
 			for (j = 0; j < len / sizeof(struct vring_desc); j++)
 				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
 		}
@@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->use_dma_api)
-		return;
-
 	flags = extra->flags;
 
 	if (flags & VRING_DESC_F_INDIRECT) {
+		if (!vq->use_dma_api)
+			return;
+
 		dma_unmap_single(vring_dma_dev(vq),
 				 extra->addr, extra->len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
 	} else {
+		if (!vq->do_unmap)
+			return;
+
 		dma_unmap_page(vring_dma_dev(vq),
 			       extra->addr, extra->len,
 			       (flags & VRING_DESC_F_WRITE) ?
@@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
 {
 	u16 flags;
 
-	if (!vq->use_dma_api)
+	if (!vq->do_unmap)
 		return;
 
 	flags = le16_to_cpu(desc->flags);
@@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 				sizeof(struct vring_packed_desc));
 	vq->packed.vring.desc[head].id = cpu_to_le16(id);
 
-	if (vq->use_dma_api) {
+	if (vq->do_unmap) {
 		vq->packed.desc_extra[id].addr = addr;
 		vq->packed.desc_extra[id].len = total_sg *
 				sizeof(struct vring_packed_desc);
@@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 			desc[i].len = cpu_to_le32(sg->length);
 			desc[i].id = cpu_to_le16(id);
 
-			if (unlikely(vq->use_dma_api)) {
+			if (unlikely(vq->do_unmap)) {
 				vq->packed.desc_extra[curr].addr = addr;
 				vq->packed.desc_extra[curr].len = sg->length;
 				vq->packed.desc_extra[curr].flags =
@@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	vq->free_head = id;
 	vq->vq.num_free += state->num;
 
-	if (unlikely(vq->use_dma_api)) {
+	if (unlikely(vq->do_unmap)) {
 		curr = id;
 		for (i = 0; i < state->num; i++) {
 			vring_unmap_extra_packed(vq,
@@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (!desc)
 			return;
 
-		if (vq->use_dma_api) {
+		if (vq->do_unmap) {
 			len = vq->packed.desc_extra[id].len;
 			for (i = 0; i < len / sizeof(struct vring_packed_desc);
 					i++)
@@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 	vq->premapped = false;
+	vq->do_unmap = vq->use_dma_api;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->dma_dev = dma_dev;
 	vq->use_dma_api = vring_use_dma_api(vdev);
 	vq->premapped = false;
+	vq->do_unmap = vq->use_dma_api;
 
 	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
 		!context;
@@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
 		return -EINVAL;
 
 	vq->premapped = true;
+	vq->do_unmap = false;
 
 	return 0;
 }
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 07/10] virtio_ring: correct the expression of the description of virtqueue_resize()
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Modify the "useless" to a more accurate "unused".

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 10ee3b7ce571..dcbc8a5eaf16 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2678,7 +2678,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  * virtqueue_resize - resize the vring of vq
  * @_vq: the struct virtqueue we're talking about.
  * @num: new ring num
- * @recycle: callback for recycle the useless buffer
+ * @recycle: callback to recycle unused buffers
  *
  * When it is really necessary to create a new vring, it will set the current vq
  * into the reset state. Then call the passed callback to recycle the buffer
-- 
2.32.0.3.g01195cf9f


* [PATCH vhost v11 08/10] virtio_ring: separate the logic of reset/enable from virtqueue_resize
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

The subsequent reset function will reuse this logic.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 58 ++++++++++++++++++++++++------------
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index dcbc8a5eaf16..bed0237402fa 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2152,6 +2152,43 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
 	return -ENOMEM;
 }
 
+static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
+					 void (*recycle)(struct virtqueue *vq, void *buf))
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct virtio_device *vdev = vq->vq.vdev;
+	void *buf;
+	int err;
+
+	if (!vq->we_own_ring)
+		return -EPERM;
+
+	if (!vdev->config->disable_vq_and_reset)
+		return -ENOENT;
+
+	if (!vdev->config->enable_vq_after_reset)
+		return -ENOENT;
+
+	err = vdev->config->disable_vq_and_reset(_vq);
+	if (err)
+		return err;
+
+	while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
+		recycle(_vq, buf);
+
+	return 0;
+}
+
+static int virtqueue_enable_after_reset(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	struct virtio_device *vdev = vq->vq.vdev;
+
+	if (vdev->config->enable_vq_after_reset(_vq))
+		return -EBUSY;
+
+	return 0;
+}
 
 /*
  * Generic functions and exported symbols.
@@ -2702,13 +2739,8 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 		     void (*recycle)(struct virtqueue *vq, void *buf))
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
-	struct virtio_device *vdev = vq->vq.vdev;
-	void *buf;
 	int err;
 
-	if (!vq->we_own_ring)
-		return -EPERM;
-
 	if (num > vq->vq.num_max)
 		return -E2BIG;
 
@@ -2718,28 +2750,16 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
 	if ((vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num) == num)
 		return 0;
 
-	if (!vdev->config->disable_vq_and_reset)
-		return -ENOENT;
-
-	if (!vdev->config->enable_vq_after_reset)
-		return -ENOENT;
-
-	err = vdev->config->disable_vq_and_reset(_vq);
+	err = virtqueue_disable_and_recycle(_vq, recycle);
 	if (err)
 		return err;
 
-	while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-		recycle(_vq, buf);
-
 	if (vq->packed_ring)
 		err = virtqueue_resize_packed(_vq, num);
 	else
 		err = virtqueue_resize_split(_vq, num);
 
-	if (vdev->config->enable_vq_after_reset(_vq))
-		return -EBUSY;
-
-	return err;
+	return virtqueue_enable_after_reset(_vq);
 }
 EXPORT_SYMBOL_GPL(virtqueue_resize);
 
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH vhost v11 09/10] virtio_ring: introduce virtqueue_reset()
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Introduce virtqueue_reset() to release all buffers inside the vq.
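
A driver that owns the ring can use it roughly like this (sketch only;
recycle_buf is a placeholder for whatever per-buffer cleanup the driver
already has, e.g. virtio-net's virtnet_rq_free_unused_buf()):

	static void recycle_buf(struct virtqueue *vq, void *buf)
	{
		/* driver-specific: free or repost the detached buffer */
	}

	...
	err = virtqueue_reset(vq, recycle_buf);
	if (err)
		dev_warn(&vq->vdev->dev, "vq reset failed: %d\n", err);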

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/virtio/virtio_ring.c | 33 +++++++++++++++++++++++++++++++++
 include/linux/virtio.h       |  2 ++
 2 files changed, 35 insertions(+)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index bed0237402fa..1f4681102190 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2804,6 +2804,39 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
 
+/**
+ * virtqueue_reset - detach and recycle all unused buffers
+ * @_vq: the struct virtqueue we're talking about.
+ * @recycle: callback to recycle unused buffers
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error.
+ * 0: success.
+ * -EBUSY: Failed to sync with device, vq may not work properly
+ * -ENOENT: Transport or device not supported
+ * -EPERM: Operation not permitted
+ */
+int virtqueue_reset(struct virtqueue *_vq,
+		    void (*recycle)(struct virtqueue *vq, void *buf))
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	int err;
+
+	err = virtqueue_disable_and_recycle(_vq, recycle);
+	if (err)
+		return err;
+
+	if (vq->packed_ring)
+		virtqueue_reinit_packed(vq);
+	else
+		virtqueue_reinit_split(vq);
+
+	return virtqueue_enable_after_reset(_vq);
+}
+EXPORT_SYMBOL_GPL(virtqueue_reset);
+
 /* Only available for split ring */
 struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int num,
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 35d175121cc6..465e8e0e215a 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -99,6 +99,8 @@ dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
 		     void (*recycle)(struct virtqueue *vq, void *buf));
+int virtqueue_reset(struct virtqueue *vq,
+		    void (*recycle)(struct virtqueue *vq, void *buf));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10  3:42 ` Xuan Zhuo
@ 2023-07-10  3:42   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10  3:42 UTC (permalink / raw)
  To: virtualization
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Currently, the virtio core performs a dma operation for every buffer,
even though the same page may be mapped multiple times.

With the premapped feature of the virtio core, the driver now does the
dma mapping itself and manages the dma addresses.

This way, only one dma operation is needed for buffers that share a
page. With an mtu of 1500, this removes a large number of dma
operations.

Tested on an Aliyun g7.4large machine with the cpu at 100%, pps
increased from 1893766 to 1901105, an increase of 0.4%.
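
The core of the scheme, condensed (illustration only; the real
structures and helpers are in the diff below):

	/* map: one dma_map_page_attrs() covers buf .. end of its page,
	 * recorded in a virtnet_rq_dma with ref = 1
	 */

	/* reuse: a later buf that falls inside that range only takes a
	 * reference
	 */
	dma->ref++;
	addr = dma->addr + (buf - dma->buf);

	/* release: unmap only when the last buf of the page comes back */
	if (--dma->ref == 0)
		dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);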

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 267 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 486b5849033d..4de845d35bed 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
 #define VIRTNET_SQ_STATS_LEN	ARRAY_SIZE(virtnet_sq_stats_desc)
 #define VIRTNET_RQ_STATS_LEN	ARRAY_SIZE(virtnet_rq_stats_desc)
 
+/* The bufs on the same page may share this struct. */
+struct virtnet_rq_dma {
+	struct virtnet_rq_dma *next;
+
+	dma_addr_t addr;
+
+	void *buf;
+	u32 len;
+
+	u32 ref;
+};
+
+/* Record the dma and buf. */
+struct virtnet_rq_data {
+	struct virtnet_rq_data *next;
+
+	void *buf;
+
+	struct virtnet_rq_dma *dma;
+};
+
 /* Internal representation of a send virtqueue */
 struct send_queue {
 	/* Virtqueue associated with this send _queue */
@@ -175,6 +196,13 @@ struct receive_queue {
 	char name[16];
 
 	struct xdp_rxq_info xdp_rxq;
+
+	struct virtnet_rq_data *data_array;
+	struct virtnet_rq_data *data_free;
+
+	struct virtnet_rq_dma *dma_array;
+	struct virtnet_rq_dma *dma_free;
+	struct virtnet_rq_dma *last_dma;
 };
 
 /* This structure can contain rss message with maximum settings for indirection table and keysize
@@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	return skb;
 }
 
+static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
+{
+	struct device *dev;
+
+	--dma->ref;
+
+	if (dma->ref)
+		return;
+
+	dev = virtqueue_dma_dev(rq->vq);
+
+	dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
+
+	dma->next = rq->dma_free;
+	rq->dma_free = dma;
+}
+
+static void *virtnet_rq_recycle_data(struct receive_queue *rq,
+				     struct virtnet_rq_data *data)
+{
+	void *buf;
+
+	buf = data->buf;
+
+	data->next = rq->data_free;
+	rq->data_free = data;
+
+	return buf;
+}
+
+static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
+						   void *buf,
+						   struct virtnet_rq_dma *dma)
+{
+	struct virtnet_rq_data *data;
+
+	data = rq->data_free;
+	rq->data_free = data->next;
+
+	data->buf = buf;
+	data->dma = dma;
+
+	return data;
+}
+
+static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
+{
+	struct virtnet_rq_data *data;
+	void *buf;
+
+	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
+	if (!buf || !rq->data_array)
+		return buf;
+
+	data = buf;
+
+	virtnet_rq_unmap(rq, data->dma);
+
+	return virtnet_rq_recycle_data(rq, data);
+}
+
+static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
+{
+	struct virtnet_rq_data *data;
+	void *buf;
+
+	buf = virtqueue_detach_unused_buf(rq->vq);
+	if (!buf || !rq->data_array)
+		return buf;
+
+	data = buf;
+
+	virtnet_rq_unmap(rq, data->dma);
+
+	return virtnet_rq_recycle_data(rq, data);
+}
+
+static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
+{
+	struct virtnet_rq_dma *dma = rq->last_dma;
+	struct device *dev;
+	u32 off, map_len;
+	dma_addr_t addr;
+	void *end;
+
+	if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
+		++dma->ref;
+		addr = dma->addr + (buf - dma->buf);
+		goto ok;
+	}
+
+	end = buf + len - 1;
+	off = offset_in_page(end);
+	map_len = len + PAGE_SIZE - off;
+
+	dev = virtqueue_dma_dev(rq->vq);
+
+	addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
+				  map_len, DMA_FROM_DEVICE, 0);
+	if (addr == DMA_MAPPING_ERROR)
+		return -ENOMEM;
+
+	dma = rq->dma_free;
+	rq->dma_free = dma->next;
+
+	dma->ref = 1;
+	dma->buf = buf;
+	dma->addr = addr;
+	dma->len = map_len;
+
+	rq->last_dma = dma;
+
+ok:
+	sg_init_table(rq->sg, 1);
+	rq->sg[0].dma_address = addr;
+	rq->sg[0].length = len;
+
+	return 0;
+}
+
+static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
+{
+	struct receive_queue *rq;
+	int i, err, j, num;
+
+	/* disable for big mode */
+	if (!vi->mergeable_rx_bufs && vi->big_packets)
+		return 0;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		err = virtqueue_set_premapped(vi->rq[i].vq);
+		if (err)
+			continue;
+
+		rq = &vi->rq[i];
+
+		num = virtqueue_get_vring_size(rq->vq);
+
+		rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
+		if (!rq->data_array)
+			goto err;
+
+		rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
+		if (!rq->dma_array)
+			goto err;
+
+		for (j = 0; j < num; ++j) {
+			rq->data_array[j].next = rq->data_free;
+			rq->data_free = &rq->data_array[j];
+
+			rq->dma_array[j].next = rq->dma_free;
+			rq->dma_free = &rq->dma_array[j];
+		}
+	}
+
+	return 0;
+
+err:
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		struct receive_queue *rq;
+
+		rq = &vi->rq[i];
+
+		kfree(rq->dma_array);
+		kfree(rq->data_array);
+	}
+
+	return -ENOMEM;
+}
+
 static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 {
 	unsigned int len;
@@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 		void *buf;
 		int off;
 
-		buf = virtqueue_get_buf(rq->vq, &buflen);
+		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
 		if (unlikely(!buf))
 			goto err_buf;
 
@@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 		return -EINVAL;
 
 	while (--*num_buf > 0) {
-		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		buf = virtnet_rq_get_buf(rq, &len, &ctx);
 		if (unlikely(!buf)) {
 			pr_debug("%s: rx error: %d buffers out of %d missing\n",
 				 dev->name, *num_buf,
@@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	while (--num_buf) {
 		int num_skb_frags;
 
-		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		buf = virtnet_rq_get_buf(rq, &len, &ctx);
 		if (unlikely(!buf)) {
 			pr_debug("%s: rx error: %d buffers out of %d missing\n",
 				 dev->name, num_buf,
@@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 err_skb:
 	put_page(page);
 	while (num_buf-- > 1) {
-		buf = virtqueue_get_buf(rq->vq, &len);
+		buf = virtnet_rq_get_buf(rq, &len, NULL);
 		if (unlikely(!buf)) {
 			pr_debug("%s: rx error: %d buffers missing\n",
 				 dev->name, num_buf);
@@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	unsigned int xdp_headroom = virtnet_get_headroom(vi);
 	void *ctx = (void *)(unsigned long)xdp_headroom;
 	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
+	struct virtnet_rq_data *data;
 	int err;
 
 	len = SKB_DATA_ALIGN(len) +
@@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
 	get_page(alloc_frag->page);
 	alloc_frag->offset += len;
-	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
-		    vi->hdr_len + GOOD_PACKET_LEN);
-	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+
+	if (rq->data_array) {
+		err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
+					vi->hdr_len + GOOD_PACKET_LEN);
+		if (err)
+			goto map_err;
+
+		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
+	} else {
+		sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
+			    vi->hdr_len + GOOD_PACKET_LEN);
+		data = (void *)buf;
+	}
+
+	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
 	if (err < 0)
-		put_page(virt_to_head_page(buf));
+		goto add_err;
+
+	return err;
+
+add_err:
+	if (rq->data_array) {
+		virtnet_rq_unmap(rq, data->dma);
+		virtnet_rq_recycle_data(rq, data);
+	}
+
+map_err:
+	put_page(virt_to_head_page(buf));
 	return err;
 }
 
@@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	unsigned int headroom = virtnet_get_headroom(vi);
 	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
 	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
+	struct virtnet_rq_data *data;
 	char *buf;
 	void *ctx;
 	int err;
@@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 		alloc_frag->offset += hole;
 	}
 
-	sg_init_one(rq->sg, buf, len);
+	if (rq->data_array) {
+		err = virtnet_rq_map_sg(rq, buf, len);
+		if (err)
+			goto map_err;
+
+		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
+	} else {
+		sg_init_one(rq->sg, buf, len);
+		data = (void *)buf;
+	}
+
 	ctx = mergeable_len_to_ctx(len + room, headroom);
-	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
+	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
 	if (err < 0)
-		put_page(virt_to_head_page(buf));
+		goto add_err;
+
+	return 0;
+
+add_err:
+	if (rq->data_array) {
+		virtnet_rq_unmap(rq, data->dma);
+		virtnet_rq_recycle_data(rq, data);
+	}
 
+map_err:
+	put_page(virt_to_head_page(buf));
 	return err;
 }
 
@@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
 		void *ctx;
 
 		while (stats.packets < budget &&
-		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
+		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
 			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
 			stats.packets++;
 		}
 	} else {
 		while (stats.packets < budget &&
-		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
+		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
 			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
 			stats.packets++;
 		}
@@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		__netif_napi_del(&vi->rq[i].napi);
 		__netif_napi_del(&vi->sq[i].napi);
+
+		kfree(vi->rq[i].data_array);
+		kfree(vi->rq[i].dma_array);
 	}
 
 	/* We called __netif_napi_del(),
@@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
-		struct virtqueue *vq = vi->rq[i].vq;
-		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-			virtnet_rq_free_unused_buf(vq, buf);
+		struct receive_queue *rq = &vi->rq[i];
+
+		while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
+			virtnet_rq_free_unused_buf(rq->vq, buf);
 		cond_resched();
 	}
 }
@@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
 	if (ret)
 		goto err_free;
 
+	ret = virtnet_rq_merge_map_init(vi);
+	if (ret)
+		goto err_free;
+
 	cpus_read_lock();
 	virtnet_set_affinity(vi);
 	cpus_read_unlock();
-- 
2.32.0.3.g01195cf9f


^ permalink raw reply related	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-10  9:40     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-10  9:40 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> Currently, the virtio core performs a dma operation for every buffer,
> even though the same page may be mapped multiple times.
> 
> With the premapped feature of the virtio core, the driver now does the
> dma mapping itself and manages the dma addresses.
> 
> This way, only one dma operation is needed for buffers that share a
> page. With an mtu of 1500, this removes a large number of dma
> operations.
> 
> Tested on an Aliyun g7.4large machine with the cpu at 100%, pps
> increased from 1893766 to 1901105, an increase of 0.4%.

what kind of dma was there? an IOMMU? which vendors? in which mode
of operation?

> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

This kind of difference is likely in the noise.


> ---
>  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 267 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 486b5849033d..4de845d35bed 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
>  #define VIRTNET_SQ_STATS_LEN	ARRAY_SIZE(virtnet_sq_stats_desc)
>  #define VIRTNET_RQ_STATS_LEN	ARRAY_SIZE(virtnet_rq_stats_desc)
>  
> +/* The bufs on the same page may share this struct. */
> +struct virtnet_rq_dma {
> +	struct virtnet_rq_dma *next;
> +
> +	dma_addr_t addr;
> +
> +	void *buf;
> +	u32 len;
> +
> +	u32 ref;
> +};
> +
> +/* Record the dma and buf. */

I guess I see that. But why?
And these two comments are the extent of the available
documentation, that's not enough I feel.


> +struct virtnet_rq_data {
> +	struct virtnet_rq_data *next;

Is manually reimplementing a linked list the best
we can do?
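
For instance, the stock helpers from <linux/list.h> would do the same
job (sketch only, assuming a "struct list_head node" member in the
entries and a "struct list_head dma_free" head in the rq):

	/* put an entry back on the free list */
	list_add(&dma->node, &rq->dma_free);

	/* take one off it */
	dma = list_first_entry_or_null(&rq->dma_free,
				       struct virtnet_rq_dma, node);
	if (dma)
		list_del(&dma->node);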

> +
> +	void *buf;
> +
> +	struct virtnet_rq_dma *dma;
> +};
> +
>  /* Internal representation of a send virtqueue */
>  struct send_queue {
>  	/* Virtqueue associated with this send _queue */
> @@ -175,6 +196,13 @@ struct receive_queue {
>  	char name[16];
>  
>  	struct xdp_rxq_info xdp_rxq;
> +
> +	struct virtnet_rq_data *data_array;
> +	struct virtnet_rq_data *data_free;
> +
> +	struct virtnet_rq_dma *dma_array;
> +	struct virtnet_rq_dma *dma_free;
> +	struct virtnet_rq_dma *last_dma;
>  };
>  
>  /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	return skb;
>  }
>  
> +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> +{
> +	struct device *dev;
> +
> +	--dma->ref;
> +
> +	if (dma->ref)
> +		return;
> +

If you don't unmap there is no guarantee valid data will be
there in the buffer.
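
If the mapping is meant to stay alive while other bufs still hold a
reference, the returned data at least has to be made visible to the CPU
explicitly, e.g. (sketch only; buf/len stand for the returned buffer
and its length, which this helper does not currently receive):

	dma_sync_single_for_cpu(dev, dma->addr + (buf - dma->buf), len,
				DMA_FROM_DEVICE);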

> +	dev = virtqueue_dma_dev(rq->vq);
> +
> +	dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);





> +
> +	dma->next = rq->dma_free;
> +	rq->dma_free = dma;
> +}
> +
> +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> +				     struct virtnet_rq_data *data)
> +{
> +	void *buf;
> +
> +	buf = data->buf;
> +
> +	data->next = rq->data_free;
> +	rq->data_free = data;
> +
> +	return buf;
> +}
> +
> +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> +						   void *buf,
> +						   struct virtnet_rq_dma *dma)
> +{
> +	struct virtnet_rq_data *data;
> +
> +	data = rq->data_free;
> +	rq->data_free = data->next;
> +
> +	data->buf = buf;
> +	data->dma = dma;
> +
> +	return data;
> +}
> +
> +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> +{
> +	struct virtnet_rq_data *data;
> +	void *buf;
> +
> +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> +	if (!buf || !rq->data_array)
> +		return buf;
> +
> +	data = buf;
> +
> +	virtnet_rq_unmap(rq, data->dma);
> +
> +	return virtnet_rq_recycle_data(rq, data);
> +}
> +
> +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> +{
> +	struct virtnet_rq_data *data;
> +	void *buf;
> +
> +	buf = virtqueue_detach_unused_buf(rq->vq);
> +	if (!buf || !rq->data_array)
> +		return buf;
> +
> +	data = buf;
> +
> +	virtnet_rq_unmap(rq, data->dma);
> +
> +	return virtnet_rq_recycle_data(rq, data);
> +}
> +
> +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> +{
> +	struct virtnet_rq_dma *dma = rq->last_dma;
> +	struct device *dev;
> +	u32 off, map_len;
> +	dma_addr_t addr;
> +	void *end;
> +
> +	if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> +		++dma->ref;
> +		addr = dma->addr + (buf - dma->buf);
> +		goto ok;
> +	}

So this is the meat of the proposed optimization. I guess that
if the last buffer we allocated happens to be in the same page
as this one then they can both be mapped for DMA together.
Why last one specifically? Whether next one happens to
be close depends on luck. If you want to try optimizing this
the right thing to do is likely by using a page pool.
There's actually work upstream on page pool, look it up.
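
Very roughly, that direction looks like this (sketch only; names as in
include/net/page_pool.h, ring_size is a placeholder for however many
pages the rx ring wants cached):

	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,	/* pool keeps pages dma-mapped */
		.order		= 0,
		.pool_size	= ring_size,
		.nid		= NUMA_NO_NODE,
		.dev		= virtqueue_dma_dev(rq->vq),
		.dma_dir	= DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp);

	/* refill path: the page comes back already dma-mapped */
	struct page *page = page_pool_dev_alloc_pages(pool);
	dma_addr_t addr = page_pool_get_dma_addr(page);

	/* when the buffer is consumed */
	page_pool_put_full_page(pool, page, false);
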

> +
> +	end = buf + len - 1;
> +	off = offset_in_page(end);
> +	map_len = len + PAGE_SIZE - off;
> +
> +	dev = virtqueue_dma_dev(rq->vq);
> +
> +	addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> +				  map_len, DMA_FROM_DEVICE, 0);
> +	if (addr == DMA_MAPPING_ERROR)
> +		return -ENOMEM;
> +
> +	dma = rq->dma_free;
> +	rq->dma_free = dma->next;
> +
> +	dma->ref = 1;
> +	dma->buf = buf;
> +	dma->addr = addr;
> +	dma->len = map_len;
> +
> +	rq->last_dma = dma;
> +
> +ok:
> +	sg_init_table(rq->sg, 1);
> +	rq->sg[0].dma_address = addr;
> +	rq->sg[0].length = len;
> +
> +	return 0;
> +}
> +
> +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> +{
> +	struct receive_queue *rq;
> +	int i, err, j, num;
> +
> +	/* disable for big mode */
> +	if (!vi->mergeable_rx_bufs && vi->big_packets)
> +		return 0;
> +
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		err = virtqueue_set_premapped(vi->rq[i].vq);
> +		if (err)
> +			continue;
> +
> +		rq = &vi->rq[i];
> +
> +		num = virtqueue_get_vring_size(rq->vq);
> +
> +		rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> +		if (!rq->data_array)
> +			goto err;
> +
> +		rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> +		if (!rq->dma_array)
> +			goto err;
> +
> +		for (j = 0; j < num; ++j) {
> +			rq->data_array[j].next = rq->data_free;
> +			rq->data_free = &rq->data_array[j];
> +
> +			rq->dma_array[j].next = rq->dma_free;
> +			rq->dma_free = &rq->dma_array[j];
> +		}
> +	}
> +
> +	return 0;
> +
> +err:
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		struct receive_queue *rq;
> +
> +		rq = &vi->rq[i];
> +
> +		kfree(rq->dma_array);
> +		kfree(rq->data_array);
> +	}
> +
> +	return -ENOMEM;
> +}
> +
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  {
>  	unsigned int len;
> @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  		void *buf;
>  		int off;
>  
> -		buf = virtqueue_get_buf(rq->vq, &buflen);
> +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
>  		if (unlikely(!buf))
>  			goto err_buf;
>  
> @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>  		return -EINVAL;
>  
>  	while (--*num_buf > 0) {
> -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
>  				 dev->name, *num_buf,
> @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	while (--num_buf) {
>  		int num_skb_frags;
>  
> -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
>  				 dev->name, num_buf,
> @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_skb:
>  	put_page(page);
>  	while (num_buf-- > 1) {
> -		buf = virtqueue_get_buf(rq->vq, &len);
> +		buf = virtnet_rq_get_buf(rq, &len, NULL);
>  		if (unlikely(!buf)) {
>  			pr_debug("%s: rx error: %d buffers missing\n",
>  				 dev->name, num_buf);
> @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	unsigned int xdp_headroom = virtnet_get_headroom(vi);
>  	void *ctx = (void *)(unsigned long)xdp_headroom;
>  	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> +	struct virtnet_rq_data *data;
>  	int err;
>  
>  	len = SKB_DATA_ALIGN(len) +
> @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>  	get_page(alloc_frag->page);
>  	alloc_frag->offset += len;
> -	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> -		    vi->hdr_len + GOOD_PACKET_LEN);
> -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +
> +	if (rq->data_array) {
> +		err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> +					vi->hdr_len + GOOD_PACKET_LEN);
> +		if (err)
> +			goto map_err;
> +
> +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> +	} else {
> +		sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> +			    vi->hdr_len + GOOD_PACKET_LEN);
> +		data = (void *)buf;
> +	}
> +
> +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
>  	if (err < 0)
> -		put_page(virt_to_head_page(buf));
> +		goto add_err;
> +
> +	return err;
> +
> +add_err:
> +	if (rq->data_array) {
> +		virtnet_rq_unmap(rq, data->dma);
> +		virtnet_rq_recycle_data(rq, data);
> +	}
> +
> +map_err:
> +	put_page(virt_to_head_page(buf));
>  	return err;
>  }
>  
> @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  	unsigned int headroom = virtnet_get_headroom(vi);
>  	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
>  	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> +	struct virtnet_rq_data *data;
>  	char *buf;
>  	void *ctx;
>  	int err;
> @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  		alloc_frag->offset += hole;
>  	}
>  
> -	sg_init_one(rq->sg, buf, len);
> +	if (rq->data_array) {
> +		err = virtnet_rq_map_sg(rq, buf, len);
> +		if (err)
> +			goto map_err;
> +
> +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> +	} else {
> +		sg_init_one(rq->sg, buf, len);
> +		data = (void *)buf;
> +	}
> +
>  	ctx = mergeable_len_to_ctx(len + room, headroom);
> -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
>  	if (err < 0)
> -		put_page(virt_to_head_page(buf));
> +		goto add_err;
> +
> +	return 0;
> +
> +add_err:
> +	if (rq->data_array) {
> +		virtnet_rq_unmap(rq, data->dma);
> +		virtnet_rq_recycle_data(rq, data);
> +	}
>  
> +map_err:
> +	put_page(virt_to_head_page(buf));
>  	return err;
>  }
>  
> @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>  		void *ctx;
>  
>  		while (stats.packets < budget &&
> -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
>  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
>  			stats.packets++;
>  		}
>  	} else {
>  		while (stats.packets < budget &&
> -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
>  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
>  			stats.packets++;
>  		}
> @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		__netif_napi_del(&vi->rq[i].napi);
>  		__netif_napi_del(&vi->sq[i].napi);
> +
> +		kfree(vi->rq[i].data_array);
> +		kfree(vi->rq[i].dma_array);
>  	}
>  
>  	/* We called __netif_napi_del(),
> @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  	}
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> -		struct virtqueue *vq = vi->rq[i].vq;
> -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -			virtnet_rq_free_unused_buf(vq, buf);
> +		struct receive_queue *rq = &vi->rq[i];
> +
> +		while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> +			virtnet_rq_free_unused_buf(rq->vq, buf);
>  		cond_resched();
>  	}
>  }
> @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
>  	if (ret)
>  		goto err_free;
>  
> +	ret = virtnet_rq_merge_map_init(vi);
> +	if (ret)
> +		goto err_free;
> +
>  	cpus_read_lock();
>  	virtnet_set_affinity(vi);
>  	cpus_read_unlock();
> -- 
> 2.32.0.3.g01195cf9f


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10  9:40     ` Michael S. Tsirkin
@ 2023-07-10 10:18       ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10 10:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > Currently, the virtio core will perform a dma operation for each
> > operation. Although, the same page may be operated multiple times.
> >
> > The driver does the dma operation and manages the dma address based the
> > feature premapped of virtio core.
> >
> > This way, we can perform only one dma operation for the same page. In
> > the case of mtu 1500, this can reduce a lot of dma operations.
> >
> > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > increased from 1893766 to 1901105. An increase of 0.4%.
>
> what kind of dma was there? an IOMMU? which vendors? in which mode
> of operation?


Do you mean this:

[    0.470816] iommu: Default domain type: Passthrough


>
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
> This kind of difference is likely in the noise.

The gain is indeed small, but that is because the share of DMA operations in
perf top is not high to begin with; the improvement is roughly in line with
that share.

>
>
> > ---
> >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 267 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 486b5849033d..4de845d35bed 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> >  #define VIRTNET_SQ_STATS_LEN	ARRAY_SIZE(virtnet_sq_stats_desc)
> >  #define VIRTNET_RQ_STATS_LEN	ARRAY_SIZE(virtnet_rq_stats_desc)
> >
> > +/* The bufs on the same page may share this struct. */
> > +struct virtnet_rq_dma {
> > +	struct virtnet_rq_dma *next;
> > +
> > +	dma_addr_t addr;
> > +
> > +	void *buf;
> > +	u32 len;
> > +
> > +	u32 ref;
> > +};
> > +
> > +/* Record the dma and buf. */
>
> I guess I see that. But why?
> And these two comments are the extent of the available
> documentation, that's not enough I feel.
>
>
> > +struct virtnet_rq_data {
> > +	struct virtnet_rq_data *next;
>
> Is manually reimplementing a linked list the best
> we can do?

Yes, we can use llist.
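
Something like the below, for example (untested sketch, names are
illustrative; rq->dma_free would become a struct llist_head, and since only
the rx path takes entries from it, llist_del_first() without extra locking
should be fine). virtnet_rq_data could be converted the same way:

#include <linux/llist.h>

struct virtnet_rq_dma {
	struct llist_node node;		/* replaces the open-coded ->next */

	dma_addr_t addr;

	void *buf;
	u32 len;

	u32 ref;
};

static struct virtnet_rq_dma *virtnet_rq_dma_get(struct receive_queue *rq)
{
	struct llist_node *node = llist_del_first(&rq->dma_free);

	return node ? llist_entry(node, struct virtnet_rq_dma, node) : NULL;
}

static void virtnet_rq_dma_put(struct receive_queue *rq,
			       struct virtnet_rq_dma *dma)
{
	llist_add(&dma->node, &rq->dma_free);
}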

>
> > +
> > +	void *buf;
> > +
> > +	struct virtnet_rq_dma *dma;
> > +};
> > +
> >  /* Internal representation of a send virtqueue */
> >  struct send_queue {
> >  	/* Virtqueue associated with this send _queue */
> > @@ -175,6 +196,13 @@ struct receive_queue {
> >  	char name[16];
> >
> >  	struct xdp_rxq_info xdp_rxq;
> > +
> > +	struct virtnet_rq_data *data_array;
> > +	struct virtnet_rq_data *data_free;
> > +
> > +	struct virtnet_rq_dma *dma_array;
> > +	struct virtnet_rq_dma *dma_free;
> > +	struct virtnet_rq_dma *last_dma;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >  	return skb;
> >  }
> >
> > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > +{
> > +	struct device *dev;
> > +
> > +	--dma->ref;
> > +
> > +	if (dma->ref)
> > +		return;
> > +
>
> If you don't unmap there is no guarantee valid data will be
> there in the buffer.
>
> > +	dev = virtqueue_dma_dev(rq->vq);
> > +
> > +	dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
>
>
>
>
>
> > +
> > +	dma->next = rq->dma_free;
> > +	rq->dma_free = dma;
> > +}
> > +
> > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > +				     struct virtnet_rq_data *data)
> > +{
> > +	void *buf;
> > +
> > +	buf = data->buf;
> > +
> > +	data->next = rq->data_free;
> > +	rq->data_free = data;
> > +
> > +	return buf;
> > +}
> > +
> > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > +						   void *buf,
> > +						   struct virtnet_rq_dma *dma)
> > +{
> > +	struct virtnet_rq_data *data;
> > +
> > +	data = rq->data_free;
> > +	rq->data_free = data->next;
> > +
> > +	data->buf = buf;
> > +	data->dma = dma;
> > +
> > +	return data;
> > +}
> > +
> > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > +{
> > +	struct virtnet_rq_data *data;
> > +	void *buf;
> > +
> > +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +	if (!buf || !rq->data_array)
> > +		return buf;
> > +
> > +	data = buf;
> > +
> > +	virtnet_rq_unmap(rq, data->dma);
> > +
> > +	return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > +{
> > +	struct virtnet_rq_data *data;
> > +	void *buf;
> > +
> > +	buf = virtqueue_detach_unused_buf(rq->vq);
> > +	if (!buf || !rq->data_array)
> > +		return buf;
> > +
> > +	data = buf;
> > +
> > +	virtnet_rq_unmap(rq, data->dma);
> > +
> > +	return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > +{
> > +	struct virtnet_rq_dma *dma = rq->last_dma;
> > +	struct device *dev;
> > +	u32 off, map_len;
> > +	dma_addr_t addr;
> > +	void *end;
> > +
> > +	if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > +		++dma->ref;
> > +		addr = dma->addr + (buf - dma->buf);
> > +		goto ok;
> > +	}
>
> So this is the meat of the proposed optimization. I guess that
> if the last buffer we allocated happens to be in the same page
> as this one then they can both be mapped for DMA together.

Since we use page_frag, the buffers we allocate are all contiguous.

> Why last one specifically? Whether next one happens to
> be close depends on luck. If you want to try optimizing this
> the right thing to do is likely by using a page pool.
> There's actually work upstream on page pool, look it up.

As we discussed in another thread, page pool will be introduced for XDP first.
Let's do the transformation step by step.
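
(For reference, a page pool based version would look roughly like the sketch
below: with PP_FLAG_DMA_MAP the pool owns the mapping and we only read the
dma address back. The function names and the rq->page_pool field are made up
here; this is not what this series does.)

#include <net/page_pool.h>

static struct page_pool *virtnet_create_rx_pool(struct receive_queue *rq,
						struct device *dma_dev)
{
	struct page_pool_params pp_params = {
		.order		= 0,
		.flags		= PP_FLAG_DMA_MAP,	/* pool maps the pages */
		.pool_size	= virtqueue_get_vring_size(rq->vq),
		.nid		= NUMA_NO_NODE,
		.dev		= dma_dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp_params);
}

static int virtnet_add_recvbuf_pp(struct receive_queue *rq, u32 len, gfp_t gfp)
{
	struct page *page = page_pool_dev_alloc_pages(rq->page_pool);
	int err;

	if (!page)
		return -ENOMEM;

	/* premapped mode: hand the dma address straight to the vring */
	sg_init_table(rq->sg, 1);
	rq->sg[0].dma_address = page_pool_get_dma_addr(page);
	rq->sg[0].length = len;

	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, page, NULL, gfp);
	if (err < 0)
		page_pool_put_full_page(rq->page_pool, page, false);

	return err;
}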

Thanks.

>
> > +
> > +	end = buf + len - 1;
> > +	off = offset_in_page(end);
> > +	map_len = len + PAGE_SIZE - off;
> > +
> > +	dev = virtqueue_dma_dev(rq->vq);
> > +
> > +	addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > +				  map_len, DMA_FROM_DEVICE, 0);
> > +	if (addr == DMA_MAPPING_ERROR)
> > +		return -ENOMEM;
> > +
> > +	dma = rq->dma_free;
> > +	rq->dma_free = dma->next;
> > +
> > +	dma->ref = 1;
> > +	dma->buf = buf;
> > +	dma->addr = addr;
> > +	dma->len = map_len;
> > +
> > +	rq->last_dma = dma;
> > +
> > +ok:
> > +	sg_init_table(rq->sg, 1);
> > +	rq->sg[0].dma_address = addr;
> > +	rq->sg[0].length = len;
> > +
> > +	return 0;
> > +}
> > +
> > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > +{
> > +	struct receive_queue *rq;
> > +	int i, err, j, num;
> > +
> > +	/* disable for big mode */
> > +	if (!vi->mergeable_rx_bufs && vi->big_packets)
> > +		return 0;
> > +
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		err = virtqueue_set_premapped(vi->rq[i].vq);
> > +		if (err)
> > +			continue;
> > +
> > +		rq = &vi->rq[i];
> > +
> > +		num = virtqueue_get_vring_size(rq->vq);
> > +
> > +		rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > +		if (!rq->data_array)
> > +			goto err;
> > +
> > +		rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > +		if (!rq->dma_array)
> > +			goto err;
> > +
> > +		for (j = 0; j < num; ++j) {
> > +			rq->data_array[j].next = rq->data_free;
> > +			rq->data_free = &rq->data_array[j];
> > +
> > +			rq->dma_array[j].next = rq->dma_free;
> > +			rq->dma_free = &rq->dma_array[j];
> > +		}
> > +	}
> > +
> > +	return 0;
> > +
> > +err:
> > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > +		struct receive_queue *rq;
> > +
> > +		rq = &vi->rq[i];
> > +
> > +		kfree(rq->dma_array);
> > +		kfree(rq->data_array);
> > +	}
> > +
> > +	return -ENOMEM;
> > +}
> > +
> >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  {
> >  	unsigned int len;
> > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> >  		void *buf;
> >  		int off;
> >
> > -		buf = virtqueue_get_buf(rq->vq, &buflen);
> > +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> >  		if (unlikely(!buf))
> >  			goto err_buf;
> >
> > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> >  		return -EINVAL;
> >
> >  	while (--*num_buf > 0) {
> > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >  		if (unlikely(!buf)) {
> >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >  				 dev->name, *num_buf,
> > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  	while (--num_buf) {
> >  		int num_skb_frags;
> >
> > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >  		if (unlikely(!buf)) {
> >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >  				 dev->name, num_buf,
> > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  err_skb:
> >  	put_page(page);
> >  	while (num_buf-- > 1) {
> > -		buf = virtqueue_get_buf(rq->vq, &len);
> > +		buf = virtnet_rq_get_buf(rq, &len, NULL);
> >  		if (unlikely(!buf)) {
> >  			pr_debug("%s: rx error: %d buffers missing\n",
> >  				 dev->name, num_buf);
> > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  	unsigned int xdp_headroom = virtnet_get_headroom(vi);
> >  	void *ctx = (void *)(unsigned long)xdp_headroom;
> >  	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > +	struct virtnet_rq_data *data;
> >  	int err;
> >
> >  	len = SKB_DATA_ALIGN(len) +
> > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> >  	get_page(alloc_frag->page);
> >  	alloc_frag->offset += len;
> > -	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > -		    vi->hdr_len + GOOD_PACKET_LEN);
> > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +
> > +	if (rq->data_array) {
> > +		err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +					vi->hdr_len + GOOD_PACKET_LEN);
> > +		if (err)
> > +			goto map_err;
> > +
> > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +	} else {
> > +		sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +			    vi->hdr_len + GOOD_PACKET_LEN);
> > +		data = (void *)buf;
> > +	}
> > +
> > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >  	if (err < 0)
> > -		put_page(virt_to_head_page(buf));
> > +		goto add_err;
> > +
> > +	return err;
> > +
> > +add_err:
> > +	if (rq->data_array) {
> > +		virtnet_rq_unmap(rq, data->dma);
> > +		virtnet_rq_recycle_data(rq, data);
> > +	}
> > +
> > +map_err:
> > +	put_page(virt_to_head_page(buf));
> >  	return err;
> >  }
> >
> > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >  	unsigned int headroom = virtnet_get_headroom(vi);
> >  	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> >  	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > +	struct virtnet_rq_data *data;
> >  	char *buf;
> >  	void *ctx;
> >  	int err;
> > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >  		alloc_frag->offset += hole;
> >  	}
> >
> > -	sg_init_one(rq->sg, buf, len);
> > +	if (rq->data_array) {
> > +		err = virtnet_rq_map_sg(rq, buf, len);
> > +		if (err)
> > +			goto map_err;
> > +
> > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +	} else {
> > +		sg_init_one(rq->sg, buf, len);
> > +		data = (void *)buf;
> > +	}
> > +
> >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >  	if (err < 0)
> > -		put_page(virt_to_head_page(buf));
> > +		goto add_err;
> > +
> > +	return 0;
> > +
> > +add_err:
> > +	if (rq->data_array) {
> > +		virtnet_rq_unmap(rq, data->dma);
> > +		virtnet_rq_recycle_data(rq, data);
> > +	}
> >
> > +map_err:
> > +	put_page(virt_to_head_page(buf));
> >  	return err;
> >  }
> >
> > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >  		void *ctx;
> >
> >  		while (stats.packets < budget &&
> > -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> >  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> >  			stats.packets++;
> >  		}
> >  	} else {
> >  		while (stats.packets < budget &&
> > -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> >  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> >  			stats.packets++;
> >  		}
> > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> >  		__netif_napi_del(&vi->rq[i].napi);
> >  		__netif_napi_del(&vi->sq[i].napi);
> > +
> > +		kfree(vi->rq[i].data_array);
> > +		kfree(vi->rq[i].dma_array);
> >  	}
> >
> >  	/* We called __netif_napi_del(),
> > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >  	}
> >
> >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > -		struct virtqueue *vq = vi->rq[i].vq;
> > -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -			virtnet_rq_free_unused_buf(vq, buf);
> > +		struct receive_queue *rq = &vi->rq[i];
> > +
> > +		while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > +			virtnet_rq_free_unused_buf(rq->vq, buf);
> >  		cond_resched();
> >  	}
> >  }
> > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> >  	if (ret)
> >  		goto err_free;
> >
> > +	ret = virtnet_rq_merge_map_init(vi);
> > +	if (ret)
> > +		goto err_free;
> > +
> >  	cpus_read_lock();
> >  	virtnet_set_affinity(vi);
> >  	cpus_read_unlock();
> > --
> > 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10 10:18       ` Xuan Zhuo
@ 2023-07-10 11:59         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-10 11:59 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > Currently, the virtio core will perform a dma operation for each
> > > operation. Although, the same page may be operated multiple times.
> > >
> > > The driver does the dma operation and manages the dma address based the
> > > feature premapped of virtio core.
> > >
> > > This way, we can perform only one dma operation for the same page. In
> > > the case of mtu 1500, this can reduce a lot of dma operations.
> > >
> > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > increased from 1893766 to 1901105. An increase of 0.4%.
> >
> > what kind of dma was there? an IOMMU? which vendors? in which mode
> > of operation?
> 
> 
> Do you mean this:
> 
> [    0.470816] iommu: Default domain type: Passthrough
> 

With passthrough, the DMA API is just some indirect function calls, so it
does not affect performance much.

Try e.g. a bounce buffer; that is where you will see a problem: your
patches won't work.
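
With swiotlb the device writes into the bounce page and the data only reaches
the original buffer on a sync or unmap. So if the driver keeps the mapping
alive across buffers, it would at least need something like this (just a
sketch, reusing the names from your patch):

	/* before the CPU reads what the device wrote into this buffer */
	dma_sync_single_for_cpu(dev, dma->addr + (buf - dma->buf), len,
				DMA_FROM_DEVICE);

	/* ... build the skb / run XDP ... */

	/* before the buffer is handed back to the device */
	dma_sync_single_for_device(dev, dma->addr + (buf - dma->buf), len,
				   DMA_FROM_DEVICE);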


> >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >
> > This kind of difference is likely in the noise.
> 
> The gain is indeed small, but that is because the share of DMA operations in
> perf top is not high to begin with; the improvement is roughly in line with
> that share.

So maybe not worth the complexity.

> >
> >
> > > ---
> > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 486b5849033d..4de845d35bed 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > >  #define VIRTNET_SQ_STATS_LEN	ARRAY_SIZE(virtnet_sq_stats_desc)
> > >  #define VIRTNET_RQ_STATS_LEN	ARRAY_SIZE(virtnet_rq_stats_desc)
> > >
> > > +/* The bufs on the same page may share this struct. */
> > > +struct virtnet_rq_dma {
> > > +	struct virtnet_rq_dma *next;
> > > +
> > > +	dma_addr_t addr;
> > > +
> > > +	void *buf;
> > > +	u32 len;
> > > +
> > > +	u32 ref;
> > > +};
> > > +
> > > +/* Record the dma and buf. */
> >
> > I guess I see that. But why?
> > And these two comments are the extent of the available
> > documentation, that's not enough I feel.
> >
> >
> > > +struct virtnet_rq_data {
> > > +	struct virtnet_rq_data *next;
> >
> > Is manually reimplementing a linked list the best
> > we can do?
> 
> Yes, we can use llist.
> 
> >
> > > +
> > > +	void *buf;
> > > +
> > > +	struct virtnet_rq_dma *dma;
> > > +};
> > > +
> > >  /* Internal representation of a send virtqueue */
> > >  struct send_queue {
> > >  	/* Virtqueue associated with this send _queue */
> > > @@ -175,6 +196,13 @@ struct receive_queue {
> > >  	char name[16];
> > >
> > >  	struct xdp_rxq_info xdp_rxq;
> > > +
> > > +	struct virtnet_rq_data *data_array;
> > > +	struct virtnet_rq_data *data_free;
> > > +
> > > +	struct virtnet_rq_dma *dma_array;
> > > +	struct virtnet_rq_dma *dma_free;
> > > +	struct virtnet_rq_dma *last_dma;
> > >  };
> > >
> > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >  	return skb;
> > >  }
> > >
> > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > +{
> > > +	struct device *dev;
> > > +
> > > +	--dma->ref;
> > > +
> > > +	if (dma->ref)
> > > +		return;
> > > +
> >
> > If you don't unmap there is no guarantee valid data will be
> > there in the buffer.
> >
> > > +	dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +	dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> >
> >
> >
> >
> >
> > > +
> > > +	dma->next = rq->dma_free;
> > > +	rq->dma_free = dma;
> > > +}
> > > +
> > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > +				     struct virtnet_rq_data *data)
> > > +{
> > > +	void *buf;
> > > +
> > > +	buf = data->buf;
> > > +
> > > +	data->next = rq->data_free;
> > > +	rq->data_free = data;
> > > +
> > > +	return buf;
> > > +}
> > > +
> > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > +						   void *buf,
> > > +						   struct virtnet_rq_dma *dma)
> > > +{
> > > +	struct virtnet_rq_data *data;
> > > +
> > > +	data = rq->data_free;
> > > +	rq->data_free = data->next;
> > > +
> > > +	data->buf = buf;
> > > +	data->dma = dma;
> > > +
> > > +	return data;
> > > +}
> > > +
> > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > +{
> > > +	struct virtnet_rq_data *data;
> > > +	void *buf;
> > > +
> > > +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > +	if (!buf || !rq->data_array)
> > > +		return buf;
> > > +
> > > +	data = buf;
> > > +
> > > +	virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +	return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > +{
> > > +	struct virtnet_rq_data *data;
> > > +	void *buf;
> > > +
> > > +	buf = virtqueue_detach_unused_buf(rq->vq);
> > > +	if (!buf || !rq->data_array)
> > > +		return buf;
> > > +
> > > +	data = buf;
> > > +
> > > +	virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +	return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > +{
> > > +	struct virtnet_rq_dma *dma = rq->last_dma;
> > > +	struct device *dev;
> > > +	u32 off, map_len;
> > > +	dma_addr_t addr;
> > > +	void *end;
> > > +
> > > +	if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > +		++dma->ref;
> > > +		addr = dma->addr + (buf - dma->buf);
> > > +		goto ok;
> > > +	}
> >
> > So this is the meat of the proposed optimization. I guess that
> > if the last buffer we allocated happens to be in the same page
> > as this one then they can both be mapped for DMA together.
> 
> Since we use page_frag, the buffers we allocate are all contiguous.
> 
> > Why last one specifically? Whether next one happens to
> > be close depends on luck. If you want to try optimizing this
> > the right thing to do is likely by using a page pool.
> > There's actually work upstream on page pool, look it up.
> 
> As we discussed in another thread, page pool will be introduced for XDP first.
> Let's do the transformation step by step.
> 
> Thanks.

ok so this should wait then?

> >
> > > +
> > > +	end = buf + len - 1;
> > > +	off = offset_in_page(end);
> > > +	map_len = len + PAGE_SIZE - off;
> > > +
> > > +	dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +	addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > +				  map_len, DMA_FROM_DEVICE, 0);
> > > +	if (addr == DMA_MAPPING_ERROR)
> > > +		return -ENOMEM;
> > > +
> > > +	dma = rq->dma_free;
> > > +	rq->dma_free = dma->next;
> > > +
> > > +	dma->ref = 1;
> > > +	dma->buf = buf;
> > > +	dma->addr = addr;
> > > +	dma->len = map_len;
> > > +
> > > +	rq->last_dma = dma;
> > > +
> > > +ok:
> > > +	sg_init_table(rq->sg, 1);
> > > +	rq->sg[0].dma_address = addr;
> > > +	rq->sg[0].length = len;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > +{
> > > +	struct receive_queue *rq;
> > > +	int i, err, j, num;
> > > +
> > > +	/* disable for big mode */
> > > +	if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > +		return 0;
> > > +
> > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +		err = virtqueue_set_premapped(vi->rq[i].vq);
> > > +		if (err)
> > > +			continue;
> > > +
> > > +		rq = &vi->rq[i];
> > > +
> > > +		num = virtqueue_get_vring_size(rq->vq);
> > > +
> > > +		rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > +		if (!rq->data_array)
> > > +			goto err;
> > > +
> > > +		rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > +		if (!rq->dma_array)
> > > +			goto err;
> > > +
> > > +		for (j = 0; j < num; ++j) {
> > > +			rq->data_array[j].next = rq->data_free;
> > > +			rq->data_free = &rq->data_array[j];
> > > +
> > > +			rq->dma_array[j].next = rq->dma_free;
> > > +			rq->dma_free = &rq->dma_array[j];
> > > +		}
> > > +	}
> > > +
> > > +	return 0;
> > > +
> > > +err:
> > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +		struct receive_queue *rq;
> > > +
> > > +		rq = &vi->rq[i];
> > > +
> > > +		kfree(rq->dma_array);
> > > +		kfree(rq->data_array);
> > > +	}
> > > +
> > > +	return -ENOMEM;
> > > +}
> > > +
> > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > >  {
> > >  	unsigned int len;
> > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > >  		void *buf;
> > >  		int off;
> > >
> > > -		buf = virtqueue_get_buf(rq->vq, &buflen);
> > > +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > >  		if (unlikely(!buf))
> > >  			goto err_buf;
> > >
> > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > >  		return -EINVAL;
> > >
> > >  	while (--*num_buf > 0) {
> > > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >  		if (unlikely(!buf)) {
> > >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >  				 dev->name, *num_buf,
> > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >  	while (--num_buf) {
> > >  		int num_skb_frags;
> > >
> > > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >  		if (unlikely(!buf)) {
> > >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >  				 dev->name, num_buf,
> > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >  err_skb:
> > >  	put_page(page);
> > >  	while (num_buf-- > 1) {
> > > -		buf = virtqueue_get_buf(rq->vq, &len);
> > > +		buf = virtnet_rq_get_buf(rq, &len, NULL);
> > >  		if (unlikely(!buf)) {
> > >  			pr_debug("%s: rx error: %d buffers missing\n",
> > >  				 dev->name, num_buf);
> > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >  	unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > >  	void *ctx = (void *)(unsigned long)xdp_headroom;
> > >  	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > +	struct virtnet_rq_data *data;
> > >  	int err;
> > >
> > >  	len = SKB_DATA_ALIGN(len) +
> > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > >  	get_page(alloc_frag->page);
> > >  	alloc_frag->offset += len;
> > > -	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > -		    vi->hdr_len + GOOD_PACKET_LEN);
> > > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +
> > > +	if (rq->data_array) {
> > > +		err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +					vi->hdr_len + GOOD_PACKET_LEN);
> > > +		if (err)
> > > +			goto map_err;
> > > +
> > > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +	} else {
> > > +		sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +			    vi->hdr_len + GOOD_PACKET_LEN);
> > > +		data = (void *)buf;
> > > +	}
> > > +
> > > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >  	if (err < 0)
> > > -		put_page(virt_to_head_page(buf));
> > > +		goto add_err;
> > > +
> > > +	return err;
> > > +
> > > +add_err:
> > > +	if (rq->data_array) {
> > > +		virtnet_rq_unmap(rq, data->dma);
> > > +		virtnet_rq_recycle_data(rq, data);
> > > +	}
> > > +
> > > +map_err:
> > > +	put_page(virt_to_head_page(buf));
> > >  	return err;
> > >  }
> > >
> > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >  	unsigned int headroom = virtnet_get_headroom(vi);
> > >  	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > >  	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > +	struct virtnet_rq_data *data;
> > >  	char *buf;
> > >  	void *ctx;
> > >  	int err;
> > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >  		alloc_frag->offset += hole;
> > >  	}
> > >
> > > -	sg_init_one(rq->sg, buf, len);
> > > +	if (rq->data_array) {
> > > +		err = virtnet_rq_map_sg(rq, buf, len);
> > > +		if (err)
> > > +			goto map_err;
> > > +
> > > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +	} else {
> > > +		sg_init_one(rq->sg, buf, len);
> > > +		data = (void *)buf;
> > > +	}
> > > +
> > >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >  	if (err < 0)
> > > -		put_page(virt_to_head_page(buf));
> > > +		goto add_err;
> > > +
> > > +	return 0;
> > > +
> > > +add_err:
> > > +	if (rq->data_array) {
> > > +		virtnet_rq_unmap(rq, data->dma);
> > > +		virtnet_rq_recycle_data(rq, data);
> > > +	}
> > >
> > > +map_err:
> > > +	put_page(virt_to_head_page(buf));
> > >  	return err;
> > >  }
> > >
> > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > >  		void *ctx;
> > >
> > >  		while (stats.packets < budget &&
> > > -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > >  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > >  			stats.packets++;
> > >  		}
> > >  	} else {
> > >  		while (stats.packets < budget &&
> > > -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > >  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > >  			stats.packets++;
> > >  		}
> > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > >  		__netif_napi_del(&vi->rq[i].napi);
> > >  		__netif_napi_del(&vi->sq[i].napi);
> > > +
> > > +		kfree(vi->rq[i].data_array);
> > > +		kfree(vi->rq[i].dma_array);
> > >  	}
> > >
> > >  	/* We called __netif_napi_del(),
> > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > >  	}
> > >
> > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -		struct virtqueue *vq = vi->rq[i].vq;
> > > -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > -			virtnet_rq_free_unused_buf(vq, buf);
> > > +		struct receive_queue *rq = &vi->rq[i];
> > > +
> > > +		while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > +			virtnet_rq_free_unused_buf(rq->vq, buf);
> > >  		cond_resched();
> > >  	}
> > >  }
> > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > >  	if (ret)
> > >  		goto err_free;
> > >
> > > +	ret = virtnet_rq_merge_map_init(vi);
> > > +	if (ret)
> > > +		goto err_free;
> > > +
> > >  	cpus_read_lock();
> > >  	virtnet_set_affinity(vi);
> > >  	cpus_read_unlock();
> > > --
> > > 2.32.0.3.g01195cf9f
> >


^ permalink raw reply	[flat|nested] 176+ messages in thread

> > > +		virtnet_rq_unmap(rq, data->dma);
> > > +		virtnet_rq_recycle_data(rq, data);
> > > +	}
> > > +
> > > +map_err:
> > > +	put_page(virt_to_head_page(buf));
> > >  	return err;
> > >  }
> > >
> > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >  	unsigned int headroom = virtnet_get_headroom(vi);
> > >  	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > >  	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > +	struct virtnet_rq_data *data;
> > >  	char *buf;
> > >  	void *ctx;
> > >  	int err;
> > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >  		alloc_frag->offset += hole;
> > >  	}
> > >
> > > -	sg_init_one(rq->sg, buf, len);
> > > +	if (rq->data_array) {
> > > +		err = virtnet_rq_map_sg(rq, buf, len);
> > > +		if (err)
> > > +			goto map_err;
> > > +
> > > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +	} else {
> > > +		sg_init_one(rq->sg, buf, len);
> > > +		data = (void *)buf;
> > > +	}
> > > +
> > >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >  	if (err < 0)
> > > -		put_page(virt_to_head_page(buf));
> > > +		goto add_err;
> > > +
> > > +	return 0;
> > > +
> > > +add_err:
> > > +	if (rq->data_array) {
> > > +		virtnet_rq_unmap(rq, data->dma);
> > > +		virtnet_rq_recycle_data(rq, data);
> > > +	}
> > >
> > > +map_err:
> > > +	put_page(virt_to_head_page(buf));
> > >  	return err;
> > >  }
> > >
> > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > >  		void *ctx;
> > >
> > >  		while (stats.packets < budget &&
> > > -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > >  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > >  			stats.packets++;
> > >  		}
> > >  	} else {
> > >  		while (stats.packets < budget &&
> > > -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > >  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > >  			stats.packets++;
> > >  		}
> > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > >  		__netif_napi_del(&vi->rq[i].napi);
> > >  		__netif_napi_del(&vi->sq[i].napi);
> > > +
> > > +		kfree(vi->rq[i].data_array);
> > > +		kfree(vi->rq[i].dma_array);
> > >  	}
> > >
> > >  	/* We called __netif_napi_del(),
> > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > >  	}
> > >
> > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -		struct virtqueue *vq = vi->rq[i].vq;
> > > -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > -			virtnet_rq_free_unused_buf(vq, buf);
> > > +		struct receive_queue *rq = &vi->rq[i];
> > > +
> > > +		while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > +			virtnet_rq_free_unused_buf(rq->vq, buf);
> > >  		cond_resched();
> > >  	}
> > >  }
> > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > >  	if (ret)
> > >  		goto err_free;
> > >
> > > +	ret = virtnet_rq_merge_map_init(vi);
> > > +	if (ret)
> > > +		goto err_free;
> > > +
> > >  	cpus_read_lock();
> > >  	virtnet_set_affinity(vi);
> > >  	cpus_read_unlock();
> > > --
> > > 2.32.0.3.g01195cf9f
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10 11:59         ` Michael S. Tsirkin
@ 2023-07-10 12:38           ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-10 12:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > even though the same page may be used multiple times.
> > > >
> > > > With the premapped feature of the virtio core, the driver does the dma
> > > > operation itself and manages the dma addresses.
> > > >
> > > > This way, we can perform only one dma operation for the same page. In
> > > > the case of mtu 1500, this reduces the number of dma operations a lot.
> > > >
> > > > Tested on an Aliyun g7.4large machine with the cpu at 100%: pps
> > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > >
> > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > of operation?
> >
> >
> > Do you mean this:
> >
> > [    0.470816] iommu: Default domain type: Passthrough
> >
>
> With passthrough, the dma API is just some indirect function calls; they do
> not affect the performance much.


Yes, this benefit is negligible; it seems I have done something of little
value. The DMA overhead I observed is indeed not very high.

Thanks.


>
> Try e.g. a bounce buffer; that is where you will see a problem: your
> patches won't work.
>
>
> > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >
> > > This kind of difference is likely in the noise.
> >
> > It's really not high, but that is because the proportion of DMA under perf
> > top is not high either; the gain is probably only about that much.
>
> So maybe not worth the complexity.
>
> > >
> > >
> > > > ---
> > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 486b5849033d..4de845d35bed 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > >  #define VIRTNET_SQ_STATS_LEN	ARRAY_SIZE(virtnet_sq_stats_desc)
> > > >  #define VIRTNET_RQ_STATS_LEN	ARRAY_SIZE(virtnet_rq_stats_desc)
> > > >
> > > > +/* The bufs on the same page may share this struct. */
> > > > +struct virtnet_rq_dma {
> > > > +	struct virtnet_rq_dma *next;
> > > > +
> > > > +	dma_addr_t addr;
> > > > +
> > > > +	void *buf;
> > > > +	u32 len;
> > > > +
> > > > +	u32 ref;
> > > > +};
> > > > +
> > > > +/* Record the dma and buf. */
> > >
> > > I guess I see that. But why?
> > > And these two comments are the extent of the available
> > > documentation, that's not enough I feel.
> > >
> > >
> > > > +struct virtnet_rq_data {
> > > > +	struct virtnet_rq_data *next;
> > >
> > > Is manually reimplementing a linked list the best
> > > we can do?
> >
> > Yes, we can use llist.
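
Something along these lines, for example (an untested sketch; the _ll names are
only for illustration). llist_del_first() requires a single consumer, so this
assumes the free list is only touched from the queue's own refill/receive path;
the list head starts empty via init_llist_head():

	#include <linux/llist.h>

	struct virtnet_rq_dma;

	struct virtnet_rq_data {
		struct llist_node node;		/* replaces the open-coded ->next */
		void *buf;
		struct virtnet_rq_dma *dma;
	};

	static void virtnet_rq_put_data_ll(struct llist_head *free_list,
					   struct virtnet_rq_data *data)
	{
		llist_add(&data->node, free_list);
	}

	static struct virtnet_rq_data *virtnet_rq_get_data_ll(struct llist_head *free_list)
	{
		struct llist_node *node = llist_del_first(free_list);

		return node ? llist_entry(node, struct virtnet_rq_data, node) : NULL;
	}
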
> >
> > >
> > > > +
> > > > +	void *buf;
> > > > +
> > > > +	struct virtnet_rq_dma *dma;
> > > > +};
> > > > +
> > > >  /* Internal representation of a send virtqueue */
> > > >  struct send_queue {
> > > >  	/* Virtqueue associated with this send _queue */
> > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > >  	char name[16];
> > > >
> > > >  	struct xdp_rxq_info xdp_rxq;
> > > > +
> > > > +	struct virtnet_rq_data *data_array;
> > > > +	struct virtnet_rq_data *data_free;
> > > > +
> > > > +	struct virtnet_rq_dma *dma_array;
> > > > +	struct virtnet_rq_dma *dma_free;
> > > > +	struct virtnet_rq_dma *last_dma;
> > > >  };
> > > >
> > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >  	return skb;
> > > >  }
> > > >
> > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > +{
> > > > +	struct device *dev;
> > > > +
> > > > +	--dma->ref;
> > > > +
> > > > +	if (dma->ref)
> > > > +		return;
> > > > +
> > >
> > > If you don't unmap there is no guarantee valid data will be
> > > there in the buffer.
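
If the mapping is kept alive across buffers, the received data only becomes
reliably visible to the CPU after a sync; roughly something like the following
before the driver reads a completed buffer (a sketch only, reusing the names
from this patch):

	dma_sync_single_for_cpu(virtqueue_dma_dev(rq->vq),
				dma->addr + (buf - dma->buf), len,
				DMA_FROM_DEVICE);
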
> > >
> > > > +	dev = virtqueue_dma_dev(rq->vq);
> > > > +
> > > > +	dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > >
> > >
> > >
> > >
> > >
> > > > +
> > > > +	dma->next = rq->dma_free;
> > > > +	rq->dma_free = dma;
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > +				     struct virtnet_rq_data *data)
> > > > +{
> > > > +	void *buf;
> > > > +
> > > > +	buf = data->buf;
> > > > +
> > > > +	data->next = rq->data_free;
> > > > +	rq->data_free = data;
> > > > +
> > > > +	return buf;
> > > > +}
> > > > +
> > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > +						   void *buf,
> > > > +						   struct virtnet_rq_dma *dma)
> > > > +{
> > > > +	struct virtnet_rq_data *data;
> > > > +
> > > > +	data = rq->data_free;
> > > > +	rq->data_free = data->next;
> > > > +
> > > > +	data->buf = buf;
> > > > +	data->dma = dma;
> > > > +
> > > > +	return data;
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > +{
> > > > +	struct virtnet_rq_data *data;
> > > > +	void *buf;
> > > > +
> > > > +	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > +	if (!buf || !rq->data_array)
> > > > +		return buf;
> > > > +
> > > > +	data = buf;
> > > > +
> > > > +	virtnet_rq_unmap(rq, data->dma);
> > > > +
> > > > +	return virtnet_rq_recycle_data(rq, data);
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > +{
> > > > +	struct virtnet_rq_data *data;
> > > > +	void *buf;
> > > > +
> > > > +	buf = virtqueue_detach_unused_buf(rq->vq);
> > > > +	if (!buf || !rq->data_array)
> > > > +		return buf;
> > > > +
> > > > +	data = buf;
> > > > +
> > > > +	virtnet_rq_unmap(rq, data->dma);
> > > > +
> > > > +	return virtnet_rq_recycle_data(rq, data);
> > > > +}
> > > > +
> > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > +{
> > > > +	struct virtnet_rq_dma *dma = rq->last_dma;
> > > > +	struct device *dev;
> > > > +	u32 off, map_len;
> > > > +	dma_addr_t addr;
> > > > +	void *end;
> > > > +
> > > > +	if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > +		++dma->ref;
> > > > +		addr = dma->addr + (buf - dma->buf);
> > > > +		goto ok;
> > > > +	}
> > >
> > > So this is the meat of the proposed optimization. I guess that
> > > if the last buffer we allocated happens to be in the same page
> > > as this one then they can both be mapped for DMA together.
> >
> > Since we use page_frag, the buffers we allocate are all contiguous.
> >
> > > Why last one specifically? Whether next one happens to
> > > be close depends on luck. If you want to try optimizing this
> > > the right thing to do is likely by using a page pool.
> > > There's actually work upstream on page pool, look it up.
> >
> > As we discussed in another thread, the page pool is first used for xdp. Let's
> > transform it step by step.
> >
> > Thanks.
>
> ok so this should wait then?
>
> > >
> > > > +
> > > > +	end = buf + len - 1;
> > > > +	off = offset_in_page(end);
> > > > +	map_len = len + PAGE_SIZE - off;
> > > > +
> > > > +	dev = virtqueue_dma_dev(rq->vq);
> > > > +
> > > > +	addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > +				  map_len, DMA_FROM_DEVICE, 0);
> > > > +	if (addr == DMA_MAPPING_ERROR)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	dma = rq->dma_free;
> > > > +	rq->dma_free = dma->next;
> > > > +
> > > > +	dma->ref = 1;
> > > > +	dma->buf = buf;
> > > > +	dma->addr = addr;
> > > > +	dma->len = map_len;
> > > > +
> > > > +	rq->last_dma = dma;
> > > > +
> > > > +ok:
> > > > +	sg_init_table(rq->sg, 1);
> > > > +	rq->sg[0].dma_address = addr;
> > > > +	rq->sg[0].length = len;
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > +{
> > > > +	struct receive_queue *rq;
> > > > +	int i, err, j, num;
> > > > +
> > > > +	/* disable for big mode */
> > > > +	if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > +		return 0;
> > > > +
> > > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +		err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > +		if (err)
> > > > +			continue;
> > > > +
> > > > +		rq = &vi->rq[i];
> > > > +
> > > > +		num = virtqueue_get_vring_size(rq->vq);
> > > > +
> > > > +		rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > +		if (!rq->data_array)
> > > > +			goto err;
> > > > +
> > > > +		rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > +		if (!rq->dma_array)
> > > > +			goto err;
> > > > +
> > > > +		for (j = 0; j < num; ++j) {
> > > > +			rq->data_array[j].next = rq->data_free;
> > > > +			rq->data_free = &rq->data_array[j];
> > > > +
> > > > +			rq->dma_array[j].next = rq->dma_free;
> > > > +			rq->dma_free = &rq->dma_array[j];
> > > > +		}
> > > > +	}
> > > > +
> > > > +	return 0;
> > > > +
> > > > +err:
> > > > +	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +		struct receive_queue *rq;
> > > > +
> > > > +		rq = &vi->rq[i];
> > > > +
> > > > +		kfree(rq->dma_array);
> > > > +		kfree(rq->data_array);
> > > > +	}
> > > > +
> > > > +	return -ENOMEM;
> > > > +}
> > > > +
> > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > >  {
> > > >  	unsigned int len;
> > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > >  		void *buf;
> > > >  		int off;
> > > >
> > > > -		buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > +		buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > >  		if (unlikely(!buf))
> > > >  			goto err_buf;
> > > >
> > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > >  		return -EINVAL;
> > > >
> > > >  	while (--*num_buf > 0) {
> > > > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > >  		if (unlikely(!buf)) {
> > > >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > >  				 dev->name, *num_buf,
> > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >  	while (--num_buf) {
> > > >  		int num_skb_frags;
> > > >
> > > > -		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > +		buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > >  		if (unlikely(!buf)) {
> > > >  			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > >  				 dev->name, num_buf,
> > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >  err_skb:
> > > >  	put_page(page);
> > > >  	while (num_buf-- > 1) {
> > > > -		buf = virtqueue_get_buf(rq->vq, &len);
> > > > +		buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > >  		if (unlikely(!buf)) {
> > > >  			pr_debug("%s: rx error: %d buffers missing\n",
> > > >  				 dev->name, num_buf);
> > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >  	unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > >  	void *ctx = (void *)(unsigned long)xdp_headroom;
> > > >  	int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > +	struct virtnet_rq_data *data;
> > > >  	int err;
> > > >
> > > >  	len = SKB_DATA_ALIGN(len) +
> > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > >  	get_page(alloc_frag->page);
> > > >  	alloc_frag->offset += len;
> > > > -	sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > -		    vi->hdr_len + GOOD_PACKET_LEN);
> > > > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > +
> > > > +	if (rq->data_array) {
> > > > +		err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > +					vi->hdr_len + GOOD_PACKET_LEN);
> > > > +		if (err)
> > > > +			goto map_err;
> > > > +
> > > > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > +	} else {
> > > > +		sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > +			    vi->hdr_len + GOOD_PACKET_LEN);
> > > > +		data = (void *)buf;
> > > > +	}
> > > > +
> > > > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > >  	if (err < 0)
> > > > -		put_page(virt_to_head_page(buf));
> > > > +		goto add_err;
> > > > +
> > > > +	return err;
> > > > +
> > > > +add_err:
> > > > +	if (rq->data_array) {
> > > > +		virtnet_rq_unmap(rq, data->dma);
> > > > +		virtnet_rq_recycle_data(rq, data);
> > > > +	}
> > > > +
> > > > +map_err:
> > > > +	put_page(virt_to_head_page(buf));
> > > >  	return err;
> > > >  }
> > > >
> > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >  	unsigned int headroom = virtnet_get_headroom(vi);
> > > >  	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > >  	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > +	struct virtnet_rq_data *data;
> > > >  	char *buf;
> > > >  	void *ctx;
> > > >  	int err;
> > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >  		alloc_frag->offset += hole;
> > > >  	}
> > > >
> > > > -	sg_init_one(rq->sg, buf, len);
> > > > +	if (rq->data_array) {
> > > > +		err = virtnet_rq_map_sg(rq, buf, len);
> > > > +		if (err)
> > > > +			goto map_err;
> > > > +
> > > > +		data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > +	} else {
> > > > +		sg_init_one(rq->sg, buf, len);
> > > > +		data = (void *)buf;
> > > > +	}
> > > > +
> > > >  	ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > -	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > +	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > >  	if (err < 0)
> > > > -		put_page(virt_to_head_page(buf));
> > > > +		goto add_err;
> > > > +
> > > > +	return 0;
> > > > +
> > > > +add_err:
> > > > +	if (rq->data_array) {
> > > > +		virtnet_rq_unmap(rq, data->dma);
> > > > +		virtnet_rq_recycle_data(rq, data);
> > > > +	}
> > > >
> > > > +map_err:
> > > > +	put_page(virt_to_head_page(buf));
> > > >  	return err;
> > > >  }
> > > >
> > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > >  		void *ctx;
> > > >
> > > >  		while (stats.packets < budget &&
> > > > -		       (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > +		       (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > >  			receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > >  			stats.packets++;
> > > >  		}
> > > >  	} else {
> > > >  		while (stats.packets < budget &&
> > > > -		       (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > +		       (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > >  			receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > >  			stats.packets++;
> > > >  		}
> > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > >  		__netif_napi_del(&vi->rq[i].napi);
> > > >  		__netif_napi_del(&vi->sq[i].napi);
> > > > +
> > > > +		kfree(vi->rq[i].data_array);
> > > > +		kfree(vi->rq[i].dma_array);
> > > >  	}
> > > >
> > > >  	/* We called __netif_napi_del(),
> > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > >  	}
> > > >
> > > >  	for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > -		struct virtqueue *vq = vi->rq[i].vq;
> > > > -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > -			virtnet_rq_free_unused_buf(vq, buf);
> > > > +		struct receive_queue *rq = &vi->rq[i];
> > > > +
> > > > +		while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > +			virtnet_rq_free_unused_buf(rq->vq, buf);
> > > >  		cond_resched();
> > > >  	}
> > > >  }
> > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > >  	if (ret)
> > > >  		goto err_free;
> > > >
> > > > +	ret = virtnet_rq_merge_map_init(vi);
> > > > +	if (ret)
> > > > +		goto err_free;
> > > > +
> > > >  	cpus_read_lock();
> > > >  	virtnet_set_affinity(vi);
> > > >  	cpus_read_unlock();
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10 12:38           ` Xuan Zhuo
@ 2023-07-11  2:36             ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-11  2:36 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > > even though the same page may be used multiple times.
> > > > >
> > > > > With the premapped feature of the virtio core, the driver does the dma
> > > > > operation itself and manages the dma addresses.
> > > > >
> > > > > This way, we can perform only one dma operation for the same page. In
> > > > > the case of mtu 1500, this reduces the number of dma operations a lot.
> > > > >
> > > > > Tested on an Aliyun g7.4large machine with the cpu at 100%: pps
> > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > >
> > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > of operation?
> > >
> > >
> > > Do you mean this:
> > >
> > > [    0.470816] iommu: Default domain type: Passthrough
> > >
> >
> > With passthrough, the dma API is just some indirect function calls; they do
> > not affect the performance much.
>
>
> Yes, this benefit is negligible; it seems I have done something of little
> value. The DMA overhead I observed is indeed not very high.

Have you measured with iommu=strict?

Thanks
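
(For reference, two boot-time ways to make the DMA path non-trivial for such a
measurement; a sketch of the idea only, the exact option spelling should be
checked against Documentation/admin-guide/kernel-parameters.txt for the running
kernel, and the intel_iommu= part is VT-d specific:)

	# strict (synchronous) IOTLB invalidation instead of passthrough/lazy
	intel_iommu=on iommu.passthrough=0 iommu.strict=1

	# force bounce buffering through swiotlb even without an IOMMU
	swiotlb=force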

>
> Thanks.
>
>
> >
> > Try e.g. a bounce buffer; that is where you will see a problem: your
> > patches won't work.
> >
> >
> > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >
> > > > This kind of difference is likely in the noise.
> > >
> > > It's really not high, but that is because the proportion of DMA under perf
> > > top is not high either; the gain is probably only about that much.
> >
> > So maybe not worth the complexity.
> >
> > > >
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 486b5849033d..4de845d35bed 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > >
> > > > > +/* The bufs on the same page may share this struct. */
> > > > > +struct virtnet_rq_dma {
> > > > > +       struct virtnet_rq_dma *next;
> > > > > +
> > > > > +       dma_addr_t addr;
> > > > > +
> > > > > +       void *buf;
> > > > > +       u32 len;
> > > > > +
> > > > > +       u32 ref;
> > > > > +};
> > > > > +
> > > > > +/* Record the dma and buf. */
> > > >
> > > > I guess I see that. But why?
> > > > And these two comments are the extent of the available
> > > > documentation, that's not enough I feel.
> > > >
> > > >
> > > > > +struct virtnet_rq_data {
> > > > > +       struct virtnet_rq_data *next;
> > > >
> > > > Is manually reimplementing a linked list the best
> > > > we can do?
> > >
> > > Yes, we can use llist.
> > >
> > > >
> > > > > +
> > > > > +       void *buf;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma;
> > > > > +};
> > > > > +
> > > > >  /* Internal representation of a send virtqueue */
> > > > >  struct send_queue {
> > > > >         /* Virtqueue associated with this send _queue */
> > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > >         char name[16];
> > > > >
> > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > +
> > > > > +       struct virtnet_rq_data *data_array;
> > > > > +       struct virtnet_rq_data *data_free;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > +       struct virtnet_rq_dma *last_dma;
> > > > >  };
> > > > >
> > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >         return skb;
> > > > >  }
> > > > >
> > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct device *dev;
> > > > > +
> > > > > +       --dma->ref;
> > > > > +
> > > > > +       if (dma->ref)
> > > > > +               return;
> > > > > +
> > > >
> > > > If you don't unmap there is no guarantee valid data will be
> > > > there in the buffer.
> > > >
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > +
> > > > > +       dma->next = rq->dma_free;
> > > > > +       rq->dma_free = dma;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > +                                    struct virtnet_rq_data *data)
> > > > > +{
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = data->buf;
> > > > > +
> > > > > +       data->next = rq->data_free;
> > > > > +       rq->data_free = data;
> > > > > +
> > > > > +       return buf;
> > > > > +}
> > > > > +
> > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > +                                                  void *buf,
> > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +
> > > > > +       data = rq->data_free;
> > > > > +       rq->data_free = data->next;
> > > > > +
> > > > > +       data->buf = buf;
> > > > > +       data->dma = dma;
> > > > > +
> > > > > +       return data;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > +{
> > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > +       struct device *dev;
> > > > > +       u32 off, map_len;
> > > > > +       dma_addr_t addr;
> > > > > +       void *end;
> > > > > +
> > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > +               ++dma->ref;
> > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > +               goto ok;
> > > > > +       }
> > > >
> > > > So this is the meat of the proposed optimization. I guess that
> > > > if the last buffer we allocated happens to be in the same page
> > > > as this one then they can both be mapped for DMA together.
> > >
> > > Since we use page_frag, the buffers we allocate are all contiguous.
> > >
> > > > Why last one specifically? Whether next one happens to
> > > > be close depends on luck. If you want to try optimizing this
> > > > the right thing to do is likely by using a page pool.
> > > > There's actually work upstream on page pool, look it up.
> > >
> > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > transform it step by step.
> > >
> > > Thanks.
> >
> > ok so this should wait then?
> >
> > > >
> > > > > +
> > > > > +       end = buf + len - 1;
> > > > > +       off = offset_in_page(end);
> > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > +
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       dma = rq->dma_free;
> > > > > +       rq->dma_free = dma->next;
> > > > > +
> > > > > +       dma->ref = 1;
> > > > > +       dma->buf = buf;
> > > > > +       dma->addr = addr;
> > > > > +       dma->len = map_len;
> > > > > +
> > > > > +       rq->last_dma = dma;
> > > > > +
> > > > > +ok:
> > > > > +       sg_init_table(rq->sg, 1);
> > > > > +       rq->sg[0].dma_address = addr;
> > > > > +       rq->sg[0].length = len;
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > +{
> > > > > +       struct receive_queue *rq;
> > > > > +       int i, err, j, num;
> > > > > +
> > > > > +       /* disable for big mode */
> > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > +               return 0;
> > > > > +
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > +               if (err)
> > > > > +                       continue;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > +
> > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > +               if (!rq->data_array)
> > > > > +                       goto err;
> > > > > +
> > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > +               if (!rq->dma_array)
> > > > > +                       goto err;
> > > > > +
> > > > > +               for (j = 0; j < num; ++j) {
> > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > +
> > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +err:
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               struct receive_queue *rq;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               kfree(rq->dma_array);
> > > > > +               kfree(rq->data_array);
> > > > > +       }
> > > > > +
> > > > > +       return -ENOMEM;
> > > > > +}
> > > > > +
> > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > >  {
> > > > >         unsigned int len;
> > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > >                 void *buf;
> > > > >                 int off;
> > > > >
> > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > >                 if (unlikely(!buf))
> > > > >                         goto err_buf;
> > > > >
> > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > >                 return -EINVAL;
> > > > >
> > > > >         while (--*num_buf > 0) {
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, *num_buf,
> > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >         while (--num_buf) {
> > > > >                 int num_skb_frags;
> > > > >
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, num_buf,
> > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >  err_skb:
> > > > >         put_page(page);
> > > > >         while (num_buf-- > 1) {
> > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > >                                  dev->name, num_buf);
> > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > +       struct virtnet_rq_data *data;
> > > > >         int err;
> > > > >
> > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > >         get_page(alloc_frag->page);
> > > > >         alloc_frag->offset += len;
> > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return err;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > > +
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > +       struct virtnet_rq_data *data;
> > > > >         char *buf;
> > > > >         void *ctx;
> > > > >         int err;
> > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >                 alloc_frag->offset += hole;
> > > > >         }
> > > > >
> > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > >
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > >                 void *ctx;
> > > > >
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > >         } else {
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > +
> > > > > +               kfree(vi->rq[i].data_array);
> > > > > +               kfree(vi->rq[i].dma_array);
> > > > >         }
> > > > >
> > > > >         /* We called __netif_napi_del(),
> > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > >         }
> > > > >
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > +
> > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > >                 cond_resched();
> > > > >         }
> > > > >  }
> > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > >         if (ret)
> > > > >                 goto err_free;
> > > > >
> > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > +       if (ret)
> > > > > +               goto err_free;
> > > > > +
> > > > >         cpus_read_lock();
> > > > >         virtnet_set_affinity(vi);
> > > > >         cpus_read_unlock();
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread
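For readers who only skim the diff above, the core of the change is a small
ref-counted record that lets every buffer carved out of the same page_frag
region share a single DMA mapping. Below is a condensed sketch of that logic
with the error paths and most of the free-list bookkeeping trimmed; the names
follow the patch, so this is an outline of the posted code rather than a
complete, standalone implementation.

#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

/* One mapping shared, via a refcount, by all buffers handed out from the
 * same page_frag region.
 */
struct virtnet_rq_dma {
	struct virtnet_rq_dma *next;	/* free-list linkage */
	dma_addr_t addr;		/* DMA address of the mapped region */
	void *buf;			/* CPU address where the region starts */
	u32 len;			/* length of the mapped region */
	u32 ref;			/* buffers still pointing into the region */
};

static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
{
	struct virtnet_rq_dma *dma = rq->last_dma;
	struct device *dev = virtqueue_dma_dev(rq->vq);
	dma_addr_t addr;

	if (dma && buf >= dma->buf && buf + len <= dma->buf + dma->len) {
		/* Still inside the last mapped region: just take a reference. */
		++dma->ref;
		addr = dma->addr + (buf - dma->buf);
	} else {
		/* New region: map from buf up to the end of the tail page so
		 * that the next buffers from the same page_frag can reuse it.
		 */
		u32 map_len = len + PAGE_SIZE - offset_in_page(buf + len - 1);

		addr = dma_map_page_attrs(dev, virt_to_page(buf),
					  offset_in_page(buf), map_len,
					  DMA_FROM_DEVICE, 0);
		if (addr == DMA_MAPPING_ERROR)
			return -ENOMEM;

		dma = rq->dma_free;		/* take a preallocated tracking slot */
		rq->dma_free = dma->next;
		dma->ref = 1;
		dma->buf = buf;
		dma->addr = addr;
		dma->len = map_len;
		rq->last_dma = dma;
	}

	/* The queue is in premapped mode, so the virtio core uses this
	 * address as-is instead of mapping the buffer again.
	 */
	sg_init_table(rq->sg, 1);
	rq->sg[0].dma_address = addr;
	rq->sg[0].length = len;
	return 0;
}

/* Drop one reference; the region is unmapped only once the last buffer
 * pointing into it has been returned.
 */
static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
{
	if (--dma->ref)
		return;

	dma_unmap_page(virtqueue_dma_dev(rq->vq), dma->addr, dma->len,
		       DMA_FROM_DEVICE);
	dma->next = rq->dma_free;
	rq->dma_free = dma;
}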

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-11  2:36             ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-11  2:36 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > > even though the same page may be operated on multiple times.
> > > > >
> > > > > With the premapped feature of the virtio core, the driver does the dma
> > > > > operation itself and manages the dma address.
> > > > >
> > > > > This way, we can perform only one dma operation for the same page. In
> > > > > the case of mtu 1500, this can eliminate a lot of dma operations.
> > > > >
> > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%: pps
> > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > >
> > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > of operation?
> > >
> > >
> > > Do you mean this:
> > >
> > > [    0.470816] iommu: Default domain type: Passthrough
> > >
> >
> > With passthrough, the dma API is just some indirect function calls; they do
> > not affect the performance a lot.
>
>
> Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> overhead of DMA I observed is indeed not too high.

Have you measured with iommu=strict?

Thanks

>
> Thanks.
>
>
> >
> > Try e.g. a bounce buffer, which is where you will see a problem: your
> > patches won't work.
> >
> >
> > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >
> > > > This kind of difference is likely in the noise.
> > >
> > > It's really not high, but this is because the proportion of DMA under perf top
> > > is not high. The gain is probably only about that much.
> >
> > So maybe not worth the complexity.
> >
> > > >
> > > >
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 486b5849033d..4de845d35bed 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > >
> > > > > +/* The bufs on the same page may share this struct. */
> > > > > +struct virtnet_rq_dma {
> > > > > +       struct virtnet_rq_dma *next;
> > > > > +
> > > > > +       dma_addr_t addr;
> > > > > +
> > > > > +       void *buf;
> > > > > +       u32 len;
> > > > > +
> > > > > +       u32 ref;
> > > > > +};
> > > > > +
> > > > > +/* Record the dma and buf. */
> > > >
> > > > I guess I see that. But why?
> > > > And these two comments are the extent of the available
> > > > documentation; that's not enough, I feel.
> > > >
> > > >
> > > > > +struct virtnet_rq_data {
> > > > > +       struct virtnet_rq_data *next;
> > > >
> > > > Is manually reimplementing a linked list the best
> > > > we can do?
> > >
> > > Yes, we can use llist.
> > >
> > > >
> > > > > +
> > > > > +       void *buf;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma;
> > > > > +};
> > > > > +
> > > > >  /* Internal representation of a send virtqueue */
> > > > >  struct send_queue {
> > > > >         /* Virtqueue associated with this send _queue */
> > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > >         char name[16];
> > > > >
> > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > +
> > > > > +       struct virtnet_rq_data *data_array;
> > > > > +       struct virtnet_rq_data *data_free;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > +       struct virtnet_rq_dma *last_dma;
> > > > >  };
> > > > >
> > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >         return skb;
> > > > >  }
> > > > >
> > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct device *dev;
> > > > > +
> > > > > +       --dma->ref;
> > > > > +
> > > > > +       if (dma->ref)
> > > > > +               return;
> > > > > +
> > > >
> > > > If you don't unmap there is no guarantee valid data will be
> > > > there in the buffer.
> > > >
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > +
> > > > > +       dma->next = rq->dma_free;
> > > > > +       rq->dma_free = dma;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > +                                    struct virtnet_rq_data *data)
> > > > > +{
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = data->buf;
> > > > > +
> > > > > +       data->next = rq->data_free;
> > > > > +       rq->data_free = data;
> > > > > +
> > > > > +       return buf;
> > > > > +}
> > > > > +
> > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > +                                                  void *buf,
> > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +
> > > > > +       data = rq->data_free;
> > > > > +       rq->data_free = data->next;
> > > > > +
> > > > > +       data->buf = buf;
> > > > > +       data->dma = dma;
> > > > > +
> > > > > +       return data;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > +{
> > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > +       struct device *dev;
> > > > > +       u32 off, map_len;
> > > > > +       dma_addr_t addr;
> > > > > +       void *end;
> > > > > +
> > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > +               ++dma->ref;
> > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > +               goto ok;
> > > > > +       }
> > > >
> > > > So this is the meat of the proposed optimization. I guess that
> > > > if the last buffer we allocated happens to be in the same page
> > > > as this one then they can both be mapped for DMA together.
> > >
> > > Since we use page_frag, the buffers we allocated are all contiguous.
> > >
> > > > Why the last one specifically? Whether the next one happens to
> > > > be close depends on luck. If you want to try optimizing this,
> > > > the right thing to do is likely to use a page pool.
> > > > There's actually work upstream on page pool; look it up.
> > >
> > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > do the transformation step by step.
> > >
> > > Thanks.
> >
> > ok so this should wait then?
> >
> > > >
> > > > > +
> > > > > +       end = buf + len - 1;
> > > > > +       off = offset_in_page(end);
> > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > +
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       dma = rq->dma_free;
> > > > > +       rq->dma_free = dma->next;
> > > > > +
> > > > > +       dma->ref = 1;
> > > > > +       dma->buf = buf;
> > > > > +       dma->addr = addr;
> > > > > +       dma->len = map_len;
> > > > > +
> > > > > +       rq->last_dma = dma;
> > > > > +
> > > > > +ok:
> > > > > +       sg_init_table(rq->sg, 1);
> > > > > +       rq->sg[0].dma_address = addr;
> > > > > +       rq->sg[0].length = len;
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > +{
> > > > > +       struct receive_queue *rq;
> > > > > +       int i, err, j, num;
> > > > > +
> > > > > +       /* disable for big mode */
> > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > +               return 0;
> > > > > +
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > +               if (err)
> > > > > +                       continue;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > +
> > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > +               if (!rq->data_array)
> > > > > +                       goto err;
> > > > > +
> > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > +               if (!rq->dma_array)
> > > > > +                       goto err;
> > > > > +
> > > > > +               for (j = 0; j < num; ++j) {
> > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > +
> > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +err:
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               struct receive_queue *rq;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               kfree(rq->dma_array);
> > > > > +               kfree(rq->data_array);
> > > > > +       }
> > > > > +
> > > > > +       return -ENOMEM;
> > > > > +}
> > > > > +
> > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > >  {
> > > > >         unsigned int len;
> > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > >                 void *buf;
> > > > >                 int off;
> > > > >
> > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > >                 if (unlikely(!buf))
> > > > >                         goto err_buf;
> > > > >
> > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > >                 return -EINVAL;
> > > > >
> > > > >         while (--*num_buf > 0) {
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, *num_buf,
> > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >         while (--num_buf) {
> > > > >                 int num_skb_frags;
> > > > >
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, num_buf,
> > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >  err_skb:
> > > > >         put_page(page);
> > > > >         while (num_buf-- > 1) {
> > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > >                                  dev->name, num_buf);
> > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > +       struct virtnet_rq_data *data;
> > > > >         int err;
> > > > >
> > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > >         get_page(alloc_frag->page);
> > > > >         alloc_frag->offset += len;
> > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return err;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > > +
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > +       struct virtnet_rq_data *data;
> > > > >         char *buf;
> > > > >         void *ctx;
> > > > >         int err;
> > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >                 alloc_frag->offset += hole;
> > > > >         }
> > > > >
> > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > >
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > >                 void *ctx;
> > > > >
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > >         } else {
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > +
> > > > > +               kfree(vi->rq[i].data_array);
> > > > > +               kfree(vi->rq[i].dma_array);
> > > > >         }
> > > > >
> > > > >         /* We called __netif_napi_del(),
> > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > >         }
> > > > >
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > +
> > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > >                 cond_resched();
> > > > >         }
> > > > >  }
> > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > >         if (ret)
> > > > >                 goto err_free;
> > > > >
> > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > +       if (ret)
> > > > > +               goto err_free;
> > > > > +
> > > > >         cpus_read_lock();
> > > > >         virtnet_set_affinity(vi);
> > > > >         cpus_read_unlock();
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > >
> >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread
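A side note on the exchange above ("Is manually reimplementing a linked list
the best we can do?" / "Yes, we can use llist."): the hand-rolled ->next free
lists in the patch could be expressed with <linux/llist.h> instead. The sketch
below is only an illustration of that suggestion; it is not part of the posted
series, and the helper names are made up for the example.

#include <linux/dma-mapping.h>
#include <linux/llist.h>

struct virtnet_rq_dma {
	struct llist_node node;		/* replaces the manual ->next pointer */
	dma_addr_t addr;
	void *buf;
	u32 len;
	u32 ref;
};

/* Per-queue in practice; file scope here only to keep the sketch short. */
static LLIST_HEAD(dma_free);

static void rq_dma_put(struct llist_head *head, struct virtnet_rq_dma *dma)
{
	/* llist_add() is safe against concurrent adders, which is more than
	 * enough for a free list only touched from the queue's own context.
	 */
	llist_add(&dma->node, head);
}

static struct virtnet_rq_dma *rq_dma_get(struct llist_head *head)
{
	struct llist_node *node = llist_del_first(head);

	return node ? llist_entry(node, struct virtnet_rq_dma, node) : NULL;
}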

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-11  2:36             ` Jason Wang
@ 2023-07-11  2:40               ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-11  2:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > > > even though the same page may be operated on multiple times.
> > > > > >
> > > > > > With the premapped feature of the virtio core, the driver does the dma
> > > > > > operation itself and manages the dma address.
> > > > > >
> > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > the case of mtu 1500, this can eliminate a lot of dma operations.
> > > > > >
> > > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%: pps
> > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > >
> > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > of operation?
> > > >
> > > >
> > > > Do you mean this:
> > > >
> > > > [    0.470816] iommu: Default domain type: Passthrough
> > > >
> > >
> > > With passthrough, the dma API is just some indirect function calls; they do
> > > not affect the performance a lot.
> >
> >
> > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > overhead of DMA I observed is indeed not too high.
>
> Have you measured with iommu=strict?

I have not tested it that way; our environment uses passthrough (pt). I wonder if strict is a
common scenario, but I can test it.
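
For reference, strict invalidation is normally requested on the guest kernel
command line; a rough sketch, the exact options depend on the IOMMU driver in
use:

    iommu.strict=1                  # force synchronous IOTLB invalidation
    intel_iommu=on iommu.strict=1   # e.g. with an emulated Intel IOMMU
    iommu=pt                        # passthrough, as reported in the log above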

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Try e.g. a bounce buffer, which is where you will see a problem: your
> > > patches won't work.
> > >
> > >
> > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > >
> > > > > This kind of difference is likely in the noise.
> > > >
> > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > is not high. The gain is probably only about that much.
> > >
> > > So maybe not worth the complexity.
> > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > >
> > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > +struct virtnet_rq_dma {
> > > > > > +       struct virtnet_rq_dma *next;
> > > > > > +
> > > > > > +       dma_addr_t addr;
> > > > > > +
> > > > > > +       void *buf;
> > > > > > +       u32 len;
> > > > > > +
> > > > > > +       u32 ref;
> > > > > > +};
> > > > > > +
> > > > > > +/* Record the dma and buf. */
> > > > >
> > > > > I guess I see that. But why?
> > > > > And these two comments are the extent of the available
> > > > > documentation; that's not enough, I feel.
> > > > >
> > > > >
> > > > > > +struct virtnet_rq_data {
> > > > > > +       struct virtnet_rq_data *next;
> > > > >
> > > > > Is manually reimplementing a linked list the best
> > > > > we can do?
> > > >
> > > > Yes, we can use llist.
> > > >
> > > > >
> > > > > > +
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > +};
> > > > > > +
> > > > > >  /* Internal representation of a send virtqueue */
> > > > > >  struct send_queue {
> > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > >         char name[16];
> > > > > >
> > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > +
> > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > +
> > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > >  };
> > > > > >
> > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >         return skb;
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > +{
> > > > > > +       struct device *dev;
> > > > > > +
> > > > > > +       --dma->ref;
> > > > > > +
> > > > > > +       if (dma->ref)
> > > > > > +               return;
> > > > > > +
> > > > >
> > > > > If you don't unmap there is no guarantee valid data will be
> > > > > there in the buffer.
> > > > >
> > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > +
> > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > +
> > > > > > +       dma->next = rq->dma_free;
> > > > > > +       rq->dma_free = dma;
> > > > > > +}
> > > > > > +
> > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > +{
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       buf = data->buf;
> > > > > > +
> > > > > > +       data->next = rq->data_free;
> > > > > > +       rq->data_free = data;
> > > > > > +
> > > > > > +       return buf;
> > > > > > +}
> > > > > > +
> > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > +                                                  void *buf,
> > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > +{
> > > > > > +       struct virtnet_rq_data *data;
> > > > > > +
> > > > > > +       data = rq->data_free;
> > > > > > +       rq->data_free = data->next;
> > > > > > +
> > > > > > +       data->buf = buf;
> > > > > > +       data->dma = dma;
> > > > > > +
> > > > > > +       return data;
> > > > > > +}
> > > > > > +
> > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > +{
> > > > > > +       struct virtnet_rq_data *data;
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > +       if (!buf || !rq->data_array)
> > > > > > +               return buf;
> > > > > > +
> > > > > > +       data = buf;
> > > > > > +
> > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > +
> > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > +}
> > > > > > +
> > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > +{
> > > > > > +       struct virtnet_rq_data *data;
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > +       if (!buf || !rq->data_array)
> > > > > > +               return buf;
> > > > > > +
> > > > > > +       data = buf;
> > > > > > +
> > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > +
> > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > +}
> > > > > > +
> > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > +{
> > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > +       struct device *dev;
> > > > > > +       u32 off, map_len;
> > > > > > +       dma_addr_t addr;
> > > > > > +       void *end;
> > > > > > +
> > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > +               ++dma->ref;
> > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > +               goto ok;
> > > > > > +       }
> > > > >
> > > > > So this is the meat of the proposed optimization. I guess that
> > > > > if the last buffer we allocated happens to be in the same page
> > > > > as this one then they can both be mapped for DMA together.
> > > >
> > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > >
> > > > > Why the last one specifically? Whether the next one happens to
> > > > > be close depends on luck. If you want to try optimizing this,
> > > > > the right thing to do is likely to use a page pool.
> > > > > There's actually work upstream on page pool; look it up.
> > > >
> > > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > > do the transformation step by step.
> > > >
> > > > Thanks.
> > >
> > > ok so this should wait then?
> > >
> > > > >
> > > > > > +
> > > > > > +       end = buf + len - 1;
> > > > > > +       off = offset_in_page(end);
> > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > +
> > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > +
> > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > > +       dma = rq->dma_free;
> > > > > > +       rq->dma_free = dma->next;
> > > > > > +
> > > > > > +       dma->ref = 1;
> > > > > > +       dma->buf = buf;
> > > > > > +       dma->addr = addr;
> > > > > > +       dma->len = map_len;
> > > > > > +
> > > > > > +       rq->last_dma = dma;
> > > > > > +
> > > > > > +ok:
> > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > +       rq->sg[0].length = len;
> > > > > > +
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > +{
> > > > > > +       struct receive_queue *rq;
> > > > > > +       int i, err, j, num;
> > > > > > +
> > > > > > +       /* disable for big mode */
> > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > +               return 0;
> > > > > > +
> > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > +               if (err)
> > > > > > +                       continue;
> > > > > > +
> > > > > > +               rq = &vi->rq[i];
> > > > > > +
> > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > +
> > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > +               if (!rq->data_array)
> > > > > > +                       goto err;
> > > > > > +
> > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > +               if (!rq->dma_array)
> > > > > > +                       goto err;
> > > > > > +
> > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > +
> > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > +               }
> > > > > > +       }
> > > > > > +
> > > > > > +       return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > +               struct receive_queue *rq;
> > > > > > +
> > > > > > +               rq = &vi->rq[i];
> > > > > > +
> > > > > > +               kfree(rq->dma_array);
> > > > > > +               kfree(rq->data_array);
> > > > > > +       }
> > > > > > +
> > > > > > +       return -ENOMEM;
> > > > > > +}
> > > > > > +
> > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > >  {
> > > > > >         unsigned int len;
> > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > >                 void *buf;
> > > > > >                 int off;
> > > > > >
> > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > >                 if (unlikely(!buf))
> > > > > >                         goto err_buf;
> > > > > >
> > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > >                 return -EINVAL;
> > > > > >
> > > > > >         while (--*num_buf > 0) {
> > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > >                 if (unlikely(!buf)) {
> > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > >                                  dev->name, *num_buf,
> > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >         while (--num_buf) {
> > > > > >                 int num_skb_frags;
> > > > > >
> > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > >                 if (unlikely(!buf)) {
> > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > >                                  dev->name, num_buf,
> > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >  err_skb:
> > > > > >         put_page(page);
> > > > > >         while (num_buf-- > 1) {
> > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > >                 if (unlikely(!buf)) {
> > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > >                                  dev->name, num_buf);
> > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > +       struct virtnet_rq_data *data;
> > > > > >         int err;
> > > > > >
> > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > >         get_page(alloc_frag->page);
> > > > > >         alloc_frag->offset += len;
> > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > +
> > > > > > +       if (rq->data_array) {
> > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > +               if (err)
> > > > > > +                       goto map_err;
> > > > > > +
> > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > +       } else {
> > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > +               data = (void *)buf;
> > > > > > +       }
> > > > > > +
> > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > >         if (err < 0)
> > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > +               goto add_err;
> > > > > > +
> > > > > > +       return err;
> > > > > > +
> > > > > > +add_err:
> > > > > > +       if (rq->data_array) {
> > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > +       }
> > > > > > +
> > > > > > +map_err:
> > > > > > +       put_page(virt_to_head_page(buf));
> > > > > >         return err;
> > > > > >  }
> > > > > >
> > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > +       struct virtnet_rq_data *data;
> > > > > >         char *buf;
> > > > > >         void *ctx;
> > > > > >         int err;
> > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >                 alloc_frag->offset += hole;
> > > > > >         }
> > > > > >
> > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > +       if (rq->data_array) {
> > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > +               if (err)
> > > > > > +                       goto map_err;
> > > > > > +
> > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > +       } else {
> > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > +               data = (void *)buf;
> > > > > > +       }
> > > > > > +
> > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > >         if (err < 0)
> > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > +               goto add_err;
> > > > > > +
> > > > > > +       return 0;
> > > > > > +
> > > > > > +add_err:
> > > > > > +       if (rq->data_array) {
> > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > +       }
> > > > > >
> > > > > > +map_err:
> > > > > > +       put_page(virt_to_head_page(buf));
> > > > > >         return err;
> > > > > >  }
> > > > > >
> > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > >                 void *ctx;
> > > > > >
> > > > > >                 while (stats.packets < budget &&
> > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > >                         stats.packets++;
> > > > > >                 }
> > > > > >         } else {
> > > > > >                 while (stats.packets < budget &&
> > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > >                         stats.packets++;
> > > > > >                 }
> > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > +
> > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > >         }
> > > > > >
> > > > > >         /* We called __netif_napi_del(),
> > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > >         }
> > > > > >
> > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > +
> > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > >                 cond_resched();
> > > > > >         }
> > > > > >  }
> > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > >         if (ret)
> > > > > >                 goto err_free;
> > > > > >
> > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > +       if (ret)
> > > > > > +               goto err_free;
> > > > > > +
> > > > > >         cpus_read_lock();
> > > > > >         virtnet_set_affinity(vi);
> > > > > >         cpus_read_unlock();
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-11  2:40               ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-11  2:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > > > even though the same page may be operated on multiple times.
> > > > > >
> > > > > > With the premapped feature of the virtio core, the driver does the dma
> > > > > > operation itself and manages the dma address.
> > > > > >
> > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > the case of mtu 1500, this can eliminate a lot of dma operations.
> > > > > >
> > > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%: pps
> > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > >
> > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > of operation?
> > > >
> > > >
> > > > Do you mean this:
> > > >
> > > > [    0.470816] iommu: Default domain type: Passthrough
> > > >
> > >
> > > With passthrough, the dma API is just some indirect function calls; they do
> > > not affect the performance a lot.
> >
> >
> > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > overhead of DMA I observed is indeed not too high.
>
> Have you measured with iommu=strict?

I have not tested it that way; our environment uses passthrough (pt). I wonder if strict is a
common scenario, but I can test it.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Try e.g. a bounce buffer, which is where you will see a problem: your
> > > patches won't work.
> > >
> > >
> > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > >
> > > > > This kind of difference is likely in the noise.
> > > >
> > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > is not high. The gain is probably only about that much.
> > >
> > > So maybe not worth the complexity.
> > >
> > > > >
> > > > >
> > > > > > ---
> > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > --- a/drivers/net/virtio_net.c
> > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > >
> > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > +struct virtnet_rq_dma {
> > > > > > +       struct virtnet_rq_dma *next;
> > > > > > +
> > > > > > +       dma_addr_t addr;
> > > > > > +
> > > > > > +       void *buf;
> > > > > > +       u32 len;
> > > > > > +
> > > > > > +       u32 ref;
> > > > > > +};
> > > > > > +
> > > > > > +/* Record the dma and buf. */
> > > > >
> > > > > I guess I see that. But why?
> > > > > And these two comments are the extent of the available
> > > > > documentation; that's not enough, I feel.
> > > > >
> > > > >
> > > > > > +struct virtnet_rq_data {
> > > > > > +       struct virtnet_rq_data *next;
> > > > >
> > > > > Is manually reimplementing a linked list the best
> > > > > we can do?
> > > >
> > > > Yes, we can use llist.
> > > >
> > > > >
> > > > > > +
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > +};
> > > > > > +
> > > > > >  /* Internal representation of a send virtqueue */
> > > > > >  struct send_queue {
> > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > >         char name[16];
> > > > > >
> > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > +
> > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > +
> > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > >  };
> > > > > >
> > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > >         return skb;
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > +{
> > > > > > +       struct device *dev;
> > > > > > +
> > > > > > +       --dma->ref;
> > > > > > +
> > > > > > +       if (dma->ref)
> > > > > > +               return;
> > > > > > +
> > > > >
> > > > > If you don't unmap there is no guarantee valid data will be
> > > > > there in the buffer.
> > > > >
> > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > +
> > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > > +
> > > > > > +       dma->next = rq->dma_free;
> > > > > > +       rq->dma_free = dma;
> > > > > > +}
> > > > > > +
> > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > +{
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       buf = data->buf;
> > > > > > +
> > > > > > +       data->next = rq->data_free;
> > > > > > +       rq->data_free = data;
> > > > > > +
> > > > > > +       return buf;
> > > > > > +}
> > > > > > +
> > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > +                                                  void *buf,
> > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > +{
> > > > > > +       struct virtnet_rq_data *data;
> > > > > > +
> > > > > > +       data = rq->data_free;
> > > > > > +       rq->data_free = data->next;
> > > > > > +
> > > > > > +       data->buf = buf;
> > > > > > +       data->dma = dma;
> > > > > > +
> > > > > > +       return data;
> > > > > > +}
> > > > > > +
> > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > +{
> > > > > > +       struct virtnet_rq_data *data;
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > +       if (!buf || !rq->data_array)
> > > > > > +               return buf;
> > > > > > +
> > > > > > +       data = buf;
> > > > > > +
> > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > +
> > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > +}
> > > > > > +
> > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > +{
> > > > > > +       struct virtnet_rq_data *data;
> > > > > > +       void *buf;
> > > > > > +
> > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > +       if (!buf || !rq->data_array)
> > > > > > +               return buf;
> > > > > > +
> > > > > > +       data = buf;
> > > > > > +
> > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > +
> > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > +}
> > > > > > +
> > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > +{
> > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > +       struct device *dev;
> > > > > > +       u32 off, map_len;
> > > > > > +       dma_addr_t addr;
> > > > > > +       void *end;
> > > > > > +
> > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > +               ++dma->ref;
> > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > +               goto ok;
> > > > > > +       }
> > > > >
> > > > > So this is the meat of the proposed optimization. I guess that
> > > > > if the last buffer we allocated happens to be in the same page
> > > > > as this one then they can both be mapped for DMA together.
> > > >
> > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > >
> > > > > Why the last one specifically? Whether the next one happens to
> > > > > be close depends on luck. If you want to try optimizing this,
> > > > > the right thing to do is likely to use a page pool.
> > > > > There's actually work upstream on page pool; look it up.
> > > >
> > > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > > transform it step by step.
> > > >
> > > > Thanks.
> > >
> > > ok so this should wait then?
> > >
> > > > >
> > > > > > +
> > > > > > +       end = buf + len - 1;
> > > > > > +       off = offset_in_page(end);
> > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > +
> > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > +
> > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > +               return -ENOMEM;
> > > > > > +
> > > > > > +       dma = rq->dma_free;
> > > > > > +       rq->dma_free = dma->next;
> > > > > > +
> > > > > > +       dma->ref = 1;
> > > > > > +       dma->buf = buf;
> > > > > > +       dma->addr = addr;
> > > > > > +       dma->len = map_len;
> > > > > > +
> > > > > > +       rq->last_dma = dma;
> > > > > > +
> > > > > > +ok:
> > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > +       rq->sg[0].length = len;
> > > > > > +
> > > > > > +       return 0;
> > > > > > +}
> > > > > > +
> > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > +{
> > > > > > +       struct receive_queue *rq;
> > > > > > +       int i, err, j, num;
> > > > > > +
> > > > > > +       /* disable for big mode */
> > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > +               return 0;
> > > > > > +
> > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > +               if (err)
> > > > > > +                       continue;
> > > > > > +
> > > > > > +               rq = &vi->rq[i];
> > > > > > +
> > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > +
> > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > +               if (!rq->data_array)
> > > > > > +                       goto err;
> > > > > > +
> > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > +               if (!rq->dma_array)
> > > > > > +                       goto err;
> > > > > > +
> > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > +
> > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > +               }
> > > > > > +       }
> > > > > > +
> > > > > > +       return 0;
> > > > > > +
> > > > > > +err:
> > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > +               struct receive_queue *rq;
> > > > > > +
> > > > > > +               rq = &vi->rq[i];
> > > > > > +
> > > > > > +               kfree(rq->dma_array);
> > > > > > +               kfree(rq->data_array);
> > > > > > +       }
> > > > > > +
> > > > > > +       return -ENOMEM;
> > > > > > +}
> > > > > > +
> > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > >  {
> > > > > >         unsigned int len;
> > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > >                 void *buf;
> > > > > >                 int off;
> > > > > >
> > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > >                 if (unlikely(!buf))
> > > > > >                         goto err_buf;
> > > > > >
> > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > >                 return -EINVAL;
> > > > > >
> > > > > >         while (--*num_buf > 0) {
> > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > >                 if (unlikely(!buf)) {
> > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > >                                  dev->name, *num_buf,
> > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >         while (--num_buf) {
> > > > > >                 int num_skb_frags;
> > > > > >
> > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > >                 if (unlikely(!buf)) {
> > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > >                                  dev->name, num_buf,
> > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > >  err_skb:
> > > > > >         put_page(page);
> > > > > >         while (num_buf-- > 1) {
> > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > >                 if (unlikely(!buf)) {
> > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > >                                  dev->name, num_buf);
> > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > +       struct virtnet_rq_data *data;
> > > > > >         int err;
> > > > > >
> > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > >         get_page(alloc_frag->page);
> > > > > >         alloc_frag->offset += len;
> > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > +
> > > > > > +       if (rq->data_array) {
> > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > +               if (err)
> > > > > > +                       goto map_err;
> > > > > > +
> > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > +       } else {
> > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > +               data = (void *)buf;
> > > > > > +       }
> > > > > > +
> > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > >         if (err < 0)
> > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > +               goto add_err;
> > > > > > +
> > > > > > +       return err;
> > > > > > +
> > > > > > +add_err:
> > > > > > +       if (rq->data_array) {
> > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > +       }
> > > > > > +
> > > > > > +map_err:
> > > > > > +       put_page(virt_to_head_page(buf));
> > > > > >         return err;
> > > > > >  }
> > > > > >
> > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > +       struct virtnet_rq_data *data;
> > > > > >         char *buf;
> > > > > >         void *ctx;
> > > > > >         int err;
> > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > >                 alloc_frag->offset += hole;
> > > > > >         }
> > > > > >
> > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > +       if (rq->data_array) {
> > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > +               if (err)
> > > > > > +                       goto map_err;
> > > > > > +
> > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > +       } else {
> > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > +               data = (void *)buf;
> > > > > > +       }
> > > > > > +
> > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > >         if (err < 0)
> > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > +               goto add_err;
> > > > > > +
> > > > > > +       return 0;
> > > > > > +
> > > > > > +add_err:
> > > > > > +       if (rq->data_array) {
> > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > +       }
> > > > > >
> > > > > > +map_err:
> > > > > > +       put_page(virt_to_head_page(buf));
> > > > > >         return err;
> > > > > >  }
> > > > > >
> > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > >                 void *ctx;
> > > > > >
> > > > > >                 while (stats.packets < budget &&
> > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > >                         stats.packets++;
> > > > > >                 }
> > > > > >         } else {
> > > > > >                 while (stats.packets < budget &&
> > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > >                         stats.packets++;
> > > > > >                 }
> > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > +
> > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > >         }
> > > > > >
> > > > > >         /* We called __netif_napi_del(),
> > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > >         }
> > > > > >
> > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > +
> > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > >                 cond_resched();
> > > > > >         }
> > > > > >  }
> > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > >         if (ret)
> > > > > >                 goto err_free;
> > > > > >
> > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > +       if (ret)
> > > > > > +               goto err_free;
> > > > > > +
> > > > > >         cpus_read_lock();
> > > > > >         virtnet_set_affinity(vi);
> > > > > >         cpus_read_unlock();
> > > > > > --
> > > > > > 2.32.0.3.g01195cf9f
> > > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread
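
For readers who want the shape of the optimization under discussion in one
place: buffers carved out of the same page_frag page share a single DMA
mapping, refcounted per buffer, instead of being mapped one by one. Below is a
minimal, self-contained sketch of that idea. The names rq_dma_region,
rq_map_buf() and rq_unmap_buf() are illustrative only (they are not the
patch's API), and the real series draws these tracking structs from a
preallocated per-queue pool rather than allocating them on demand.

#include <linux/dma-mapping.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* One mapped stretch of a page_frag page, shared by several buffers. */
struct rq_dma_region {
	dma_addr_t addr;	/* DMA address of the mapped region */
	void *buf;		/* CPU address where the region starts */
	u32 len;		/* length of the mapped region */
	u32 ref;		/* in-flight buffers sharing the mapping */
};

/* Map @buf/@len for device writes, reusing *@lastp when @buf still lies
 * inside it. Returns the region the buffer belongs to, or NULL on failure,
 * and stores the buffer's DMA address in *@addr.
 */
static struct rq_dma_region *rq_map_buf(struct device *dev,
					struct rq_dma_region **lastp,
					void *buf, u32 len, dma_addr_t *addr)
{
	struct rq_dma_region *dma = *lastp;

	if (dma && buf >= dma->buf && buf + len <= dma->buf + dma->len) {
		dma->ref++;				/* share the mapping */
		*addr = dma->addr + (buf - dma->buf);
		return dma;
	}

	dma = kzalloc(sizeof(*dma), GFP_ATOMIC);
	if (!dma)
		return NULL;

	/* Map from @buf through the end of its page (mirroring the length
	 * computation in the quoted patch) so that the next buffers carved
	 * from the same page_frag can reuse this mapping.
	 */
	dma->len = len + PAGE_SIZE - offset_in_page(buf + len - 1);
	dma->addr = dma_map_page_attrs(dev, virt_to_page(buf),
				       offset_in_page(buf), dma->len,
				       DMA_FROM_DEVICE, 0);
	if (dma->addr == DMA_MAPPING_ERROR) {
		kfree(dma);
		return NULL;
	}

	dma->buf = buf;
	dma->ref = 1;
	*lastp = dma;
	*addr = dma->addr;
	return dma;
}

/* Drop one reference; unmap and free the region only when nobody uses it. */
static void rq_unmap_buf(struct device *dev, struct rq_dma_region **lastp,
			 struct rq_dma_region *dma)
{
	if (--dma->ref)
		return;

	dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
	if (*lastp == dma)
		*lastp = NULL;		/* do not reuse an unmapped region */
	kfree(dma);
}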

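On the unmap question raised in the review: when a mapping is kept alive and
only a reference count is dropped, the CPU still has to be synced before it
reads the data, otherwise swiotlb (bounce buffer) or non-coherent setups may
show stale contents. A sketch of the general DMA-API rule, reusing the
hypothetical names from the sketch above; whether and how the series handles
this case is exactly what the reviewer is questioning:

/* Sync only the sub-range of the shared mapping that this buffer occupies,
 * so the CPU sees what the device wrote even though the page stays mapped.
 */
static void rq_sync_buf_for_cpu(struct device *dev, struct rq_dma_region *dma,
				void *buf, u32 len)
{
	dma_sync_single_for_cpu(dev, dma->addr + (buf - dma->buf),
				len, DMA_FROM_DEVICE);
}
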
* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-11  2:40               ` Xuan Zhuo
@ 2023-07-11  2:58                 ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-11  2:58 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > Currently, the virtio core performs a dma operation for each
> > > > > > > request, even though the same page may be operated on multiple times.
> > > > > > >
> > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > the premapped feature of the virtio core.
> > > > > > >
> > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > >
> > > > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%, pps
> > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > > >
> > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > of operation?
> > > > >
> > > > >
> > > > > Do you mean this:
> > > > >
> > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > >
> > > >
> > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > not affect the performance much.
> > >
> > >
> > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > overhead of DMA I observed is indeed not too high.
> >
> > Have you measured with iommu=strict?
>
> I have not tested it this way; our environment is pt (passthrough). I wonder if
> strict is a common scenario. I can test it.

It's not a common setup, but it's a way to stress the DMA layer to see the overhead.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Try e.g. a bounce buffer, which is where you will see a problem: your
> > > > patches won't work.
> > > >
> > > >
> > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > >
> > > > > > This kind of difference is likely in the noise.
> > > > >
> > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > is not high. Probably that much.
> > > >
> > > > So maybe not worth the complexity.
> > > >
> > > > > >
> > > > > >
> > > > > > > ---
> > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > >
> > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > +struct virtnet_rq_dma {
> > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > +
> > > > > > > +       dma_addr_t addr;
> > > > > > > +
> > > > > > > +       void *buf;
> > > > > > > +       u32 len;
> > > > > > > +
> > > > > > > +       u32 ref;
> > > > > > > +};
> > > > > > > +
> > > > > > > +/* Record the dma and buf. */
> > > > > >
> > > > > > I guess I see that. But why?
> > > > > > And these two comments are the extent of the available
> > > > > > documentation; that's not enough, I feel.
> > > > > >
> > > > > >
> > > > > > > +struct virtnet_rq_data {
> > > > > > > +       struct virtnet_rq_data *next;
> > > > > >
> > > > > > Is manually reimplementing a linked list the best
> > > > > > we can do?
> > > > >
> > > > > Yes, we can use llist.
> > > > >
> > > > > >
> > > > > > > +
> > > > > > > +       void *buf;
> > > > > > > +
> > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > +};
> > > > > > > +
> > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > >  struct send_queue {
> > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > >         char name[16];
> > > > > > >
> > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > +
> > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > +
> > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > >  };
> > > > > > >
> > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > >         return skb;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > +{
> > > > > > > +       struct device *dev;
> > > > > > > +
> > > > > > > +       --dma->ref;
> > > > > > > +
> > > > > > > +       if (dma->ref)
> > > > > > > +               return;
> > > > > > > +
> > > > > >
> > > > > > If you don't unmap, there is no guarantee that valid data will be
> > > > > > there in the buffer.
> > > > > >
> > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > +
> > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > > +
> > > > > > > +       dma->next = rq->dma_free;
> > > > > > > +       rq->dma_free = dma;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > +{
> > > > > > > +       void *buf;
> > > > > > > +
> > > > > > > +       buf = data->buf;
> > > > > > > +
> > > > > > > +       data->next = rq->data_free;
> > > > > > > +       rq->data_free = data;
> > > > > > > +
> > > > > > > +       return buf;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > +                                                  void *buf,
> > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > +{
> > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > +
> > > > > > > +       data = rq->data_free;
> > > > > > > +       rq->data_free = data->next;
> > > > > > > +
> > > > > > > +       data->buf = buf;
> > > > > > > +       data->dma = dma;
> > > > > > > +
> > > > > > > +       return data;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > +{
> > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > +       void *buf;
> > > > > > > +
> > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > +               return buf;
> > > > > > > +
> > > > > > > +       data = buf;
> > > > > > > +
> > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > +
> > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > +{
> > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > +       void *buf;
> > > > > > > +
> > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > +               return buf;
> > > > > > > +
> > > > > > > +       data = buf;
> > > > > > > +
> > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > +
> > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > +{
> > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > +       struct device *dev;
> > > > > > > +       u32 off, map_len;
> > > > > > > +       dma_addr_t addr;
> > > > > > > +       void *end;
> > > > > > > +
> > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > +               ++dma->ref;
> > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > +               goto ok;
> > > > > > > +       }
> > > > > >
> > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > as this one then they can both be mapped for DMA together.
> > > > >
> > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > >
> > > > > > Why the last one specifically? Whether the next one happens to
> > > > > > be close depends on luck. If you want to try optimizing this,
> > > > > > the right thing to do is likely to use a page pool.
> > > > > > There's actually work upstream on page pool; look it up.
> > > > >
> > > > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > > > transform it step by step.
> > > > >
> > > > > Thanks.
> > > >
> > > > ok so this should wait then?
> > > >
> > > > > >
> > > > > > > +
> > > > > > > +       end = buf + len - 1;
> > > > > > > +       off = offset_in_page(end);
> > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > +
> > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > +
> > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > +               return -ENOMEM;
> > > > > > > +
> > > > > > > +       dma = rq->dma_free;
> > > > > > > +       rq->dma_free = dma->next;
> > > > > > > +
> > > > > > > +       dma->ref = 1;
> > > > > > > +       dma->buf = buf;
> > > > > > > +       dma->addr = addr;
> > > > > > > +       dma->len = map_len;
> > > > > > > +
> > > > > > > +       rq->last_dma = dma;
> > > > > > > +
> > > > > > > +ok:
> > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > +       rq->sg[0].length = len;
> > > > > > > +
> > > > > > > +       return 0;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > +{
> > > > > > > +       struct receive_queue *rq;
> > > > > > > +       int i, err, j, num;
> > > > > > > +
> > > > > > > +       /* disable for big mode */
> > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > +               return 0;
> > > > > > > +
> > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > +               if (err)
> > > > > > > +                       continue;
> > > > > > > +
> > > > > > > +               rq = &vi->rq[i];
> > > > > > > +
> > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > +
> > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > +               if (!rq->data_array)
> > > > > > > +                       goto err;
> > > > > > > +
> > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > +               if (!rq->dma_array)
> > > > > > > +                       goto err;
> > > > > > > +
> > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > +
> > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > +               }
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       return 0;
> > > > > > > +
> > > > > > > +err:
> > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > +               struct receive_queue *rq;
> > > > > > > +
> > > > > > > +               rq = &vi->rq[i];
> > > > > > > +
> > > > > > > +               kfree(rq->dma_array);
> > > > > > > +               kfree(rq->data_array);
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       return -ENOMEM;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > >  {
> > > > > > >         unsigned int len;
> > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > >                 void *buf;
> > > > > > >                 int off;
> > > > > > >
> > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > >                 if (unlikely(!buf))
> > > > > > >                         goto err_buf;
> > > > > > >
> > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > >                 return -EINVAL;
> > > > > > >
> > > > > > >         while (--*num_buf > 0) {
> > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > >                 if (unlikely(!buf)) {
> > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > >                                  dev->name, *num_buf,
> > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >         while (--num_buf) {
> > > > > > >                 int num_skb_frags;
> > > > > > >
> > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > >                 if (unlikely(!buf)) {
> > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > >                                  dev->name, num_buf,
> > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > >  err_skb:
> > > > > > >         put_page(page);
> > > > > > >         while (num_buf-- > 1) {
> > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > >                 if (unlikely(!buf)) {
> > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > >                                  dev->name, num_buf);
> > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > +       struct virtnet_rq_data *data;
> > > > > > >         int err;
> > > > > > >
> > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > >         get_page(alloc_frag->page);
> > > > > > >         alloc_frag->offset += len;
> > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > +
> > > > > > > +       if (rq->data_array) {
> > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > +               if (err)
> > > > > > > +                       goto map_err;
> > > > > > > +
> > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > +       } else {
> > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > +               data = (void *)buf;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > >         if (err < 0)
> > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > +               goto add_err;
> > > > > > > +
> > > > > > > +       return err;
> > > > > > > +
> > > > > > > +add_err:
> > > > > > > +       if (rq->data_array) {
> > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > +       }
> > > > > > > +
> > > > > > > +map_err:
> > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > >         return err;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > +       struct virtnet_rq_data *data;
> > > > > > >         char *buf;
> > > > > > >         void *ctx;
> > > > > > >         int err;
> > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > >                 alloc_frag->offset += hole;
> > > > > > >         }
> > > > > > >
> > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > +       if (rq->data_array) {
> > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > +               if (err)
> > > > > > > +                       goto map_err;
> > > > > > > +
> > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > +       } else {
> > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > +               data = (void *)buf;
> > > > > > > +       }
> > > > > > > +
> > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > >         if (err < 0)
> > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > +               goto add_err;
> > > > > > > +
> > > > > > > +       return 0;
> > > > > > > +
> > > > > > > +add_err:
> > > > > > > +       if (rq->data_array) {
> > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > +       }
> > > > > > >
> > > > > > > +map_err:
> > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > >         return err;
> > > > > > >  }
> > > > > > >
> > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > >                 void *ctx;
> > > > > > >
> > > > > > >                 while (stats.packets < budget &&
> > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > >                         stats.packets++;
> > > > > > >                 }
> > > > > > >         } else {
> > > > > > >                 while (stats.packets < budget &&
> > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > >                         stats.packets++;
> > > > > > >                 }
> > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > +
> > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > >         }
> > > > > > >
> > > > > > >         /* We called __netif_napi_del(),
> > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > >         }
> > > > > > >
> > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > +
> > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > >                 cond_resched();
> > > > > > >         }
> > > > > > >  }
> > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > >         if (ret)
> > > > > > >                 goto err_free;
> > > > > > >
> > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > +       if (ret)
> > > > > > > +               goto err_free;
> > > > > > > +
> > > > > > >         cpus_read_lock();
> > > > > > >         virtnet_set_affinity(vi);
> > > > > > >         cpus_read_unlock();
> > > > > > > --
> > > > > > > 2.32.0.3.g01195cf9f
> > > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread
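
On the iommu=strict suggestion above: strict mode makes every unmap trigger an
immediate IOTLB flush, so it magnifies the per-buffer map/unmap cost that this
series tries to reduce, unlike the passthrough setup quoted earlier
("iommu: Default domain type: Passthrough"). As an assumption about a typical
x86 guest with a virtual IOMMU (parameter names vary by architecture and
kernel version), the guest could be booted with something like:

    intel_iommu=on iommu.passthrough=0 iommu.strict=1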

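The page pool idea mentioned in the review refers to the kernel's generic
page_pool allocator, which can own both page recycling and the DMA mapping.
The sketch below only illustrates the API shape under the PP_FLAG_DMA_MAP
mode, where each page is mapped once when it enters the pool and unmapped
only when it leaves, giving the same "one mapping per page" effect this patch
implements by hand. The helper names rq_create_pool() and rq_pool_alloc() are
made up for illustration.

#include <linux/dma-mapping.h>
#include <linux/numa.h>
#include <net/page_pool.h>

/* Hypothetical per-receive-queue setup: let page_pool own the pages and
 * their DMA mappings instead of tracking mappings by hand in the driver.
 */
static struct page_pool *rq_create_pool(struct device *dev, unsigned int size)
{
	struct page_pool_params pp = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		.pool_size	= size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp);
}

/* Refill path: the pool hands back a page that is already mapped; the driver
 * only reads the stored DMA address to fill the descriptor.
 */
static struct page *rq_pool_alloc(struct page_pool *pool, dma_addr_t *addr)
{
	struct page *page = page_pool_dev_alloc_pages(pool);

	if (page)
		*addr = page_pool_get_dma_addr(page);

	return page;
}
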
* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-11  2:58                 ` Jason Wang
@ 2023-07-12  7:54                   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  7:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > Currently, the virtio core performs a dma operation for each buffer it
> > > > > > > > adds, even though several buffers may live on the same page.
> > > > > > > >
> > > > > > > > With the premapped feature of the virtio core, the driver does the dma
> > > > > > > > operation itself and manages the dma address.
> > > > > > > >
> > > > > > > > This way, only one dma operation is needed for the same page. In the
> > > > > > > > case of mtu 1500, this avoids a large number of dma operations.
> > > > > > > >
> > > > > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%, pps
> > > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > > > >
> > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > of operation?
> > > > > >
> > > > > >
> > > > > > Do you mean this:
> > > > > >
> > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > >
> > > > >
> > > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > > not affect performance much.
> > > >
> > > >
> > > > Yes, this benefit is negligible; it seems what I did was of little value. The
> > > > DMA overhead I observed is indeed not too high.
> > >
> > > Have you measured with iommu=strict?
> >
> > I have not tested it that way; our environment is pt (passthrough). I wonder
> > if strict is a common scenario. I can test it.
>
> It's not a common setup, but it's a way to stress the DMA layer and see the overhead.

kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0

virtio-net without merge dma 428614.00 pps

virtio-net with merge dma    742853.00 pps


Thanks.




>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > patches won't work.
> > > > >
> > > > >
> > > > > > >
> > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > >
> > > > > > > This kind of difference is likely in the noise.
> > > > > >
> > > > > > It's really not high, but that is because the proportion of DMA shown in perf
> > > > > > top is not high. The gain is probably about that much.
> > > > >
> > > > > So maybe not worth the complexity.
> > > > >
> > > > > > >
> > > > > > >
> > > > > > > > ---
> > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > >
> > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > +
> > > > > > > > +       dma_addr_t addr;
> > > > > > > > +
> > > > > > > > +       void *buf;
> > > > > > > > +       u32 len;
> > > > > > > > +
> > > > > > > > +       u32 ref;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > > +/* Record the dma and buf. */
> > > > > > >
> > > > > > > I guess I see that. But why?
> > > > > > > And these two comments are the extent of the available
> > > > > > > documentation; that's not enough, I feel.
> > > > > > >
> > > > > > >
> > > > > > > > +struct virtnet_rq_data {
> > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > >
> > > > > > > Is manually reimplementing a linked list the best
> > > > > > > we can do?
> > > > > >
> > > > > > Yes, we can use llist.
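
A minimal sketch of that llist direction (an assumption for illustration, not
code from this series): the open-coded ->next free lists could lean on the
generic <linux/llist.h> API instead, e.g. with rq->data_free turned into a
struct llist_head:

#include <linux/llist.h>

/* Hypothetical reshape of the free list; the field and helper names here
 * are made up for this sketch.
 */
struct virtnet_rq_data {
        struct llist_node node;         /* replaces the open-coded ->next */
        void *buf;
        struct virtnet_rq_dma *dma;
};

static struct virtnet_rq_data *virtnet_rq_get_data_llist(struct receive_queue *rq,
                                                         void *buf,
                                                         struct virtnet_rq_dma *dma)
{
        struct llist_node *n = llist_del_first(&rq->data_free);
        struct virtnet_rq_data *data = llist_entry(n, struct virtnet_rq_data, node);

        data->buf = buf;
        data->dma = dma;

        return data;
}

static void virtnet_rq_put_data_llist(struct receive_queue *rq,
                                      struct virtnet_rq_data *data)
{
        llist_add(&data->node, &rq->data_free);
}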
> > > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > +       void *buf;
> > > > > > > > +
> > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > +};
> > > > > > > > +
> > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > >  struct send_queue {
> > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > >         char name[16];
> > > > > > > >
> > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > +
> > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > +
> > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > >  };
> > > > > > > >
> > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > >         return skb;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > +{
> > > > > > > > +       struct device *dev;
> > > > > > > > +
> > > > > > > > +       --dma->ref;
> > > > > > > > +
> > > > > > > > +       if (dma->ref)
> > > > > > > > +               return;
> > > > > > > > +
> > > > > > >
> > > > > > > If you don't unmap, there is no guarantee that valid data will be
> > > > > > > there in the buffer.
> > > > > > >
> > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > +
> > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > +
> > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > +       rq->dma_free = dma;
> > > > > > > > +}
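
Regarding the "no guarantee valid data will be there" comment above: one way
to keep the shared mapping alive and still let the CPU see the device's
writes is to sync just the consumed range instead of unmapping the whole
page. A rough sketch under that assumption (a hypothetical helper, not part
of this series):

static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
                                    struct virtnet_rq_dma *dma,
                                    void *buf, u32 len)
{
        struct device *dev = virtqueue_dma_dev(rq->vq);

        /* Sync only the returned buffer; the page mapping itself stays
         * alive while dma->ref > 0.
         */
        dma_sync_single_for_cpu(dev, dma->addr + (buf - dma->buf),
                                len, DMA_FROM_DEVICE);
}

Called from virtnet_rq_get_buf() before the buffer is handed to the stack,
something like this would address the coherence concern without giving up
the merged mapping.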
> > > > > > > > +
> > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > +{
> > > > > > > > +       void *buf;
> > > > > > > > +
> > > > > > > > +       buf = data->buf;
> > > > > > > > +
> > > > > > > > +       data->next = rq->data_free;
> > > > > > > > +       rq->data_free = data;
> > > > > > > > +
> > > > > > > > +       return buf;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > +                                                  void *buf,
> > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > +{
> > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > +
> > > > > > > > +       data = rq->data_free;
> > > > > > > > +       rq->data_free = data->next;
> > > > > > > > +
> > > > > > > > +       data->buf = buf;
> > > > > > > > +       data->dma = dma;
> > > > > > > > +
> > > > > > > > +       return data;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > +{
> > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > +       void *buf;
> > > > > > > > +
> > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > +               return buf;
> > > > > > > > +
> > > > > > > > +       data = buf;
> > > > > > > > +
> > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > +
> > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > +{
> > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > +       void *buf;
> > > > > > > > +
> > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > +               return buf;
> > > > > > > > +
> > > > > > > > +       data = buf;
> > > > > > > > +
> > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > +
> > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > +{
> > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > +       struct device *dev;
> > > > > > > > +       u32 off, map_len;
> > > > > > > > +       dma_addr_t addr;
> > > > > > > > +       void *end;
> > > > > > > > +
> > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > +               ++dma->ref;
> > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > +               goto ok;
> > > > > > > > +       }
> > > > > > >
> > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > as this one then they can both be mapped for DMA together.
> > > > > >
> > > > > > Since we use page_frag, the buffers we allocate are all contiguous.
> > > > > >
> > > > > > > Why the last one specifically? Whether the next one happens to
> > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > There's actually work upstream on page pool, look it up.
> > > > > >
> > > > > > As we discussed in another thread, the page pool will first be used for xdp.
> > > > > > Let's transform it step by step.
> > > > > >
> > > > > > Thanks.
> > > > >
> > > > > ok so this should wait then?
> > > > >
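
For reference, a rough sketch of the page_pool direction suggested above
(all of the names below, including rq->page_pool, are assumptions for
illustration and not part of this series): with PP_FLAG_DMA_MAP the pool
maps each page once and the driver only reads the dma address back.

#include <net/page_pool.h>

static int virtnet_rq_create_page_pool(struct receive_queue *rq,
                                       struct device *dma_dev)
{
        struct page_pool_params pp = {
                .flags     = PP_FLAG_DMA_MAP,
                .order     = 0,
                .pool_size = virtqueue_get_vring_size(rq->vq),
                .nid       = NUMA_NO_NODE,
                .dev       = dma_dev,
                .dma_dir   = DMA_FROM_DEVICE,
        };

        rq->page_pool = page_pool_create(&pp);

        return PTR_ERR_OR_ZERO(rq->page_pool);
}

/* Refill path: the page comes back already mapped by the pool. */
static dma_addr_t virtnet_rq_pp_alloc(struct receive_queue *rq, struct page **page)
{
        *page = page_pool_dev_alloc_pages(rq->page_pool);
        if (!*page)
                return DMA_MAPPING_ERROR;

        return page_pool_get_dma_addr(*page);
}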
> > > > > > >
> > > > > > > > +
> > > > > > > > +       end = buf + len - 1;
> > > > > > > > +       off = offset_in_page(end);
> > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > +
> > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > +
> > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > +               return -ENOMEM;
> > > > > > > > +
> > > > > > > > +       dma = rq->dma_free;
> > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > +
> > > > > > > > +       dma->ref = 1;
> > > > > > > > +       dma->buf = buf;
> > > > > > > > +       dma->addr = addr;
> > > > > > > > +       dma->len = map_len;
> > > > > > > > +
> > > > > > > > +       rq->last_dma = dma;
> > > > > > > > +
> > > > > > > > +ok:
> > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > +
> > > > > > > > +       return 0;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > +{
> > > > > > > > +       struct receive_queue *rq;
> > > > > > > > +       int i, err, j, num;
> > > > > > > > +
> > > > > > > > +       /* disable for big mode */
> > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > +               return 0;
> > > > > > > > +
> > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > +               if (err)
> > > > > > > > +                       continue;
> > > > > > > > +
> > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > +
> > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > +
> > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > +               if (!rq->data_array)
> > > > > > > > +                       goto err;
> > > > > > > > +
> > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > +               if (!rq->dma_array)
> > > > > > > > +                       goto err;
> > > > > > > > +
> > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > +
> > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > +               }
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       return 0;
> > > > > > > > +
> > > > > > > > +err:
> > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > +               struct receive_queue *rq;
> > > > > > > > +
> > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > +
> > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > +               kfree(rq->data_array);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       return -ENOMEM;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > >  {
> > > > > > > >         unsigned int len;
> > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > >                 void *buf;
> > > > > > > >                 int off;
> > > > > > > >
> > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > >                 if (unlikely(!buf))
> > > > > > > >                         goto err_buf;
> > > > > > > >
> > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > >                 return -EINVAL;
> > > > > > > >
> > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > >         while (--num_buf) {
> > > > > > > >                 int num_skb_frags;
> > > > > > > >
> > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > >                                  dev->name, num_buf,
> > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > >  err_skb:
> > > > > > > >         put_page(page);
> > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > >                                  dev->name, num_buf);
> > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > >         int err;
> > > > > > > >
> > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > >         get_page(alloc_frag->page);
> > > > > > > >         alloc_frag->offset += len;
> > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > +
> > > > > > > > +       if (rq->data_array) {
> > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > +               if (err)
> > > > > > > > +                       goto map_err;
> > > > > > > > +
> > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > +       } else {
> > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > +               data = (void *)buf;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > >         if (err < 0)
> > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > +               goto add_err;
> > > > > > > > +
> > > > > > > > +       return err;
> > > > > > > > +
> > > > > > > > +add_err:
> > > > > > > > +       if (rq->data_array) {
> > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +map_err:
> > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > >         return err;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > >         char *buf;
> > > > > > > >         void *ctx;
> > > > > > > >         int err;
> > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > >         }
> > > > > > > >
> > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > +       if (rq->data_array) {
> > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > +               if (err)
> > > > > > > > +                       goto map_err;
> > > > > > > > +
> > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > +       } else {
> > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > +               data = (void *)buf;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > >         if (err < 0)
> > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > +               goto add_err;
> > > > > > > > +
> > > > > > > > +       return 0;
> > > > > > > > +
> > > > > > > > +add_err:
> > > > > > > > +       if (rq->data_array) {
> > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > +       }
> > > > > > > >
> > > > > > > > +map_err:
> > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > >         return err;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > >                 void *ctx;
> > > > > > > >
> > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > >                         stats.packets++;
> > > > > > > >                 }
> > > > > > > >         } else {
> > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > >                         stats.packets++;
> > > > > > > >                 }
> > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > +
> > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > >         }
> > > > > > > >
> > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > +
> > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > >                 cond_resched();
> > > > > > > >         }
> > > > > > > >  }
> > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > >         if (ret)
> > > > > > > >                 goto err_free;
> > > > > > > >
> > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > +       if (ret)
> > > > > > > > +               goto err_free;
> > > > > > > > +
> > > > > > > >         cpus_read_lock();
> > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > >         cpus_read_unlock();
> > > > > > > > --
> > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-10  3:42   ` Xuan Zhuo
  (?)
@ 2023-07-12  8:24   ` Jason Wang
  2023-07-12  8:35       ` Xuan Zhuo
  -1 siblings, 1 reply; 176+ messages in thread
From: Jason Wang @ 2023-07-12  8:24 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller


On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:

> This helper allows the driver to change the dma mode to premapped mode.
> Under the premapped mode, the virtio core does not do dma mapping
> internally.
>
> This only works when use_dma_api is true. If use_dma_api is false,
> the dma operations do not go through the DMA APIs, which is not the
> standard way in the linux kernel.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 45 ++++++++++++++++++++++++++++++++++++
>  include/linux/virtio.h       |  2 ++
>  2 files changed, 47 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 87d7ceeecdbd..5ace4539344c 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -172,6 +172,9 @@ struct vring_virtqueue {
>         /* Host publishes avail event idx */
>         bool event;
>
> +       /* Do DMA mapping by driver */
> +       bool premapped;
> +
>         /* Head of free buffer list. */
>         unsigned int free_head;
>         /* Number we've added since last sync. */
> @@ -2061,6 +2064,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>         vq->packed_ring = true;
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
> +       vq->premapped = false;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2550,6 +2554,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
>  #endif
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
> +       vq->premapped = false;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2693,6 +2698,46 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_resize);
>
> +/**
> + * virtqueue_set_premapped - set the vring premapped mode
> + * @_vq: the struct virtqueue we're talking about.
> + *
> + * Enable the premapped mode of the vq.
> + *
> + * The vring in premapped mode does not do dma internally, so the driver must
> + * do dma mapping in advance. The driver must pass the dma address through the
> + * dma_address field of the scatterlist. When the driver gets a used buffer from
> + * the vring, it has to unmap the dma address.
> + *
> + * This function must be called immediately after creating the vq, or after vq
> + * reset, and before adding any buffers to it.
> + *
> + * Caller must ensure we don't call this with other virtqueue operations
> + * at the same time (except where noted).
> + *
> + * Returns zero or a negative error.
> + * 0: success.
> + * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> + */
> +int virtqueue_set_premapped(struct virtqueue *_vq)
> +{
> +       struct vring_virtqueue *vq = to_vvq(_vq);
> +       u32 num;
> +
> +       num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
> +
> +       if (num != vq->vq.num_free)
> +               return -EINVAL;
>

If we check this, I think we need to protect this with
START_USE()/END_USE().
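
For reference, a minimal sketch of that suggestion (my reading only, inside
drivers/virtio/virtio_ring.c where the START_USE()/END_USE() debug helpers
live): take the vq before reading num_free so the check cannot race with a
concurrent add/get.

int virtqueue_set_premapped(struct virtqueue *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);
        int err = -EINVAL;
        u32 num;

        START_USE(vq);

        num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;

        /* Only switch modes on a vq that uses the DMA API and has no
         * buffers in flight.
         */
        if (num == vq->vq.num_free && vq->use_dma_api) {
                vq->premapped = true;
                err = 0;
        }

        END_USE(vq);

        return err;
}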


> +
> +       if (!vq->use_dma_api)
> +               return -EINVAL;
>

Not a native speaker, but I think "dma_premapped" is better than "premapped",
as "dma_premapped" implies "use_dma_api".

Thanks


> +
> +       vq->premapped = true;
> +
> +       return 0;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
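
To make the contract described in the comment above concrete, here is a
rough driver-side sketch of the premapped flow (my own illustration with
hypothetical names, error handling trimmed): map the buffer, hand the dma
address to the core through the scatterlist, and remember it so it can be
unmapped once virtqueue_get_buf() returns the buffer as used.

static int demo_add_premapped_inbuf(struct virtqueue *vq, void *buf, u32 len,
                                    gfp_t gfp, dma_addr_t *addr)
{
        struct device *dev = virtqueue_dma_dev(vq);
        struct scatterlist sg;

        *addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, *addr))
                return -ENOMEM;

        sg_init_table(&sg, 1);
        sg.dma_address = *addr;         /* consumed directly by the core */
        sg.length = len;

        /* The driver keeps *addr (e.g. in per-buffer metadata) and calls
         * dma_unmap_single() after the buffer is returned as used.
         */
        return virtqueue_add_inbuf(vq, &sg, 1, buf, gfp);
}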
> +
>  /* Only available for split ring */
>  struct virtqueue *vring_new_virtqueue(unsigned int index,
>                                       unsigned int num,
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index de6041deee37..2efd07b79ecf 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
>
>  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
>
> +int virtqueue_set_premapped(struct virtqueue *_vq);
> +
>  bool virtqueue_poll(struct virtqueue *vq, unsigned);
>
>  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> --
> 2.32.0.3.g01195cf9f
>
>



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 04/10] virtio_ring: support add premapped buf
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-12  8:31     ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-12  8:31 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> If the vq is in premapped mode, use sg_dma_address() directly.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 5ace4539344c..d471dee3f4f7 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -361,6 +361,11 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
>                             enum dma_data_direction direction, dma_addr_t *addr)
>  {
> +       if (vq->premapped) {
> +               *addr = sg_dma_address(sg);
> +               return 0;
> +       }
> +
>         if (!vq->use_dma_api) {
>                 /*
>                  * If DMA is not used, KMSAN doesn't know that the scatterlist
> @@ -639,8 +644,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                 dma_addr_t addr = vring_map_single(
>                         vq, desc, total_sg * sizeof(struct vring_desc),
>                         DMA_TO_DEVICE);
> -               if (vring_mapping_error(vq, addr))
> +               if (vring_mapping_error(vq, addr)) {
> +                       if (vq->premapped)
> +                               goto free_indirect;

Under which case could we hit this? A bug of the driver?

Thanks

> +
>                         goto unmap_release;
> +               }
>
>                 virtqueue_add_desc_split(_vq, vq->split.vring.desc,
>                                          head, addr,
> @@ -706,6 +715,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>                         i = vring_unmap_one_split(vq, i);
>         }
>
> +free_indirect:
>         if (indirect)
>                 kfree(desc);
>
> @@ -1307,8 +1317,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         addr = vring_map_single(vq, desc,
>                         total_sg * sizeof(struct vring_packed_desc),
>                         DMA_TO_DEVICE);
> -       if (vring_mapping_error(vq, addr))
> +       if (vring_mapping_error(vq, addr)) {
> +               if (vq->premapped)
> +                       goto free_desc;
> +
>                 goto unmap_release;
> +       }
>
>         vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
>         vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
> @@ -1366,6 +1380,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>         for (i = 0; i < err_idx; i++)
>                 vring_unmap_desc_packed(vq, &desc[i]);
>
> +free_desc:
>         kfree(desc);
>
>         END_USE(vq);
> --
> 2.32.0.3.g01195cf9f
>
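
To make the contract concrete, the driver-side counterpart of this change
would be along these lines (a sketch only; the dma device comes from
virtqueue_dma_dev() introduced in the next patch, and buf/len/sg are
placeholders):

        struct device *dma_dev = virtqueue_dma_dev(vq);
        dma_addr_t addr;

        addr = dma_map_single(dma_dev, buf, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(dma_dev, addr))
                return -ENOMEM;

        sg_init_table(sg, 1);
        sg->dma_address = addr;         /* read by vring_map_one_sg() above */
        sg->length = len;

        err = virtqueue_add_inbuf(vq, sg, 1, buf, GFP_ATOMIC);
        /* on failure here, and after the buffer comes back from
         * virtqueue_get_buf(), the driver unmaps addr itself
         */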


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-12  7:54                   ` Xuan Zhuo
@ 2023-07-12  8:32                     ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  8:32 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig, Jason Wang

On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > >
> > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > >
> > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > >
> > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > >
> > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > of operation?
> > > > > > >
> > > > > > >
> > > > > > > Do you mean this:
> > > > > > >
> > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > >
> > > > > >
> > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > not affect the performance a lot.
> > > > >
> > > > >
> > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > overhead of DMA I observed is indeed not too high.
> > > >
> > > > Have you measured with iommu=strict?
> > >
> > > I have not tested it this way; our environment uses passthrough. I wonder if
> > > strict is a common scenario. I can test it.
> >
> > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
>
> kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
>
> virtio-net without merge dma 428614.00 pps
>
> virtio-net with merge dma    742853.00 pps


kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0

virtio-net without merge dma 775496.00 pps

virtio-net with merge dma    1010514.00 pps


Thanks.

>
>
> Thanks.
>
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > patches won't work.
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > >
> > > > > > > > This kind of difference is likely in the noise.
> > > > > > >
> > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > is not high. Probably that much.
> > > > > >
> > > > > > So maybe not worth the complexity.
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > >
> > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > +
> > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > +
> > > > > > > > > +       void *buf;
> > > > > > > > > +       u32 len;
> > > > > > > > > +
> > > > > > > > > +       u32 ref;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +/* Record the dma and buf. */
> > > > > > > >
> > > > > > > > I guess I see that. But why?
> > > > > > > > And these two comments are the extent of the available
> > > > > > > > documentation, that's not enough I feel.
> > > > > > > >
> > > > > > > >
> > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > >
> > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > we can do?
> > > > > > >
> > > > > > > Yes, we can use llist.
> > > > > > >
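
For reference, the free lists built on <linux/llist.h> could look something
like this (an illustrative sketch, not part of the posted patch):

        struct virtnet_rq_dma {
                struct llist_node node;         /* replaces the ->next chain */
                dma_addr_t addr;
                void *buf;
                u32 len;
                u32 ref;
        };

        /* in struct receive_queue: struct llist_head dma_free; */

        /* put an entry back on the free list */
        llist_add(&dma->node, &rq->dma_free);

        /* take an entry off the free list (NULL when empty) */
        struct llist_node *n = llist_del_first(&rq->dma_free);
        struct virtnet_rq_dma *dma =
                n ? llist_entry(n, struct virtnet_rq_dma, node) : NULL;
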
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > >  struct send_queue {
> > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > >         char name[16];
> > > > > > > > >
> > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > +
> > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > +
> > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > >         return skb;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > +{
> > > > > > > > > +       struct device *dev;
> > > > > > > > > +
> > > > > > > > > +       --dma->ref;
> > > > > > > > > +
> > > > > > > > > +       if (dma->ref)
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > >
> > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > there in the buffer.
> > > > > > > >
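
Put differently: if the mapping is kept alive for reuse instead of being
unmapped, the receive path would at least need a CPU sync before reading the
data, e.g. (a sketch, not in the posted patch; dev is virtqueue_dma_dev(rq->vq)):

        dma_sync_single_range_for_cpu(dev, dma->addr, buf - dma->buf,
                                      len, DMA_FROM_DEVICE);
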
> > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > +
> > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > +{
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       buf = data->buf;
> > > > > > > > > +
> > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > +       rq->data_free = data;
> > > > > > > > > +
> > > > > > > > > +       return buf;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > +                                                  void *buf,
> > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > +
> > > > > > > > > +       data = rq->data_free;
> > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > +
> > > > > > > > > +       data->buf = buf;
> > > > > > > > > +       data->dma = dma;
> > > > > > > > > +
> > > > > > > > > +       return data;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > +               return buf;
> > > > > > > > > +
> > > > > > > > > +       data = buf;
> > > > > > > > > +
> > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +
> > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > +               return buf;
> > > > > > > > > +
> > > > > > > > > +       data = buf;
> > > > > > > > > +
> > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +
> > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > +       struct device *dev;
> > > > > > > > > +       u32 off, map_len;
> > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > +       void *end;
> > > > > > > > > +
> > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > +               ++dma->ref;
> > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > +               goto ok;
> > > > > > > > > +       }
> > > > > > > >
> > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > >
> > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > >
> > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > >
> > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > transform it step by step.
> > > > > > >
> > > > > > > Thanks.
> > > > > >
> > > > > > ok so this should wait then?
> > > > > >
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > +
> > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > +
> > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > +               return -ENOMEM;
> > > > > > > > > +
> > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > +
> > > > > > > > > +       dma->ref = 1;
> > > > > > > > > +       dma->buf = buf;
> > > > > > > > > +       dma->addr = addr;
> > > > > > > > > +       dma->len = map_len;
> > > > > > > > > +
> > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > +
> > > > > > > > > +ok:
> > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > +
> > > > > > > > > +       return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > +{
> > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > +       int i, err, j, num;
> > > > > > > > > +
> > > > > > > > > +       /* disable for big mode */
> > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > +               return 0;
> > > > > > > > > +
> > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > +               if (err)
> > > > > > > > > +                       continue;
> > > > > > > > > +
> > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > +
> > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > +
> > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > +                       goto err;
> > > > > > > > > +
> > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > +                       goto err;
> > > > > > > > > +
> > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > +
> > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > +               }
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       return 0;
> > > > > > > > > +
> > > > > > > > > +err:
> > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > +
> > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > +
> > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       return -ENOMEM;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > >  {
> > > > > > > > >         unsigned int len;
> > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > >                 void *buf;
> > > > > > > > >                 int off;
> > > > > > > > >
> > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > >                         goto err_buf;
> > > > > > > > >
> > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > >                 return -EINVAL;
> > > > > > > > >
> > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > >         while (--num_buf) {
> > > > > > > > >                 int num_skb_frags;
> > > > > > > > >
> > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > >  err_skb:
> > > > > > > > >         put_page(page);
> > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > >         int err;
> > > > > > > > >
> > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > +
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > +               if (err)
> > > > > > > > > +                       goto map_err;
> > > > > > > > > +
> > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > +       } else {
> > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > +               data = (void *)buf;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > >         if (err < 0)
> > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > +               goto add_err;
> > > > > > > > > +
> > > > > > > > > +       return err;
> > > > > > > > > +
> > > > > > > > > +add_err:
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +map_err:
> > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > >         return err;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > >         char *buf;
> > > > > > > > >         void *ctx;
> > > > > > > > >         int err;
> > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > +               if (err)
> > > > > > > > > +                       goto map_err;
> > > > > > > > > +
> > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > +       } else {
> > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > +               data = (void *)buf;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > >         if (err < 0)
> > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > +               goto add_err;
> > > > > > > > > +
> > > > > > > > > +       return 0;
> > > > > > > > > +
> > > > > > > > > +add_err:
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +       }
> > > > > > > > >
> > > > > > > > > +map_err:
> > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > >         return err;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > >                 void *ctx;
> > > > > > > > >
> > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > >                         stats.packets++;
> > > > > > > > >                 }
> > > > > > > > >         } else {
> > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > >                         stats.packets++;
> > > > > > > > >                 }
> > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > +
> > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > +
> > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > >                 cond_resched();
> > > > > > > > >         }
> > > > > > > > >  }
> > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > >         if (ret)
> > > > > > > > >                 goto err_free;
> > > > > > > > >
> > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > +       if (ret)
> > > > > > > > > +               goto err_free;
> > > > > > > > > +
> > > > > > > > >         cpus_read_lock();
> > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > >         cpus_read_unlock();
> > > > > > > > > --
> > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-12  8:32                     ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  8:32 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > >
> > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > feature premapped of virtio core.
> > > > > > > > >
> > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > >
> > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > >
> > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > of operation?
> > > > > > >
> > > > > > >
> > > > > > > Do you mean this:
> > > > > > >
> > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > >
> > > > > >
> > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > not affect the performance a lot.
> > > > >
> > > > >
> > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > overhead of DMA I observed is indeed not too high.
> > > >
> > > > Have you measured with iommu=strict?
> > >
> > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > common scenario. I can test it.
> >
> > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
>
> kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
>
> virtio-net without merge dma 428614.00 pps
>
> virtio-net with merge dma    742853.00 pps


kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0

virtio-net without merge dma 775496.00 pps

virtio-net with merge dma    1010514.00 pps


Thanks.

>
>
> Thanks.
>
>
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > patches won't work.
> > > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > >
> > > > > > > > This kind of difference is likely in the noise.
> > > > > > >
> > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > is not high. Probably that much.
> > > > > >
> > > > > > So maybe not worth the complexity.
> > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > ---
> > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > >
> > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > +
> > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > +
> > > > > > > > > +       void *buf;
> > > > > > > > > +       u32 len;
> > > > > > > > > +
> > > > > > > > > +       u32 ref;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > > +/* Record the dma and buf. */
> > > > > > > >
> > > > > > > > I guess I see that. But why?
> > > > > > > > And these two comments are the extent of the available
> > > > > > > > documentation, that's not enough I feel.
> > > > > > > >
> > > > > > > >
> > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > >
> > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > we can do?
> > > > > > >
> > > > > > > Yes, we can use llist.
> > > > > > >
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > +};
> > > > > > > > > +
> > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > >  struct send_queue {
> > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > >         char name[16];
> > > > > > > > >
> > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > +
> > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > +
> > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > >  };
> > > > > > > > >
> > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > >         return skb;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > +{
> > > > > > > > > +       struct device *dev;
> > > > > > > > > +
> > > > > > > > > +       --dma->ref;
> > > > > > > > > +
> > > > > > > > > +       if (dma->ref)
> > > > > > > > > +               return;
> > > > > > > > > +
> > > > > > > >
> > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > there in the buffer.
> > > > > > > >
> > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > +
> > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > +{
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       buf = data->buf;
> > > > > > > > > +
> > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > +       rq->data_free = data;
> > > > > > > > > +
> > > > > > > > > +       return buf;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > +                                                  void *buf,
> > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > +
> > > > > > > > > +       data = rq->data_free;
> > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > +
> > > > > > > > > +       data->buf = buf;
> > > > > > > > > +       data->dma = dma;
> > > > > > > > > +
> > > > > > > > > +       return data;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > +               return buf;
> > > > > > > > > +
> > > > > > > > > +       data = buf;
> > > > > > > > > +
> > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +
> > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > +       void *buf;
> > > > > > > > > +
> > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > +               return buf;
> > > > > > > > > +
> > > > > > > > > +       data = buf;
> > > > > > > > > +
> > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +
> > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > +{
> > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > +       struct device *dev;
> > > > > > > > > +       u32 off, map_len;
> > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > +       void *end;
> > > > > > > > > +
> > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > +               ++dma->ref;
> > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > +               goto ok;
> > > > > > > > > +       }
> > > > > > > >
> > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > >
> > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > >
> > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > >
> > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > transform it step by step.
> > > > > > >
> > > > > > > Thanks.
> > > > > >
> > > > > > ok so this should wait then?
> > > > > >
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > +
> > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > +
> > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > +               return -ENOMEM;
> > > > > > > > > +
> > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > +
> > > > > > > > > +       dma->ref = 1;
> > > > > > > > > +       dma->buf = buf;
> > > > > > > > > +       dma->addr = addr;
> > > > > > > > > +       dma->len = map_len;
> > > > > > > > > +
> > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > +
> > > > > > > > > +ok:
> > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > +
> > > > > > > > > +       return 0;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > +{
> > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > +       int i, err, j, num;
> > > > > > > > > +
> > > > > > > > > +       /* disable for big mode */
> > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > +               return 0;
> > > > > > > > > +
> > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > +               if (err)
> > > > > > > > > +                       continue;
> > > > > > > > > +
> > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > +
> > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > +
> > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > +                       goto err;
> > > > > > > > > +
> > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > +                       goto err;
> > > > > > > > > +
> > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > +
> > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > +               }
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       return 0;
> > > > > > > > > +
> > > > > > > > > +err:
> > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > +
> > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > +
> > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       return -ENOMEM;
> > > > > > > > > +}
> > > > > > > > > +
> > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > >  {
> > > > > > > > >         unsigned int len;
> > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > >                 void *buf;
> > > > > > > > >                 int off;
> > > > > > > > >
> > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > >                         goto err_buf;
> > > > > > > > >
> > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > >                 return -EINVAL;
> > > > > > > > >
> > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > >         while (--num_buf) {
> > > > > > > > >                 int num_skb_frags;
> > > > > > > > >
> > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > >  err_skb:
> > > > > > > > >         put_page(page);
> > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > >         int err;
> > > > > > > > >
> > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > +
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > +               if (err)
> > > > > > > > > +                       goto map_err;
> > > > > > > > > +
> > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > +       } else {
> > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > +               data = (void *)buf;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > >         if (err < 0)
> > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > +               goto add_err;
> > > > > > > > > +
> > > > > > > > > +       return err;
> > > > > > > > > +
> > > > > > > > > +add_err:
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > > +map_err:
> > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > >         return err;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > >         char *buf;
> > > > > > > > >         void *ctx;
> > > > > > > > >         int err;
> > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > +               if (err)
> > > > > > > > > +                       goto map_err;
> > > > > > > > > +
> > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > +       } else {
> > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > +               data = (void *)buf;
> > > > > > > > > +       }
> > > > > > > > > +
> > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > >         if (err < 0)
> > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > +               goto add_err;
> > > > > > > > > +
> > > > > > > > > +       return 0;
> > > > > > > > > +
> > > > > > > > > +add_err:
> > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > +       }
> > > > > > > > >
> > > > > > > > > +map_err:
> > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > >         return err;
> > > > > > > > >  }
> > > > > > > > >
> > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > >                 void *ctx;
> > > > > > > > >
> > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > >                         stats.packets++;
> > > > > > > > >                 }
> > > > > > > > >         } else {
> > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > >                         stats.packets++;
> > > > > > > > >                 }
> > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > +
> > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > >         }
> > > > > > > > >
> > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > +
> > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > >                 cond_resched();
> > > > > > > > >         }
> > > > > > > > >  }
> > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > >         if (ret)
> > > > > > > > >                 goto err_free;
> > > > > > > > >
> > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > +       if (ret)
> > > > > > > > > +               goto err_free;
> > > > > > > > > +
> > > > > > > > >         cpus_read_lock();
> > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > >         cpus_read_unlock();
> > > > > > > > > --
> > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
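The hunks above route every receive-side virtqueue_get_buf() call through virtnet_rq_get_buf(). That helper is added earlier in this same patch; it is reproduced below with explanatory comments (the comments are not part of the patch) to show how the virtnet_rq_data token is unwrapped when the queue is in premapped mode:

static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
{
	struct virtnet_rq_data *data;
	void *buf;

	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
	if (!buf || !rq->data_array)
		return buf;		/* not in premapped mode: the token is the buffer itself */

	data = buf;			/* in premapped mode the token is the wrapper */

	virtnet_rq_unmap(rq, data->dma);	/* drop one reference on the shared page mapping */

	return virtnet_rq_recycle_data(rq, data);	/* free the wrapper, hand back the real buffer */
}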

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-12  8:33     ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-12  8:33 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Added virtqueue_dma_dev() to get the DMA device for virtio. Then the
> caller can do dma operations in advance. The purpose is to keep memory
> mapped across multiple add/get buf operations.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/virtio/virtio_ring.c | 17 +++++++++++++++++
>  include/linux/virtio.h       |  2 ++
>  2 files changed, 19 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index d471dee3f4f7..1fb2c6dca9ea 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2265,6 +2265,23 @@ int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
>
> +/**
> + * virtqueue_dma_dev - get the dma dev
> + * @_vq: the struct virtqueue we're talking about.
> + *
> + * Returns the dma dev. That can be used for the dma api.
> + */
> +struct device *virtqueue_dma_dev(struct virtqueue *_vq)
> +{
> +       struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +       if (vq->use_dma_api)
> +               return vring_dma_dev(vq);
> +       else
> +               return NULL;
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_dma_dev);
> +
>  /**
>   * virtqueue_kick_prepare - first half of split virtqueue_kick call.
>   * @_vq: the struct virtqueue
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 2efd07b79ecf..35d175121cc6 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -61,6 +61,8 @@ int virtqueue_add_sgs(struct virtqueue *vq,
>                       void *data,
>                       gfp_t gfp);
>
> +struct device *virtqueue_dma_dev(struct virtqueue *vq);
> +
>  bool virtqueue_kick(struct virtqueue *vq);
>
>  bool virtqueue_kick_prepare(struct virtqueue *vq);
> --
> 2.32.0.3.g01195cf9f
>
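A minimal sketch of how a caller could consume this API (the helper name is illustrative, not from the series): fetch the DMA device once, map a receive buffer with it, and fall back to the core-managed path when virtqueue_dma_dev() returns NULL because the transport does not use the DMA API:

#include <linux/dma-mapping.h>
#include <linux/virtio.h>

static dma_addr_t example_map_rx_buf(struct virtqueue *vq, void *buf, size_t len)
{
	struct device *dma_dev = virtqueue_dma_dev(vq);

	/* No DMA API in use: let the virtio core keep mapping buffers itself. */
	if (!dma_dev)
		return DMA_MAPPING_ERROR;

	/* Map once; the mapping can then be reused across add/get cycles. */
	return dma_map_single(dma_dev, buf, len, DMA_FROM_DEVICE);
}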


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 04/10] virtio_ring: support add premapped buf
  2023-07-12  8:31     ` Jason Wang
@ 2023-07-12  8:33       ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  8:33 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Wed, 12 Jul 2023 16:31:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > If the vq is in premapped mode, use sg_dma_address() directly.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 5ace4539344c..d471dee3f4f7 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -361,6 +361,11 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
> >  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
> >                             enum dma_data_direction direction, dma_addr_t *addr)
> >  {
> > +       if (vq->premapped) {
> > +               *addr = sg_dma_address(sg);
> > +               return 0;
> > +       }
> > +
> >         if (!vq->use_dma_api) {
> >                 /*
> >                  * If DMA is not used, KMSAN doesn't know that the scatterlist
> > @@ -639,8 +644,12 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                 dma_addr_t addr = vring_map_single(
> >                         vq, desc, total_sg * sizeof(struct vring_desc),
> >                         DMA_TO_DEVICE);
> > -               if (vring_mapping_error(vq, addr))
> > +               if (vring_mapping_error(vq, addr)) {
> > +                       if (vq->premapped)
> > +                               goto free_indirect;
>
> Under which case could we hit this? A bug of the driver?

Here the map operation is for the indirect descriptor array.

So this mapping is still done inside the virtio core even in premapped
mode; a failure here is not a driver bug.

Thanks.




>
> Thanks
>
> > +
> >                         goto unmap_release;
> > +               }
> >
> >                 virtqueue_add_desc_split(_vq, vq->split.vring.desc,
> >                                          head, addr,
> > @@ -706,6 +715,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >                         i = vring_unmap_one_split(vq, i);
> >         }
> >
> > +free_indirect:
> >         if (indirect)
> >                 kfree(desc);
> >
> > @@ -1307,8 +1317,12 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >         addr = vring_map_single(vq, desc,
> >                         total_sg * sizeof(struct vring_packed_desc),
> >                         DMA_TO_DEVICE);
> > -       if (vring_mapping_error(vq, addr))
> > +       if (vring_mapping_error(vq, addr)) {
> > +               if (vq->premapped)
> > +                       goto free_desc;
> > +
> >                 goto unmap_release;
> > +       }
> >
> >         vq->packed.vring.desc[head].addr = cpu_to_le64(addr);
> >         vq->packed.vring.desc[head].len = cpu_to_le32(total_sg *
> > @@ -1366,6 +1380,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >         for (i = 0; i < err_idx; i++)
> >                 vring_unmap_desc_packed(vq, &desc[i]);
> >
> > +free_desc:
> >         kfree(desc);
> >
> >         END_USE(vq);
> > --
> > 2.32.0.3.g01195cf9f
> >
>
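A sketch of the driver-side contract this change implies (the helpers below are illustrative, not from the series): once the vq is premapped, vring_map_one_sg() consumes sg_dma_address() as-is, so the driver maps before adding and unmaps after the buffer comes back:

static int example_add_premapped(struct virtqueue *vq, struct device *dma_dev,
				 void *buf, u32 len, dma_addr_t *addr)
{
	struct scatterlist sg;

	*addr = dma_map_single(dma_dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dma_dev, *addr))
		return -ENOMEM;

	sg_init_table(&sg, 1);
	sg.dma_address = *addr;		/* returned directly by vring_map_one_sg() */
	sg.length = len;

	return virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);
}

/* On completion it is the driver, not the core, that undoes the mapping: */
static void example_unmap_on_complete(struct device *dma_dev, dma_addr_t addr, u32 len)
{
	dma_unmap_single(dma_dev, addr, len, DMA_FROM_DEVICE);
}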

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-12  8:24   ` Jason Wang
@ 2023-07-12  8:35       ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  8:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Wed, 12 Jul 2023 16:24:18 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> wrote:
>
> > This helper allows the driver to change the dma mode to premapped mode.
> > Under the premapped mode, the virtio core does not do dma mapping
> > internally.
> >
> > This only works when use_dma_api is true. If use_dma_api is false, the
> > dma operations do not go through the DMA APIs, which is not the standard
> > way in the linux kernel.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 45 ++++++++++++++++++++++++++++++++++++
> >  include/linux/virtio.h       |  2 ++
> >  2 files changed, 47 insertions(+)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 87d7ceeecdbd..5ace4539344c 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -172,6 +172,9 @@ struct vring_virtqueue {
> >         /* Host publishes avail event idx */
> >         bool event;
> >
> > +       /* Do DMA mapping by driver */
> > +       bool premapped;
> > +
> >         /* Head of free buffer list. */
> >         unsigned int free_head;
> >         /* Number we've added since last sync. */
> > @@ -2061,6 +2064,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >         vq->packed_ring = true;
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> > +       vq->premapped = false;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2550,6 +2554,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >  #endif
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> > +       vq->premapped = false;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2693,6 +2698,46 @@ int virtqueue_resize(struct virtqueue *_vq, u32 num,
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_resize);
> >
> > +/**
> > + * virtqueue_set_premapped - set the vring premapped mode
> > + * @_vq: the struct virtqueue we're talking about.
> > + *
> > + * Enable the premapped mode of the vq.
> > + *
> > + * The vring in premapped mode does not do dma internally, so the driver must
> > + * do dma mapping in advance. The driver must pass the dma_address through
> > + * dma_address of scatterlist. When the driver got a used buffer from
> > + * the vring, it has to unmap the dma address.
> > + *
> > + * This function must be called immediately after creating the vq, or after vq
> > + * reset, and before adding any buffers to it.
> > + *
> > + * Caller must ensure we don't call this with other virtqueue operations
> > + * at the same time (except where noted).
> > + *
> > + * Returns zero or a negative error.
> > + * 0: success.
> > + * -EINVAL: vring does not use the dma api, so we can not enable premapped mode.
> > + */
> > +int virtqueue_set_premapped(struct virtqueue *_vq)
> > +{
> > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > +       u32 num;
> > +
> > +       num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;
> > +
> > +       if (num != vq->vq.num_free)
> > +               return -EINVAL;
> >
>
> If we check this, I think we need to protect this with
> START_USE()/END_USE().

YES.


>
>
> > +
> > +       if (!vq->use_dma_api)
> > +               return -EINVAL;
> >
>
> Not a native speaker, but I think "dma_premapped" is better than "premapped"
> as "dma_premapped" implies "use_dma_api".

I am ok to fix this.

Thanks.


>
> Thanks
>
>
> > +
> > +       vq->premapped = true;
> > +
> > +       return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(virtqueue_set_premapped);
> > +
> >  /* Only available for split ring */
> >  struct virtqueue *vring_new_virtqueue(unsigned int index,
> >                                       unsigned int num,
> > diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> > index de6041deee37..2efd07b79ecf 100644
> > --- a/include/linux/virtio.h
> > +++ b/include/linux/virtio.h
> > @@ -78,6 +78,8 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
> >
> >  unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
> >
> > +int virtqueue_set_premapped(struct virtqueue *_vq);
> > +
> >  bool virtqueue_poll(struct virtqueue *vq, unsigned);
> >
> >  bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
> > --
> > 2.32.0.3.g01195cf9f
> >
> >
>
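A minimal call-site sketch, modeled on virtnet_rq_merge_map_init() from patch 10/10 (simplified; error handling and the per-queue arrays are elided): premapped mode is switched on right after the vqs are set up, while the rings are still empty, and -EINVAL just means the queue stays in the default mode:

static void example_enable_premapped(struct virtnet_info *vi)
{
	int i;

	for (i = 0; i < vi->max_queue_pairs; i++) {
		/* The ring is still empty here, so num_free equals the ring size. */
		if (virtqueue_set_premapped(vi->rq[i].vq))
			continue;	/* e.g. !use_dma_api: the core keeps doing the mapping */

		/* From now on this rq must fill in sg_dma_address() itself. */
	}
}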

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-12  8:32                     ` Xuan Zhuo
@ 2023-07-12  8:37                       ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-12  8:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > > >
> > > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > >
> > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > >
> > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > >
> > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > of operation?
> > > > > > > >
> > > > > > > >
> > > > > > > > Do you mean this:
> > > > > > > >
> > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > >
> > > > > > >
> > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > not affect the performance a lot.
> > > > > >
> > > > > >
> > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > overhead of DMA I observed is indeed not too high.
> > > > >
> > > > > Have you measured with iommu=strict?
> > > >
> > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > common scenario. I can test it.
> > >
> > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> >
> > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> >
> > virtio-net without merge dma 428614.00 pps
> >
> > virtio-net with merge dma    742853.00 pps
>
>
> kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
>
> virtio-net without merge dma 775496.00 pps
>
> virtio-net with merge dma    1010514.00 pps
>
>

Great, let's add those numbers to the changelog.

Thanks
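For quick reference, the gains in the numbers above work out to roughly:

  iommu.strict=1:  428614 -> 742853 pps   (~1.73x)
  iommu.strict=0:  775496 -> 1010514 pps  (~1.30x)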

> Thanks.
>
> >
> >
> > Thanks.
> >
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > patches won't work.
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > >
> > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > >
> > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > is not high. Probably that much.
> > > > > > >
> > > > > > > So maybe not worth the complexity.
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > >
> > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > +
> > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > +
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +       u32 len;
> > > > > > > > > > +
> > > > > > > > > > +       u32 ref;
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > >
> > > > > > > > > I guess I see that. But why?
> > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > >
> > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > we can do?
> > > > > > > >
> > > > > > > > Yes, we can use llist.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > >  struct send_queue {
> > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > >         char name[16];
> > > > > > > > > >
> > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > +
> > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > +
> > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > >  };
> > > > > > > > > >
> > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > >         return skb;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > +{
> > > > > > > > > > +       struct device *dev;
> > > > > > > > > > +
> > > > > > > > > > +       --dma->ref;
> > > > > > > > > > +
> > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > +               return;
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > there in the buffer.
> > > > > > > > >
> > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > +
> > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > +{
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > +
> > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > +
> > > > > > > > > > +       return buf;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > +
> > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > +
> > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > +
> > > > > > > > > > +       return data;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > +               return buf;
> > > > > > > > > > +
> > > > > > > > > > +       data = buf;
> > > > > > > > > > +
> > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +
> > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > +               return buf;
> > > > > > > > > > +
> > > > > > > > > > +       data = buf;
> > > > > > > > > > +
> > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +
> > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > +       struct device *dev;
> > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > +       void *end;
> > > > > > > > > > +
> > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > +               goto ok;
> > > > > > > > > > +       }
> > > > > > > > >
> > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > >
> > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > >
> > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > >
> > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > transform it step by step.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > >
> > > > > > > ok so this should wait then?
> > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > +
> > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > +
> > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > +
> > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > +
> > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > +
> > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > +
> > > > > > > > > > +ok:
> > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > +
> > > > > > > > > > +       return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > +{
> > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > +
> > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > +               return 0;
> > > > > > > > > > +
> > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > +               if (err)
> > > > > > > > > > +                       continue;
> > > > > > > > > > +
> > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > +
> > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > +
> > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > +                       goto err;
> > > > > > > > > > +
> > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > +                       goto err;
> > > > > > > > > > +
> > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > +
> > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > +               }
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       return 0;
> > > > > > > > > > +
> > > > > > > > > > +err:
> > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > +
> > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > +
> > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > >  {
> > > > > > > > > >         unsigned int len;
> > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > >                 void *buf;
> > > > > > > > > >                 int off;
> > > > > > > > > >
> > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > >                         goto err_buf;
> > > > > > > > > >
> > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > >                 return -EINVAL;
> > > > > > > > > >
> > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > >         while (--num_buf) {
> > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > >
> > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > >  err_skb:
> > > > > > > > > >         put_page(page);
> > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > >         int err;
> > > > > > > > > >
> > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > +
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > +               if (err)
> > > > > > > > > > +                       goto map_err;
> > > > > > > > > > +
> > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > +       } else {
> > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > >         if (err < 0)
> > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > +               goto add_err;
> > > > > > > > > > +
> > > > > > > > > > +       return err;
> > > > > > > > > > +
> > > > > > > > > > +add_err:
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +map_err:
> > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > >         return err;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > >         char *buf;
> > > > > > > > > >         void *ctx;
> > > > > > > > > >         int err;
> > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > +               if (err)
> > > > > > > > > > +                       goto map_err;
> > > > > > > > > > +
> > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > +       } else {
> > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > >         if (err < 0)
> > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > +               goto add_err;
> > > > > > > > > > +
> > > > > > > > > > +       return 0;
> > > > > > > > > > +
> > > > > > > > > > +add_err:
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +       }
> > > > > > > > > >
> > > > > > > > > > +map_err:
> > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > >         return err;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > >                 void *ctx;
> > > > > > > > > >
> > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > >                         stats.packets++;
> > > > > > > > > >                 }
> > > > > > > > > >         } else {
> > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > >                         stats.packets++;
> > > > > > > > > >                 }
> > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > +
> > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > +
> > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > >                 cond_resched();
> > > > > > > > > >         }
> > > > > > > > > >  }
> > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > >         if (ret)
> > > > > > > > > >                 goto err_free;
> > > > > > > > > >
> > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > +       if (ret)
> > > > > > > > > > +               goto err_free;
> > > > > > > > > > +
> > > > > > > > > >         cpus_read_lock();
> > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > --
> > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
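To summarize the mechanism being debated, here is a simplified rendering of virtnet_rq_map_sg() from the patch (the free-list bookkeeping for struct virtnet_rq_dma is elided): receive buffers come from a per-queue page_frag, so consecutive allocations are contiguous and usually fall inside the region that is already mapped, in which case only a reference count is bumped instead of calling the DMA API again:

static int example_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
{
	struct virtnet_rq_dma *dma = rq->last_dma;
	dma_addr_t addr;

	if (dma && buf >= dma->buf && buf + len <= dma->buf + dma->len) {
		/* Same region as the previous buffer: reuse the existing mapping. */
		dma->ref++;
		addr = dma->addr + (buf - dma->buf);
	} else {
		/* New region: map up to the end of the last page it touches. */
		addr = dma_map_page_attrs(virtqueue_dma_dev(rq->vq),
					  virt_to_page(buf), offset_in_page(buf),
					  len + PAGE_SIZE - offset_in_page(buf + len - 1),
					  DMA_FROM_DEVICE, 0);
		if (addr == DMA_MAPPING_ERROR)
			return -ENOMEM;
		/* recording addr/len/ref in a fresh virtnet_rq_dma is elided here */
	}

	sg_init_table(rq->sg, 1);
	rq->sg[0].dma_address = addr;	/* the premapped vq will use this directly */
	rq->sg[0].length = len;

	return 0;
}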


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-12  8:37                       ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-12  8:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > > >
> > > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > >
> > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > >
> > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > >
> > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > of operation?
> > > > > > > >
> > > > > > > >
> > > > > > > > Do you mean this:
> > > > > > > >
> > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > >
> > > > > > >
> > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > not affect the performance a lot.
> > > > > >
> > > > > >
> > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > overhead of DMA I observed is indeed not too high.
> > > > >
> > > > > Have you measured with iommu=strict?
> > > >
> > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > common scenario. I can test it.
> > >
> > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> >
> > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> >
> > virtio-net without merge dma 428614.00 pps
> >
> > virtio-net with merge dma    742853.00 pps
>
>
> kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
>
> virtio-net without merge dma 775496.00 pps
>
> virtio-net with merge dma    1010514.00 pps
>
>

Great, let's add those numbers to the changelog.

Thanks

> Thanks.
>
> >
> >
> > Thanks.
> >
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > patches won't work.
> > > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > >
> > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > >
> > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > is not high. Probably that much.
> > > > > > >
> > > > > > > So maybe not worth the complexity.
> > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > ---
> > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > >
> > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > +
> > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > +
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +       u32 len;
> > > > > > > > > > +
> > > > > > > > > > +       u32 ref;
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > >
> > > > > > > > > I guess I see that. But why?
> > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > >
> > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > we can do?
> > > > > > > >
> > > > > > > > Yes, we can use llist.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > +};
> > > > > > > > > > +
> > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > >  struct send_queue {
> > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > >         char name[16];
> > > > > > > > > >
> > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > +
> > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > +
> > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > >  };
> > > > > > > > > >
> > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > >         return skb;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > +{
> > > > > > > > > > +       struct device *dev;
> > > > > > > > > > +
> > > > > > > > > > +       --dma->ref;
> > > > > > > > > > +
> > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > +               return;
> > > > > > > > > > +
> > > > > > > > >
> > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > there in the buffer.
> > > > > > > > >
> > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > +
> > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > +{
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > +
> > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > +
> > > > > > > > > > +       return buf;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > +
> > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > +
> > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > +
> > > > > > > > > > +       return data;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > +               return buf;
> > > > > > > > > > +
> > > > > > > > > > +       data = buf;
> > > > > > > > > > +
> > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +
> > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > +       void *buf;
> > > > > > > > > > +
> > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > +               return buf;
> > > > > > > > > > +
> > > > > > > > > > +       data = buf;
> > > > > > > > > > +
> > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +
> > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > +{
> > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > +       struct device *dev;
> > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > +       void *end;
> > > > > > > > > > +
> > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > +               goto ok;
> > > > > > > > > > +       }
> > > > > > > > >
> > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > >
> > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > >
> > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > >
> > > > > > > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > > > > > > do the transformation step by step.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > >
> > > > > > > ok so this should wait then?
> > > > > > >
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > +
> > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > +
> > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > +
> > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > +
> > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > +
> > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > +
> > > > > > > > > > +ok:
> > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > +
> > > > > > > > > > +       return 0;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > +{
> > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > +
> > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > +               return 0;
> > > > > > > > > > +
> > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > +               if (err)
> > > > > > > > > > +                       continue;
> > > > > > > > > > +
> > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > +
> > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > +
> > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > +                       goto err;
> > > > > > > > > > +
> > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > +                       goto err;
> > > > > > > > > > +
> > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > +
> > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > +               }
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       return 0;
> > > > > > > > > > +
> > > > > > > > > > +err:
> > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > +
> > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > +
> > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > +}
> > > > > > > > > > +
> > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > >  {
> > > > > > > > > >         unsigned int len;
> > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > >                 void *buf;
> > > > > > > > > >                 int off;
> > > > > > > > > >
> > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > >                         goto err_buf;
> > > > > > > > > >
> > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > >                 return -EINVAL;
> > > > > > > > > >
> > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > >         while (--num_buf) {
> > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > >
> > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > >  err_skb:
> > > > > > > > > >         put_page(page);
> > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > >         int err;
> > > > > > > > > >
> > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > +
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > +               if (err)
> > > > > > > > > > +                       goto map_err;
> > > > > > > > > > +
> > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > +       } else {
> > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > >         if (err < 0)
> > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > +               goto add_err;
> > > > > > > > > > +
> > > > > > > > > > +       return err;
> > > > > > > > > > +
> > > > > > > > > > +add_err:
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > > +map_err:
> > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > >         return err;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > >         char *buf;
> > > > > > > > > >         void *ctx;
> > > > > > > > > >         int err;
> > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > +               if (err)
> > > > > > > > > > +                       goto map_err;
> > > > > > > > > > +
> > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > +       } else {
> > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > +       }
> > > > > > > > > > +
> > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > >         if (err < 0)
> > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > +               goto add_err;
> > > > > > > > > > +
> > > > > > > > > > +       return 0;
> > > > > > > > > > +
> > > > > > > > > > +add_err:
> > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > +       }
> > > > > > > > > >
> > > > > > > > > > +map_err:
> > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > >         return err;
> > > > > > > > > >  }
> > > > > > > > > >
> > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > >                 void *ctx;
> > > > > > > > > >
> > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > >                         stats.packets++;
> > > > > > > > > >                 }
> > > > > > > > > >         } else {
> > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > >                         stats.packets++;
> > > > > > > > > >                 }
> > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > +
> > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > >         }
> > > > > > > > > >
> > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > +
> > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > >                 cond_resched();
> > > > > > > > > >         }
> > > > > > > > > >  }
> > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > >         if (ret)
> > > > > > > > > >                 goto err_free;
> > > > > > > > > >
> > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > +       if (ret)
> > > > > > > > > > +               goto err_free;
> > > > > > > > > > +
> > > > > > > > > >         cpus_read_lock();
> > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > --
> > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
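
Below is a minimal, self-contained sketch of the merge-mapping idea discussed
in this thread. The names (struct merged_dma, merge_map(), merge_unmap()) are
hypothetical; this is only an illustration of the scheme, not the code from
the patch: the tail of a page_frag page is mapped once, every buffer carved
out of it reuses that mapping through a reference count, and the page is
unmapped only when the last buffer sharing it has been consumed.

/*
 * Hypothetical illustration only -- not the code from the patch.
 * Map a page_frag page once and let every buffer allocated from it
 * share that single DMA mapping via a reference count.
 */
#include <linux/dma-mapping.h>
#include <linux/mm.h>

struct merged_dma {
        dma_addr_t addr;  /* dma address of the mapped region */
        void *buf;        /* first buffer covered by the mapping */
        u32 len;          /* length of the mapped region */
        u32 ref;          /* buffers still using this mapping */
};

/* Return a dma address for @buf, reusing the last mapping when possible. */
static dma_addr_t merge_map(struct device *dev, struct merged_dma *last,
                            void *buf, u32 len)
{
        u32 map_len;

        if (last->ref && buf >= last->buf &&
            buf + len <= last->buf + last->len) {
                /* Falls inside the current mapping: just take a reference. */
                last->ref++;
                return last->addr + (buf - last->buf);
        }

        /* Map from @buf to the end of the page holding its last byte. */
        map_len = len + PAGE_SIZE - offset_in_page(buf + len - 1);
        last->addr = dma_map_page_attrs(dev, virt_to_page(buf),
                                        offset_in_page(buf), map_len,
                                        DMA_FROM_DEVICE, 0);
        if (last->addr == DMA_MAPPING_ERROR)
                return DMA_MAPPING_ERROR;

        last->buf = buf;
        last->len = map_len;
        last->ref = 1;
        return last->addr;
}

/* Drop one reference; unmap once the last sharer has been consumed. */
static void merge_unmap(struct device *dev, struct merged_dma *dma)
{
        if (--dma->ref)
                return;
        dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
}

A real implementation keeps a pool of these records (the patch's
virtnet_rq_dma array) so an older mapping with outstanding references is not
lost when a new one is started, and, as noted above for bounce-buffer setups,
buffers read by the CPU while the mapping stays alive would still need a
dma_sync_single_for_cpu() on their sub-range.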


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-12  8:37                       ` Jason Wang
@ 2023-07-12  8:38                         ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  8:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > > > >
> > > > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > > >
> > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > >
> > > > > > > > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%, pps
> > > > > > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > > > > > > >
> > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > of operation?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Do you mean this:
> > > > > > > > >
> > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > >
> > > > > > > >
> > > > > > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > > > > > not affect the performance a lot.
> > > > > > >
> > > > > > >
> > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > >
> > > > > > Have you measured with iommu=strict?
> > > > >
> > > > > I have not tested this way; our environment uses passthrough (pt). I wonder
> > > > > if strict is a common scenario. I can test it.
> > > >
> > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > >
> > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > >
> > > virtio-net without merge dma 428614.00 pps
> > >
> > > virtio-net with merge dma    742853.00 pps
> >
> >
> > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> >
> > virtio-net without merge dma 775496.00 pps
> >
> > virtio-net with merge dma    1010514.00 pps
> >
> >
>
> Great, let's add those numbers to the changelog.


Yes, I will do it in the next version.


Thanks.


>
> Thanks
>
> > Thanks.
> >
> > >
> > >
> > > Thanks.
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > patches won't work.
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > >
> > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > >
> > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > is not high. The gain is probably only about that much.
> > > > > > > >
> > > > > > > > So maybe not worth the complexity.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > >
> > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > +
> > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > +
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +       u32 len;
> > > > > > > > > > > +
> > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > >
> > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > >
> > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > we can do?
> > > > > > > > >
> > > > > > > > > Yes, we can use llist.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > >  struct send_queue {
> > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > >         char name[16];
> > > > > > > > > > >
> > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > +
> > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > +
> > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > >  };
> > > > > > > > > > >
> > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > >         return skb;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > +
> > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > +
> > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > +               return;
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > there in the buffer.
> > > > > > > > > >
> > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > +
> > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > +{
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > +
> > > > > > > > > > > +       return buf;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > +
> > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > +
> > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > +
> > > > > > > > > > > +       return data;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > +               return buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       data = buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +
> > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > +               return buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       data = buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +
> > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > +       void *end;
> > > > > > > > > > > +
> > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > +               goto ok;
> > > > > > > > > > > +       }
> > > > > > > > > >
> > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > >
> > > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > > >
> > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > >
> > > > > > > > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > > > > > > > do the transformation step by step.
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > ok so this should wait then?
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > +
> > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > +
> > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > +
> > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > +
> > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > +
> > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > +
> > > > > > > > > > > +ok:
> > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > +
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > +
> > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > +               return 0;
> > > > > > > > > > > +
> > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > +               if (err)
> > > > > > > > > > > +                       continue;
> > > > > > > > > > > +
> > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > +
> > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > +
> > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > +                       goto err;
> > > > > > > > > > > +
> > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > +                       goto err;
> > > > > > > > > > > +
> > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > +
> > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > +               }
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +
> > > > > > > > > > > +err:
> > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > +
> > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > +
> > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > >  {
> > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > >                 void *buf;
> > > > > > > > > > >                 int off;
> > > > > > > > > > >
> > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > >
> > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > >
> > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > >
> > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > >  err_skb:
> > > > > > > > > > >         put_page(page);
> > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > >         int err;
> > > > > > > > > > >
> > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > +
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > +               if (err)
> > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > +
> > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > +       } else {
> > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > +
> > > > > > > > > > > +       return err;
> > > > > > > > > > > +
> > > > > > > > > > > +add_err:
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +map_err:
> > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > >         return err;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > >         char *buf;
> > > > > > > > > > >         void *ctx;
> > > > > > > > > > >         int err;
> > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > +               if (err)
> > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > +
> > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > +       } else {
> > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > +
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +
> > > > > > > > > > > +add_err:
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +       }
> > > > > > > > > > >
> > > > > > > > > > > +map_err:
> > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > >         return err;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > >                 void *ctx;
> > > > > > > > > > >
> > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > >                 }
> > > > > > > > > > >         } else {
> > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > >                 }
> > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > +
> > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > +
> > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > >                 cond_resched();
> > > > > > > > > > >         }
> > > > > > > > > > >  }
> > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > >         if (ret)
> > > > > > > > > > >                 goto err_free;
> > > > > > > > > > >
> > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > +       if (ret)
> > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > +
> > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > --
> > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
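
As a footnote to the llist suggestion in the quoted discussion ("Is manually
reimplementing a linked list the best we can do?" / "Yes, we can use llist."),
here is a small, hypothetical sketch of what the rq->data_free free list could
look like with the <linux/llist.h> helpers; the dma_free list would be
analogous. This is only an illustration, not part of the posted patch.

/* Hypothetical sketch: a free list kept with the kernel's llist helpers. */
#include <linux/llist.h>

struct rq_data {
        struct llist_node node;   /* replaces the hand-rolled ->next */
        void *buf;
        void *dma;                /* stands in for struct virtnet_rq_dma * */
};

/* Return a data record to the per-queue free list. */
static void rq_data_put(struct llist_head *free_list, struct rq_data *data)
{
        llist_add(&data->node, free_list);
}

/* Take a data record from the free list; NULL when it is empty. */
static struct rq_data *rq_data_get(struct llist_head *free_list)
{
        struct llist_node *node = llist_del_first(free_list);

        return node ? llist_entry(node, struct rq_data, node) : NULL;
}

llist_del_first() assumes a single consumer, which matches the per-queue
usage here; since the list is only touched from one context anyway, a plain
list_head under the existing serialization would also do.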

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-12  8:38                         ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-12  8:38 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > > > >
> > > > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > > >
> > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > >
> > > > > > > > > > > Tested on an Aliyun g7.4large machine with one cpu at 100%, pps
> > > > > > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > > > > > > >
> > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > of operation?
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Do you mean this:
> > > > > > > > >
> > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > >
> > > > > > > >
> > > > > > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > > > > > not affect the performance a lot.
> > > > > > >
> > > > > > >
> > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > >
> > > > > > Have you measured with iommu=strict?
> > > > >
> > > > > I have not tested this way; our environment uses passthrough (pt). I wonder
> > > > > if strict is a common scenario. I can test it.
> > > >
> > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > >
> > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > >
> > > virtio-net without merge dma 428614.00 pps
> > >
> > > virtio-net with merge dma    742853.00 pps
> >
> >
> > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> >
> > virtio-net without merge dma 775496.00 pps
> >
> > virtio-net with merge dma    1010514.00 pps
> >
> >
>
> Great, let's add those numbers to the changelog.


Yes, I will do it in the next version.


Thanks.


>
> Thanks
>
> > Thanks.
> >
> > >
> > >
> > > Thanks.
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > patches won't work.
> > > > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > >
> > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > >
> > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > is not high. The gain is probably only about that much.
> > > > > > > >
> > > > > > > > So maybe not worth the complexity.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > >
> > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > +
> > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > +
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +       u32 len;
> > > > > > > > > > > +
> > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > >
> > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > >
> > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > we can do?
> > > > > > > > >
> > > > > > > > > Yes, we can use llist.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > +};
> > > > > > > > > > > +
> > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > >  struct send_queue {
> > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > >         char name[16];
> > > > > > > > > > >
> > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > +
> > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > +
> > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > >  };
> > > > > > > > > > >
> > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > >         return skb;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > +
> > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > +
> > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > +               return;
> > > > > > > > > > > +
> > > > > > > > > >
> > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > there in the buffer.
> > > > > > > > > >
> > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > +
> > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > +{
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > +
> > > > > > > > > > > +       return buf;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > +
> > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > +
> > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > +
> > > > > > > > > > > +       return data;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > +               return buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       data = buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +
> > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > +       void *buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > +               return buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       data = buf;
> > > > > > > > > > > +
> > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +
> > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > +       void *end;
> > > > > > > > > > > +
> > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > +               goto ok;
> > > > > > > > > > > +       }
> > > > > > > > > >
> > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > >
> > > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > > >
> > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > >
> > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > transform it step by step.
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > ok so this should wait then?
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > +
> > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > +
> > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > +
> > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > +
> > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > +
> > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > +
> > > > > > > > > > > +ok:
> > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > +
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > +{
> > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > +
> > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > +               return 0;
> > > > > > > > > > > +
> > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > +               if (err)
> > > > > > > > > > > +                       continue;
> > > > > > > > > > > +
> > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > +
> > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > +
> > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > +                       goto err;
> > > > > > > > > > > +
> > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > +                       goto err;
> > > > > > > > > > > +
> > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > +
> > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > +               }
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +
> > > > > > > > > > > +err:
> > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > +
> > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > +
> > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > +}
> > > > > > > > > > > +
> > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > >  {
> > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > >                 void *buf;
> > > > > > > > > > >                 int off;
> > > > > > > > > > >
> > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > >
> > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > >
> > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > >
> > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > >  err_skb:
> > > > > > > > > > >         put_page(page);
> > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > >         int err;
> > > > > > > > > > >
> > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > +
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > +               if (err)
> > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > +
> > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > +       } else {
> > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > +
> > > > > > > > > > > +       return err;
> > > > > > > > > > > +
> > > > > > > > > > > +add_err:
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > > +map_err:
> > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > >         return err;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > >         char *buf;
> > > > > > > > > > >         void *ctx;
> > > > > > > > > > >         int err;
> > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > +               if (err)
> > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > +
> > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > +       } else {
> > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > +       }
> > > > > > > > > > > +
> > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > +
> > > > > > > > > > > +       return 0;
> > > > > > > > > > > +
> > > > > > > > > > > +add_err:
> > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > +       }
> > > > > > > > > > >
> > > > > > > > > > > +map_err:
> > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > >         return err;
> > > > > > > > > > >  }
> > > > > > > > > > >
> > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > >                 void *ctx;
> > > > > > > > > > >
> > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > >                 }
> > > > > > > > > > >         } else {
> > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > >                 }
> > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > +
> > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > >         }
> > > > > > > > > > >
> > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > +
> > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > >                 cond_resched();
> > > > > > > > > > >         }
> > > > > > > > > > >  }
> > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > >         if (ret)
> > > > > > > > > > >                 goto err_free;
> > > > > > > > > > >
> > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > +       if (ret)
> > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > +
> > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > --
> > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-13  3:50     ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-13  3:50 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now we add a case where we skip dma unmap, the vq->premapped is true.
>
> We can't just rely on use_dma_api to determine whether to skip the dma
> operation. For convenience, I introduced the "do_unmap". By default, it
> is the same as use_dma_api. If the driver is configured with premapped,
> then do_unmap is false.
>
> So as long as do_unmap is false, for addr of desc, we should skip dma
> unmap operation.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1fb2c6dca9ea..10ee3b7ce571 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -175,6 +175,11 @@ struct vring_virtqueue {
>         /* Do DMA mapping by driver */
>         bool premapped;
>
> +       /* Do unmap or not for desc. Just when premapped is False and
> +        * use_dma_api is true, this is true.
> +        */
> +       bool do_unmap;
> +
>         /* Head of free buffer list. */
>         unsigned int free_head;
>         /* Number we've added since last sync. */
> @@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
>  {
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> +       if (!vq->do_unmap)
>                 return;
>
>         flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> @@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>         struct vring_desc_extra *extra = vq->split.desc_extra;
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> -               goto out;
> -
>         flags = extra[i].flags;
>
>         if (flags & VRING_DESC_F_INDIRECT) {
> +               if (!vq->use_dma_api)
> +                       goto out;
> +
>                 dma_unmap_single(vring_dma_dev(vq),
>                                  extra[i].addr,
>                                  extra[i].len,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> +               if (!vq->do_unmap)
> +                       goto out;
> +
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra[i].addr,
>                                extra[i].len,
> @@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         }
>         /* Last one doesn't continue. */
>         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> -       if (!indirect && vq->use_dma_api)
> +       if (!indirect && vq->do_unmap)
>                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
>                         ~VRING_DESC_F_NEXT;
>
> @@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>                                 VRING_DESC_F_INDIRECT));
>                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> -               if (vq->use_dma_api) {
> +               if (vq->do_unmap) {
>                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
>                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>                 }
> @@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>  {
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> -               return;
> -
>         flags = extra->flags;
>
>         if (flags & VRING_DESC_F_INDIRECT) {
> +               if (!vq->use_dma_api)
> +                       return;
> +
>                 dma_unmap_single(vring_dma_dev(vq),
>                                  extra->addr, extra->len,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> +               if (!vq->do_unmap)
> +                       return;

This seems less straightforward than:

if (!vq->use_dma_api)
    return;

if (INDIRECT) {
} else if (!vq->premapped) {
}

?
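
Spelled out against vring_unmap_extra_packed() above, that restructuring could
look roughly like the sketch below (only an illustration of the suggestion,
not a tested patch; the second parameter type follows the existing helper):

static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
				     const struct vring_desc_extra *extra)
{
	u16 flags = extra->flags;

	if (!vq->use_dma_api)
		return;

	if (flags & VRING_DESC_F_INDIRECT) {
		/* indirect descriptor tables are always mapped by the virtio core */
		dma_unmap_single(vring_dma_dev(vq),
				 extra->addr, extra->len,
				 (flags & VRING_DESC_F_WRITE) ?
				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
	} else if (!vq->premapped) {
		/* skip real buffers that the driver mapped itself */
		dma_unmap_page(vring_dma_dev(vq),
			       extra->addr, extra->len,
			       (flags & VRING_DESC_F_WRITE) ?
			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
	}
}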

Thanks

> +
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra->addr, extra->len,
>                                (flags & VRING_DESC_F_WRITE) ?
> @@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
>  {
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> +       if (!vq->do_unmap)
>                 return;
>
>         flags = le16_to_cpu(desc->flags);
> @@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>                                 sizeof(struct vring_packed_desc));
>         vq->packed.vring.desc[head].id = cpu_to_le16(id);
>
> -       if (vq->use_dma_api) {
> +       if (vq->do_unmap) {
>                 vq->packed.desc_extra[id].addr = addr;
>                 vq->packed.desc_extra[id].len = total_sg *
>                                 sizeof(struct vring_packed_desc);
> @@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>                         desc[i].len = cpu_to_le32(sg->length);
>                         desc[i].id = cpu_to_le16(id);
>
> -                       if (unlikely(vq->use_dma_api)) {
> +                       if (unlikely(vq->do_unmap)) {
>                                 vq->packed.desc_extra[curr].addr = addr;
>                                 vq->packed.desc_extra[curr].len = sg->length;
>                                 vq->packed.desc_extra[curr].flags =
> @@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>         vq->free_head = id;
>         vq->vq.num_free += state->num;
>
> -       if (unlikely(vq->use_dma_api)) {
> +       if (unlikely(vq->do_unmap)) {
>                 curr = id;
>                 for (i = 0; i < state->num; i++) {
>                         vring_unmap_extra_packed(vq,
> @@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>                 if (!desc)
>                         return;
>
> -               if (vq->use_dma_api) {
> +               if (vq->do_unmap) {
>                         len = vq->packed.desc_extra[id].len;
>                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
>                                         i++)
> @@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
>         vq->premapped = false;
> +       vq->do_unmap = vq->use_dma_api;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
>         vq->premapped = false;
> +       vq->do_unmap = vq->use_dma_api;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
>                 return -EINVAL;
>
>         vq->premapped = true;
> +       vq->do_unmap = false;
>
>         return 0;
>  }
> --
> 2.32.0.3.g01195cf9f
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
@ 2023-07-13  3:50     ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-13  3:50 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Now we add a case where we skip dma unmap, the vq->premapped is true.
>
> We can't just rely on use_dma_api to determine whether to skip the dma
> operation. For convenience, I introduced the "do_unmap". By default, it
> is the same as use_dma_api. If the driver is configured with premapped,
> then do_unmap is false.
>
> So as long as do_unmap is false, for addr of desc, we should skip dma
> unmap operation.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
>  1 file changed, 28 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 1fb2c6dca9ea..10ee3b7ce571 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -175,6 +175,11 @@ struct vring_virtqueue {
>         /* Do DMA mapping by driver */
>         bool premapped;
>
> +       /* Do unmap or not for desc. Just when premapped is False and
> +        * use_dma_api is true, this is true.
> +        */
> +       bool do_unmap;
> +
>         /* Head of free buffer list. */
>         unsigned int free_head;
>         /* Number we've added since last sync. */
> @@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
>  {
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> +       if (!vq->do_unmap)
>                 return;
>
>         flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> @@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>         struct vring_desc_extra *extra = vq->split.desc_extra;
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> -               goto out;
> -
>         flags = extra[i].flags;
>
>         if (flags & VRING_DESC_F_INDIRECT) {
> +               if (!vq->use_dma_api)
> +                       goto out;
> +
>                 dma_unmap_single(vring_dma_dev(vq),
>                                  extra[i].addr,
>                                  extra[i].len,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> +               if (!vq->do_unmap)
> +                       goto out;
> +
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra[i].addr,
>                                extra[i].len,
> @@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>         }
>         /* Last one doesn't continue. */
>         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> -       if (!indirect && vq->use_dma_api)
> +       if (!indirect && vq->do_unmap)
>                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
>                         ~VRING_DESC_F_NEXT;
>
> @@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>                                 VRING_DESC_F_INDIRECT));
>                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>
> -               if (vq->use_dma_api) {
> +               if (vq->do_unmap) {
>                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
>                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
>                 }
> @@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>  {
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> -               return;
> -
>         flags = extra->flags;
>
>         if (flags & VRING_DESC_F_INDIRECT) {
> +               if (!vq->use_dma_api)
> +                       return;
> +
>                 dma_unmap_single(vring_dma_dev(vq),
>                                  extra->addr, extra->len,
>                                  (flags & VRING_DESC_F_WRITE) ?
>                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
>         } else {
> +               if (!vq->do_unmap)
> +                       return;

This seems less straightforward than:

if (!vq->use_dma_api)
    return;

if (INDIRECT) {
} else if (!vq->premapped) {
}

?

Thanks

> +
>                 dma_unmap_page(vring_dma_dev(vq),
>                                extra->addr, extra->len,
>                                (flags & VRING_DESC_F_WRITE) ?
> @@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
>  {
>         u16 flags;
>
> -       if (!vq->use_dma_api)
> +       if (!vq->do_unmap)
>                 return;
>
>         flags = le16_to_cpu(desc->flags);
> @@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>                                 sizeof(struct vring_packed_desc));
>         vq->packed.vring.desc[head].id = cpu_to_le16(id);
>
> -       if (vq->use_dma_api) {
> +       if (vq->do_unmap) {
>                 vq->packed.desc_extra[id].addr = addr;
>                 vq->packed.desc_extra[id].len = total_sg *
>                                 sizeof(struct vring_packed_desc);
> @@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>                         desc[i].len = cpu_to_le32(sg->length);
>                         desc[i].id = cpu_to_le16(id);
>
> -                       if (unlikely(vq->use_dma_api)) {
> +                       if (unlikely(vq->do_unmap)) {
>                                 vq->packed.desc_extra[curr].addr = addr;
>                                 vq->packed.desc_extra[curr].len = sg->length;
>                                 vq->packed.desc_extra[curr].flags =
> @@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>         vq->free_head = id;
>         vq->vq.num_free += state->num;
>
> -       if (unlikely(vq->use_dma_api)) {
> +       if (unlikely(vq->do_unmap)) {
>                 curr = id;
>                 for (i = 0; i < state->num; i++) {
>                         vring_unmap_extra_packed(vq,
> @@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>                 if (!desc)
>                         return;
>
> -               if (vq->use_dma_api) {
> +               if (vq->do_unmap) {
>                         len = vq->packed.desc_extra[id].len;
>                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
>                                         i++)
> @@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
>         vq->premapped = false;
> +       vq->do_unmap = vq->use_dma_api;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
>         vq->dma_dev = dma_dev;
>         vq->use_dma_api = vring_use_dma_api(vdev);
>         vq->premapped = false;
> +       vq->do_unmap = vq->use_dma_api;
>
>         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
>                 !context;
> @@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
>                 return -EINVAL;
>
>         vq->premapped = true;
> +       vq->do_unmap = false;
>
>         return 0;
>  }
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
  2023-07-13  3:50     ` Jason Wang
@ 2023-07-13  4:02       ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-13  4:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, 13 Jul 2023 11:50:57 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Now we add a case where we skip dma unmap, the vq->premapped is true.
> >
> > We can't just rely on use_dma_api to determine whether to skip the dma
> > operation. For convenience, I introduced the "do_unmap". By default, it
> > is the same as use_dma_api. If the driver is configured with premapped,
> > then do_unmap is false.
> >
> > So as long as do_unmap is false, for addr of desc, we should skip dma
> > unmap operation.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
> >  1 file changed, 28 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 1fb2c6dca9ea..10ee3b7ce571 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -175,6 +175,11 @@ struct vring_virtqueue {
> >         /* Do DMA mapping by driver */
> >         bool premapped;
> >
> > +       /* Do unmap or not for desc. Just when premapped is False and
> > +        * use_dma_api is true, this is true.
> > +        */
> > +       bool do_unmap;
> > +
> >         /* Head of free buffer list. */
> >         unsigned int free_head;
> >         /* Number we've added since last sync. */
> > @@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> >  {
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > +       if (!vq->do_unmap)
> >                 return;
> >
> >         flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > @@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >         struct vring_desc_extra *extra = vq->split.desc_extra;
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > -               goto out;
> > -
> >         flags = extra[i].flags;
> >
> >         if (flags & VRING_DESC_F_INDIRECT) {
> > +               if (!vq->use_dma_api)
> > +                       goto out;
> > +
> >                 dma_unmap_single(vring_dma_dev(vq),
> >                                  extra[i].addr,
> >                                  extra[i].len,
> >                                  (flags & VRING_DESC_F_WRITE) ?
> >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >         } else {
> > +               if (!vq->do_unmap)
> > +                       goto out;
> > +
> >                 dma_unmap_page(vring_dma_dev(vq),
> >                                extra[i].addr,
> >                                extra[i].len,
> > @@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >         }
> >         /* Last one doesn't continue. */
> >         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > -       if (!indirect && vq->use_dma_api)
> > +       if (!indirect && vq->do_unmap)
> >                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> >                         ~VRING_DESC_F_NEXT;
> >
> > @@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >                                 VRING_DESC_F_INDIRECT));
> >                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > -               if (vq->use_dma_api) {
> > +               if (vq->do_unmap) {
> >                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
> >                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> >                 }
> > @@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> >  {
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > -               return;
> > -
> >         flags = extra->flags;
> >
> >         if (flags & VRING_DESC_F_INDIRECT) {
> > +               if (!vq->use_dma_api)
> > +                       return;
> > +
> >                 dma_unmap_single(vring_dma_dev(vq),
> >                                  extra->addr, extra->len,
> >                                  (flags & VRING_DESC_F_WRITE) ?
> >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >         } else {
> > +               if (!vq->do_unmap)
> > +                       return;
>
> This seems less straightforward than:
>
> if (!vq->use_dma_api)
>     return;
>
> if (INDIRECT) {
> } else if (!vq->premapped) {
> }
>
> ?


My logic here is that for the real buffers we uniformly use do_unmap to decide
whether to unmap, while the indirect descriptor tables still use use_dma_api.

From this point of view, how does that look to you?
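
Written out, the rule I have in mind would be something like the following
purely illustrative helper (vring_need_unmap() is hypothetical and not part of
the patch):

static bool vring_need_unmap(const struct vring_virtqueue *vq, u16 flags)
{
	/* indirect descriptor tables are always mapped by the virtio core */
	if (flags & VRING_DESC_F_INDIRECT)
		return vq->use_dma_api;

	/* real buffers: do_unmap == use_dma_api && !premapped */
	return vq->do_unmap;
}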

Thanks.


>
> Thanks
>
> > +
> >                 dma_unmap_page(vring_dma_dev(vq),
> >                                extra->addr, extra->len,
> >                                (flags & VRING_DESC_F_WRITE) ?
> > @@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> >  {
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > +       if (!vq->do_unmap)
> >                 return;
> >
> >         flags = le16_to_cpu(desc->flags);
> > @@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >                                 sizeof(struct vring_packed_desc));
> >         vq->packed.vring.desc[head].id = cpu_to_le16(id);
> >
> > -       if (vq->use_dma_api) {
> > +       if (vq->do_unmap) {
> >                 vq->packed.desc_extra[id].addr = addr;
> >                 vq->packed.desc_extra[id].len = total_sg *
> >                                 sizeof(struct vring_packed_desc);
> > @@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >                         desc[i].len = cpu_to_le32(sg->length);
> >                         desc[i].id = cpu_to_le16(id);
> >
> > -                       if (unlikely(vq->use_dma_api)) {
> > +                       if (unlikely(vq->do_unmap)) {
> >                                 vq->packed.desc_extra[curr].addr = addr;
> >                                 vq->packed.desc_extra[curr].len = sg->length;
> >                                 vq->packed.desc_extra[curr].flags =
> > @@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >         vq->free_head = id;
> >         vq->vq.num_free += state->num;
> >
> > -       if (unlikely(vq->use_dma_api)) {
> > +       if (unlikely(vq->do_unmap)) {
> >                 curr = id;
> >                 for (i = 0; i < state->num; i++) {
> >                         vring_unmap_extra_packed(vq,
> > @@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >                 if (!desc)
> >                         return;
> >
> > -               if (vq->use_dma_api) {
> > +               if (vq->do_unmap) {
> >                         len = vq->packed.desc_extra[id].len;
> >                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
> >                                         i++)
> > @@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> >         vq->premapped = false;
> > +       vq->do_unmap = vq->use_dma_api;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> >         vq->premapped = false;
> > +       vq->do_unmap = vq->use_dma_api;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
> >                 return -EINVAL;
> >
> >         vq->premapped = true;
> > +       vq->do_unmap = false;
> >
> >         return 0;
> >  }
> > --
> > 2.32.0.3.g01195cf9f
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
@ 2023-07-13  4:02       ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-13  4:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Thu, 13 Jul 2023 11:50:57 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Now we add a case where we skip dma unmap, the vq->premapped is true.
> >
> > We can't just rely on use_dma_api to determine whether to skip the dma
> > operation. For convenience, I introduced the "do_unmap". By default, it
> > is the same as use_dma_api. If the driver is configured with premapped,
> > then do_unmap is false.
> >
> > So as long as do_unmap is false, for addr of desc, we should skip dma
> > unmap operation.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
> >  1 file changed, 28 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 1fb2c6dca9ea..10ee3b7ce571 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -175,6 +175,11 @@ struct vring_virtqueue {
> >         /* Do DMA mapping by driver */
> >         bool premapped;
> >
> > +       /* Do unmap or not for desc. Just when premapped is False and
> > +        * use_dma_api is true, this is true.
> > +        */
> > +       bool do_unmap;
> > +
> >         /* Head of free buffer list. */
> >         unsigned int free_head;
> >         /* Number we've added since last sync. */
> > @@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> >  {
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > +       if (!vq->do_unmap)
> >                 return;
> >
> >         flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > @@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> >         struct vring_desc_extra *extra = vq->split.desc_extra;
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > -               goto out;
> > -
> >         flags = extra[i].flags;
> >
> >         if (flags & VRING_DESC_F_INDIRECT) {
> > +               if (!vq->use_dma_api)
> > +                       goto out;
> > +
> >                 dma_unmap_single(vring_dma_dev(vq),
> >                                  extra[i].addr,
> >                                  extra[i].len,
> >                                  (flags & VRING_DESC_F_WRITE) ?
> >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >         } else {
> > +               if (!vq->do_unmap)
> > +                       goto out;
> > +
> >                 dma_unmap_page(vring_dma_dev(vq),
> >                                extra[i].addr,
> >                                extra[i].len,
> > @@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> >         }
> >         /* Last one doesn't continue. */
> >         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > -       if (!indirect && vq->use_dma_api)
> > +       if (!indirect && vq->do_unmap)
> >                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> >                         ~VRING_DESC_F_NEXT;
> >
> > @@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> >                                 VRING_DESC_F_INDIRECT));
> >                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> >
> > -               if (vq->use_dma_api) {
> > +               if (vq->do_unmap) {
> >                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
> >                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> >                 }
> > @@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> >  {
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > -               return;
> > -
> >         flags = extra->flags;
> >
> >         if (flags & VRING_DESC_F_INDIRECT) {
> > +               if (!vq->use_dma_api)
> > +                       return;
> > +
> >                 dma_unmap_single(vring_dma_dev(vq),
> >                                  extra->addr, extra->len,
> >                                  (flags & VRING_DESC_F_WRITE) ?
> >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> >         } else {
> > +               if (!vq->do_unmap)
> > +                       return;
>
> This seems less straightforward than:
>
> if (!vq->use_dma_api)
>     return;
>
> if (INDIRECT) {
> } else if (!vq->premapped) {
> }
>
> ?


My logic here is that for the real buffers we uniformly use do_unmap to decide
whether to unmap, while the indirect descriptor tables still use use_dma_api.

From this point of view, how does that look to you?

Thanks.


>
> Thanks
>
> > +
> >                 dma_unmap_page(vring_dma_dev(vq),
> >                                extra->addr, extra->len,
> >                                (flags & VRING_DESC_F_WRITE) ?
> > @@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> >  {
> >         u16 flags;
> >
> > -       if (!vq->use_dma_api)
> > +       if (!vq->do_unmap)
> >                 return;
> >
> >         flags = le16_to_cpu(desc->flags);
> > @@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> >                                 sizeof(struct vring_packed_desc));
> >         vq->packed.vring.desc[head].id = cpu_to_le16(id);
> >
> > -       if (vq->use_dma_api) {
> > +       if (vq->do_unmap) {
> >                 vq->packed.desc_extra[id].addr = addr;
> >                 vq->packed.desc_extra[id].len = total_sg *
> >                                 sizeof(struct vring_packed_desc);
> > @@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> >                         desc[i].len = cpu_to_le32(sg->length);
> >                         desc[i].id = cpu_to_le16(id);
> >
> > -                       if (unlikely(vq->use_dma_api)) {
> > +                       if (unlikely(vq->do_unmap)) {
> >                                 vq->packed.desc_extra[curr].addr = addr;
> >                                 vq->packed.desc_extra[curr].len = sg->length;
> >                                 vq->packed.desc_extra[curr].flags =
> > @@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >         vq->free_head = id;
> >         vq->vq.num_free += state->num;
> >
> > -       if (unlikely(vq->use_dma_api)) {
> > +       if (unlikely(vq->do_unmap)) {
> >                 curr = id;
> >                 for (i = 0; i < state->num; i++) {
> >                         vring_unmap_extra_packed(vq,
> > @@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> >                 if (!desc)
> >                         return;
> >
> > -               if (vq->use_dma_api) {
> > +               if (vq->do_unmap) {
> >                         len = vq->packed.desc_extra[id].len;
> >                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
> >                                         i++)
> > @@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> >         vq->premapped = false;
> > +       vq->do_unmap = vq->use_dma_api;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> >         vq->dma_dev = dma_dev;
> >         vq->use_dma_api = vring_use_dma_api(vdev);
> >         vq->premapped = false;
> > +       vq->do_unmap = vq->use_dma_api;
> >
> >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> >                 !context;
> > @@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
> >                 return -EINVAL;
> >
> >         vq->premapped = true;
> > +       vq->do_unmap = false;
> >
> >         return 0;
> >  }
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-13  4:20     ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-13  4:20 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>

I'd suggest tweaking the title to something like:

"merge dma operations when refilling mergeable buffers"

> Currently, the virtio core will perform a dma operation for each
> operation.

"for each buffer"?

> Although, the same page may be operated multiple times.
>
> The driver does the dma operation and manages the dma address based the
> feature premapped of virtio core.
>
> This way, we can perform only one dma operation for the same page. In
> the case of mtu 1500, this can reduce a lot of dma operations.
>
> Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> increased from 1893766 to 1901105. An increase of 0.4%.

Btw, it looks to me like the code to deal with XDP_TX/REDIRECT for
linearized pages was missed.

>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 267 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 486b5849033d..4de845d35bed 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
>  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
>  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
>
> +/* The bufs on the same page may share this struct. */
> +struct virtnet_rq_dma {
> +       struct virtnet_rq_dma *next;
> +
> +       dma_addr_t addr;
> +
> +       void *buf;
> +       u32 len;
> +
> +       u32 ref;
> +};
> +
> +/* Record the dma and buf. */
> +struct virtnet_rq_data {
> +       struct virtnet_rq_data *next;
> +
> +       void *buf;
> +
> +       struct virtnet_rq_dma *dma;
> +};
> +
>  /* Internal representation of a send virtqueue */
>  struct send_queue {
>         /* Virtqueue associated with this send _queue */
> @@ -175,6 +196,13 @@ struct receive_queue {
>         char name[16];
>
>         struct xdp_rxq_info xdp_rxq;
> +
> +       struct virtnet_rq_data *data_array;
> +       struct virtnet_rq_data *data_free;
> +
> +       struct virtnet_rq_dma *dma_array;
> +       struct virtnet_rq_dma *dma_free;
> +       struct virtnet_rq_dma *last_dma;
>  };
>
>  /* This structure can contain rss message with maximum settings for indirection table and keysize
> @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         return skb;
>  }
>
> +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> +{
> +       struct device *dev;
> +
> +       --dma->ref;
> +
> +       if (dma->ref)
> +               return;
> +
> +       dev = virtqueue_dma_dev(rq->vq);
> +
> +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> +
> +       dma->next = rq->dma_free;
> +       rq->dma_free = dma;
> +}
> +
> +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> +                                    struct virtnet_rq_data *data)
> +{
> +       void *buf;
> +
> +       buf = data->buf;
> +
> +       data->next = rq->data_free;
> +       rq->data_free = data;
> +
> +       return buf;
> +}
> +
> +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> +                                                  void *buf,
> +                                                  struct virtnet_rq_dma *dma)
> +{
> +       struct virtnet_rq_data *data;
> +
> +       data = rq->data_free;
> +       rq->data_free = data->next;
> +
> +       data->buf = buf;
> +       data->dma = dma;
> +
> +       return data;
> +}
> +
> +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> +{
> +       struct virtnet_rq_data *data;
> +       void *buf;
> +
> +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> +       if (!buf || !rq->data_array)
> +               return buf;
> +
> +       data = buf;
> +
> +       virtnet_rq_unmap(rq, data->dma);
> +
> +       return virtnet_rq_recycle_data(rq, data);
> +}
> +
> +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> +{
> +       struct virtnet_rq_data *data;
> +       void *buf;
> +
> +       buf = virtqueue_detach_unused_buf(rq->vq);
> +       if (!buf || !rq->data_array)
> +               return buf;
> +
> +       data = buf;
> +
> +       virtnet_rq_unmap(rq, data->dma);
> +
> +       return virtnet_rq_recycle_data(rq, data);
> +}
> +
> +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> +{
> +       struct virtnet_rq_dma *dma = rq->last_dma;
> +       struct device *dev;
> +       u32 off, map_len;
> +       dma_addr_t addr;
> +       void *end;
> +
> +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> +               ++dma->ref;
> +               addr = dma->addr + (buf - dma->buf);
> +               goto ok;
> +       }
> +
> +       end = buf + len - 1;
> +       off = offset_in_page(end);
> +       map_len = len + PAGE_SIZE - off;

This assumes PAGE_SIZE, which seems sub-optimal since the page frag
could be larger than that.
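
For example, sizing the mapping from the head page might look roughly
like this (hypothetical helper, not part of the posted patch):

	/* Map from buf to the end of the (possibly compound) head page,
	 * so a page frag larger than PAGE_SIZE is still covered by a
	 * single mapping.
	 */
	static dma_addr_t virtnet_map_to_page_end(struct device *dev,
						  void *buf, u32 *map_len)
	{
		struct page *page = virt_to_head_page(buf);
		unsigned int offset = (char *)buf - (char *)page_address(page);

		*map_len = page_size(page) - offset;

		return dma_map_page_attrs(dev, page, offset, *map_len,
					  DMA_FROM_DEVICE, 0);
	}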

> +
> +       dev = virtqueue_dma_dev(rq->vq);
> +
> +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> +                                 map_len, DMA_FROM_DEVICE, 0);
> +       if (addr == DMA_MAPPING_ERROR)
> +               return -ENOMEM;
> +
> +       dma = rq->dma_free;
> +       rq->dma_free = dma->next;
> +
> +       dma->ref = 1;
> +       dma->buf = buf;
> +       dma->addr = addr;
> +       dma->len = map_len;
> +
> +       rq->last_dma = dma;
> +
> +ok:
> +       sg_init_table(rq->sg, 1);
> +       rq->sg[0].dma_address = addr;
> +       rq->sg[0].length = len;
> +
> +       return 0;
> +}
> +
> +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> +{
> +       struct receive_queue *rq;
> +       int i, err, j, num;
> +
> +       /* disable for big mode */
> +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> +               return 0;
> +
> +       for (i = 0; i < vi->max_queue_pairs; i++) {
> +               err = virtqueue_set_premapped(vi->rq[i].vq);
> +               if (err)
> +                       continue;
> +
> +               rq = &vi->rq[i];
> +
> +               num = virtqueue_get_vring_size(rq->vq);
> +
> +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> +               if (!rq->data_array)

Can we avoid those allocations when we don't use the DMA API?
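
Something like the check below, perhaps (sketch only; it assumes
virtqueue_dma_dev() returns NULL when the DMA API is not in use):

	/* Skip premapped mode and the per-queue arrays when there is no
	 * DMA device to map for.
	 */
	if (!virtqueue_dma_dev(vi->rq[i].vq))
		continue;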

> +                       goto err;
> +
> +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> +               if (!rq->dma_array)
> +                       goto err;
> +
> +               for (j = 0; j < num; ++j) {
> +                       rq->data_array[j].next = rq->data_free;
> +                       rq->data_free = &rq->data_array[j];
> +
> +                       rq->dma_array[j].next = rq->dma_free;
> +                       rq->dma_free = &rq->dma_array[j];
> +               }
> +       }
> +
> +       return 0;
> +
> +err:
> +       for (i = 0; i < vi->max_queue_pairs; i++) {
> +               struct receive_queue *rq;
> +
> +               rq = &vi->rq[i];
> +
> +               kfree(rq->dma_array);
> +               kfree(rq->data_array);
> +       }
> +
> +       return -ENOMEM;
> +}
> +
>  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>  {
>         unsigned int len;
> @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>                 void *buf;
>                 int off;
>
> -               buf = virtqueue_get_buf(rq->vq, &buflen);
> +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
>                 if (unlikely(!buf))
>                         goto err_buf;
>
> @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>                 return -EINVAL;
>
>         while (--*num_buf > 0) {
> -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
>                                  dev->name, *num_buf,
> @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         while (--num_buf) {
>                 int num_skb_frags;
>
> -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
>                                  dev->name, num_buf,
> @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_skb:
>         put_page(page);
>         while (num_buf-- > 1) {
> -               buf = virtqueue_get_buf(rq->vq, &len);
> +               buf = virtnet_rq_get_buf(rq, &len, NULL);
>                 if (unlikely(!buf)) {
>                         pr_debug("%s: rx error: %d buffers missing\n",
>                                  dev->name, num_buf);
> @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>         unsigned int xdp_headroom = virtnet_get_headroom(vi);
>         void *ctx = (void *)(unsigned long)xdp_headroom;
>         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> +       struct virtnet_rq_data *data;
>         int err;
>
>         len = SKB_DATA_ALIGN(len) +
> @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>         get_page(alloc_frag->page);
>         alloc_frag->offset += len;
> -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> -                   vi->hdr_len + GOOD_PACKET_LEN);
> -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +
> +       if (rq->data_array) {
> +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> +                                       vi->hdr_len + GOOD_PACKET_LEN);

Since the buffers come from a compound page, I wonder if everything
could be simplified if we just reuse page->private to store metadata
like the dma address and refcnt. Then we wouldn't need any extra
structures for tracking.
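
A very rough sketch of that idea (hypothetical names; refcounting and
unmap handling omitted; assumes dma_addr_t fits in unsigned long):

	/* Record the base DMA address of the head page in page->private
	 * on first use and derive per-buffer addresses from it.
	 */
	static dma_addr_t virtnet_buf_dma_addr(struct device *dev, void *buf)
	{
		struct page *page = virt_to_head_page(buf);
		dma_addr_t base;

		if (!page_private(page)) {
			base = dma_map_page_attrs(dev, page, 0, page_size(page),
						  DMA_FROM_DEVICE, 0);
			if (base == DMA_MAPPING_ERROR)
				return DMA_MAPPING_ERROR;
			set_page_private(page, (unsigned long)base);
		} else {
			base = (dma_addr_t)page_private(page);
		}

		return base + ((char *)buf - (char *)page_address(page));
	}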

Thanks



> +               if (err)
> +                       goto map_err;
> +
> +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> +       } else {
> +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> +                           vi->hdr_len + GOOD_PACKET_LEN);
> +               data = (void *)buf;
> +       }
> +
> +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
>         if (err < 0)
> -               put_page(virt_to_head_page(buf));
> +               goto add_err;
> +
> +       return err;
> +
> +add_err:
> +       if (rq->data_array) {
> +               virtnet_rq_unmap(rq, data->dma);
> +               virtnet_rq_recycle_data(rq, data);
> +       }
> +
> +map_err:
> +       put_page(virt_to_head_page(buf));
>         return err;
>  }
>
> @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>         unsigned int headroom = virtnet_get_headroom(vi);
>         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
>         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> +       struct virtnet_rq_data *data;
>         char *buf;
>         void *ctx;
>         int err;
> @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>                 alloc_frag->offset += hole;
>         }
>
> -       sg_init_one(rq->sg, buf, len);
> +       if (rq->data_array) {
> +               err = virtnet_rq_map_sg(rq, buf, len);
> +               if (err)
> +                       goto map_err;
> +
> +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> +       } else {
> +               sg_init_one(rq->sg, buf, len);
> +               data = (void *)buf;
> +       }
> +
>         ctx = mergeable_len_to_ctx(len + room, headroom);
> -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
>         if (err < 0)
> -               put_page(virt_to_head_page(buf));
> +               goto add_err;
> +
> +       return 0;
> +
> +add_err:
> +       if (rq->data_array) {
> +               virtnet_rq_unmap(rq, data->dma);
> +               virtnet_rq_recycle_data(rq, data);
> +       }
>
> +map_err:
> +       put_page(virt_to_head_page(buf));
>         return err;
>  }
>
> @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
>                 void *ctx;
>
>                 while (stats.packets < budget &&
> -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
>                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
>                         stats.packets++;
>                 }
>         } else {
>                 while (stats.packets < budget &&
> -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
>                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
>                         stats.packets++;
>                 }
> @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>         for (i = 0; i < vi->max_queue_pairs; i++) {
>                 __netif_napi_del(&vi->rq[i].napi);
>                 __netif_napi_del(&vi->sq[i].napi);
> +
> +               kfree(vi->rq[i].data_array);
> +               kfree(vi->rq[i].dma_array);
>         }
>
>         /* We called __netif_napi_del(),
> @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
>         }
>
>         for (i = 0; i < vi->max_queue_pairs; i++) {
> -               struct virtqueue *vq = vi->rq[i].vq;
> -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -                       virtnet_rq_free_unused_buf(vq, buf);
> +               struct receive_queue *rq = &vi->rq[i];
> +
> +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> +                       virtnet_rq_free_unused_buf(rq->vq, buf);
>                 cond_resched();
>         }
>  }
> @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
>         if (ret)
>                 goto err_free;
>
> +       ret = virtnet_rq_merge_map_init(vi);
> +       if (ret)
> +               goto err_free;
> +
>         cpus_read_lock();
>         virtnet_set_affinity(vi);
>         cpus_read_unlock();
> --
> 2.32.0.3.g01195cf9f
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
  2023-07-13  4:02       ` Xuan Zhuo
@ 2023-07-13  4:21         ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-13  4:21 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 12:06 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 13 Jul 2023 11:50:57 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Now we add a case where we skip dma unmap, the vq->premapped is true.
> > >
> > > We can't just rely on use_dma_api to determine whether to skip the dma
> > > operation. For convenience, I introduced the "do_unmap". By default, it
> > > is the same as use_dma_api. If the driver is configured with premapped,
> > > then do_unmap is false.
> > >
> > > So as long as do_unmap is false, for addr of desc, we should skip dma
> > > unmap operation.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
> > >  1 file changed, 28 insertions(+), 14 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 1fb2c6dca9ea..10ee3b7ce571 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -175,6 +175,11 @@ struct vring_virtqueue {
> > >         /* Do DMA mapping by driver */
> > >         bool premapped;
> > >
> > > +       /* Do unmap or not for desc. Just when premapped is False and
> > > +        * use_dma_api is true, this is true.
> > > +        */
> > > +       bool do_unmap;
> > > +
> > >         /* Head of free buffer list. */
> > >         unsigned int free_head;
> > >         /* Number we've added since last sync. */
> > > @@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > >  {
> > >         u16 flags;
> > >
> > > -       if (!vq->use_dma_api)
> > > +       if (!vq->do_unmap)
> > >                 return;
> > >
> > >         flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > > @@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > >         struct vring_desc_extra *extra = vq->split.desc_extra;
> > >         u16 flags;
> > >
> > > -       if (!vq->use_dma_api)
> > > -               goto out;
> > > -
> > >         flags = extra[i].flags;
> > >
> > >         if (flags & VRING_DESC_F_INDIRECT) {
> > > +               if (!vq->use_dma_api)
> > > +                       goto out;
> > > +
> > >                 dma_unmap_single(vring_dma_dev(vq),
> > >                                  extra[i].addr,
> > >                                  extra[i].len,
> > >                                  (flags & VRING_DESC_F_WRITE) ?
> > >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > >         } else {
> > > +               if (!vq->do_unmap)
> > > +                       goto out;
> > > +
> > >                 dma_unmap_page(vring_dma_dev(vq),
> > >                                extra[i].addr,
> > >                                extra[i].len,
> > > @@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > >         }
> > >         /* Last one doesn't continue. */
> > >         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > -       if (!indirect && vq->use_dma_api)
> > > +       if (!indirect && vq->do_unmap)
> > >                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> > >                         ~VRING_DESC_F_NEXT;
> > >
> > > @@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > >                                 VRING_DESC_F_INDIRECT));
> > >                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > >
> > > -               if (vq->use_dma_api) {
> > > +               if (vq->do_unmap) {
> > >                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > >                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > >                 }
> > > @@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > >  {
> > >         u16 flags;
> > >
> > > -       if (!vq->use_dma_api)
> > > -               return;
> > > -
> > >         flags = extra->flags;
> > >
> > >         if (flags & VRING_DESC_F_INDIRECT) {
> > > +               if (!vq->use_dma_api)
> > > +                       return;
> > > +
> > >                 dma_unmap_single(vring_dma_dev(vq),
> > >                                  extra->addr, extra->len,
> > >                                  (flags & VRING_DESC_F_WRITE) ?
> > >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > >         } else {
> > > +               if (!vq->do_unmap)
> > > +                       return;
> >
> > This seems not straightforward than:
> >
> > if (!vq->use_dma_api)
> >     return;
> >
> > if (INDIRECT) {
> > } else if (!vq->premapped) {
> > }
> >
> > ?
>
>
> My logic here is that for the real buffer, we use do_unmap to judge uniformly.
> And indirect still use use_dma_api to judge.
>
> From this point of view, how do you feel?

We can hear from others but a state machine with three booleans seems
not easy for me to read.
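
Spelled out, that structure would be roughly the following (just a
sketch based on the code in this patch, not a posted change):

	static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
					     const struct vring_desc_extra *extra)
	{
		u16 flags = extra->flags;

		if (!vq->use_dma_api)
			return;

		if (flags & VRING_DESC_F_INDIRECT) {
			dma_unmap_single(vring_dma_dev(vq),
					 extra->addr, extra->len,
					 (flags & VRING_DESC_F_WRITE) ?
					 DMA_FROM_DEVICE : DMA_TO_DEVICE);
		} else if (!vq->premapped) {
			dma_unmap_page(vring_dma_dev(vq),
				       extra->addr, extra->len,
				       (flags & VRING_DESC_F_WRITE) ?
				       DMA_FROM_DEVICE : DMA_TO_DEVICE);
		}
	}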

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > > +
> > >                 dma_unmap_page(vring_dma_dev(vq),
> > >                                extra->addr, extra->len,
> > >                                (flags & VRING_DESC_F_WRITE) ?
> > > @@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > >  {
> > >         u16 flags;
> > >
> > > -       if (!vq->use_dma_api)
> > > +       if (!vq->do_unmap)
> > >                 return;
> > >
> > >         flags = le16_to_cpu(desc->flags);
> > > @@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > >                                 sizeof(struct vring_packed_desc));
> > >         vq->packed.vring.desc[head].id = cpu_to_le16(id);
> > >
> > > -       if (vq->use_dma_api) {
> > > +       if (vq->do_unmap) {
> > >                 vq->packed.desc_extra[id].addr = addr;
> > >                 vq->packed.desc_extra[id].len = total_sg *
> > >                                 sizeof(struct vring_packed_desc);
> > > @@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > >                         desc[i].len = cpu_to_le32(sg->length);
> > >                         desc[i].id = cpu_to_le16(id);
> > >
> > > -                       if (unlikely(vq->use_dma_api)) {
> > > +                       if (unlikely(vq->do_unmap)) {
> > >                                 vq->packed.desc_extra[curr].addr = addr;
> > >                                 vq->packed.desc_extra[curr].len = sg->length;
> > >                                 vq->packed.desc_extra[curr].flags =
> > > @@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >         vq->free_head = id;
> > >         vq->vq.num_free += state->num;
> > >
> > > -       if (unlikely(vq->use_dma_api)) {
> > > +       if (unlikely(vq->do_unmap)) {
> > >                 curr = id;
> > >                 for (i = 0; i < state->num; i++) {
> > >                         vring_unmap_extra_packed(vq,
> > > @@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > >                 if (!desc)
> > >                         return;
> > >
> > > -               if (vq->use_dma_api) {
> > > +               if (vq->do_unmap) {
> > >                         len = vq->packed.desc_extra[id].len;
> > >                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
> > >                                         i++)
> > > @@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > >         vq->dma_dev = dma_dev;
> > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > >         vq->premapped = false;
> > > +       vq->do_unmap = vq->use_dma_api;
> > >
> > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > >                 !context;
> > > @@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > >         vq->dma_dev = dma_dev;
> > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > >         vq->premapped = false;
> > > +       vq->do_unmap = vq->use_dma_api;
> > >
> > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > >                 !context;
> > > @@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
> > >                 return -EINVAL;
> > >
> > >         vq->premapped = true;
> > > +       vq->do_unmap = false;
> > >
> > >         return 0;
> > >  }
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped
  2023-07-13  4:21         ` Jason Wang
@ 2023-07-13  5:45           ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-13  5:45 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, 13 Jul 2023 12:21:26 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jul 13, 2023 at 12:06 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 13 Jul 2023 11:50:57 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 11:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Now we add a case where we skip dma unmap, the vq->premapped is true.
> > > >
> > > > We can't just rely on use_dma_api to determine whether to skip the dma
> > > > operation. For convenience, I introduced the "do_unmap". By default, it
> > > > is the same as use_dma_api. If the driver is configured with premapped,
> > > > then do_unmap is false.
> > > >
> > > > So as long as do_unmap is false, for addr of desc, we should skip dma
> > > > unmap operation.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/virtio/virtio_ring.c | 42 ++++++++++++++++++++++++------------
> > > >  1 file changed, 28 insertions(+), 14 deletions(-)
> > > >
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 1fb2c6dca9ea..10ee3b7ce571 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -175,6 +175,11 @@ struct vring_virtqueue {
> > > >         /* Do DMA mapping by driver */
> > > >         bool premapped;
> > > >
> > > > +       /* Do unmap or not for desc. Just when premapped is False and
> > > > +        * use_dma_api is true, this is true.
> > > > +        */
> > > > +       bool do_unmap;
> > > > +
> > > >         /* Head of free buffer list. */
> > > >         unsigned int free_head;
> > > >         /* Number we've added since last sync. */
> > > > @@ -440,7 +445,7 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> > > >  {
> > > >         u16 flags;
> > > >
> > > > -       if (!vq->use_dma_api)
> > > > +       if (!vq->do_unmap)
> > > >                 return;
> > > >
> > > >         flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> > > > @@ -458,18 +463,21 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> > > >         struct vring_desc_extra *extra = vq->split.desc_extra;
> > > >         u16 flags;
> > > >
> > > > -       if (!vq->use_dma_api)
> > > > -               goto out;
> > > > -
> > > >         flags = extra[i].flags;
> > > >
> > > >         if (flags & VRING_DESC_F_INDIRECT) {
> > > > +               if (!vq->use_dma_api)
> > > > +                       goto out;
> > > > +
> > > >                 dma_unmap_single(vring_dma_dev(vq),
> > > >                                  extra[i].addr,
> > > >                                  extra[i].len,
> > > >                                  (flags & VRING_DESC_F_WRITE) ?
> > > >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > >         } else {
> > > > +               if (!vq->do_unmap)
> > > > +                       goto out;
> > > > +
> > > >                 dma_unmap_page(vring_dma_dev(vq),
> > > >                                extra[i].addr,
> > > >                                extra[i].len,
> > > > @@ -635,7 +643,7 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
> > > >         }
> > > >         /* Last one doesn't continue. */
> > > >         desc[prev].flags &= cpu_to_virtio16(_vq->vdev, ~VRING_DESC_F_NEXT);
> > > > -       if (!indirect && vq->use_dma_api)
> > > > +       if (!indirect && vq->do_unmap)
> > > >                 vq->split.desc_extra[prev & (vq->split.vring.num - 1)].flags &=
> > > >                         ~VRING_DESC_F_NEXT;
> > > >
> > > > @@ -794,7 +802,7 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> > > >                                 VRING_DESC_F_INDIRECT));
> > > >                 BUG_ON(len == 0 || len % sizeof(struct vring_desc));
> > > >
> > > > -               if (vq->use_dma_api) {
> > > > +               if (vq->do_unmap) {
> > > >                         for (j = 0; j < len / sizeof(struct vring_desc); j++)
> > > >                                 vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> > > >                 }
> > > > @@ -1217,17 +1225,20 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> > > >  {
> > > >         u16 flags;
> > > >
> > > > -       if (!vq->use_dma_api)
> > > > -               return;
> > > > -
> > > >         flags = extra->flags;
> > > >
> > > >         if (flags & VRING_DESC_F_INDIRECT) {
> > > > +               if (!vq->use_dma_api)
> > > > +                       return;
> > > > +
> > > >                 dma_unmap_single(vring_dma_dev(vq),
> > > >                                  extra->addr, extra->len,
> > > >                                  (flags & VRING_DESC_F_WRITE) ?
> > > >                                  DMA_FROM_DEVICE : DMA_TO_DEVICE);
> > > >         } else {
> > > > +               if (!vq->do_unmap)
> > > > +                       return;
> > >
> > > This seems not straightforward than:
> > >
> > > if (!vq->use_dma_api)
> > >     return;
> > >
> > > if (INDIRECT) {
> > > } else if (!vq->premapped) {
> > > }
> > >
> > > ?
> >
> >
> > My logic here is that for the real buffer, we use do_unmap to judge uniformly.
> > And indirect still use use_dma_api to judge.
> >
> > From this point of view, how do you feel?
>
> We can hear from others but a state machine with three booleans seems
> not easy for me to read.

Yes, I also think there are too many booleans; that is why I introduced
do_unmap, so that for the real buffer (not the indirect desc array) we
just check do_unmap.
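
In other words, do_unmap is just derived state (sketch of the intended
invariant, not extra code in the patch):

	/* Unmap descriptors ourselves only when we did the mapping. */
	vq->do_unmap = vq->use_dma_api && !vq->premapped;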

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > > > +
> > > >                 dma_unmap_page(vring_dma_dev(vq),
> > > >                                extra->addr, extra->len,
> > > >                                (flags & VRING_DESC_F_WRITE) ?
> > > > @@ -1240,7 +1251,7 @@ static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> > > >  {
> > > >         u16 flags;
> > > >
> > > > -       if (!vq->use_dma_api)
> > > > +       if (!vq->do_unmap)
> > > >                 return;
> > > >
> > > >         flags = le16_to_cpu(desc->flags);
> > > > @@ -1329,7 +1340,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
> > > >                                 sizeof(struct vring_packed_desc));
> > > >         vq->packed.vring.desc[head].id = cpu_to_le16(id);
> > > >
> > > > -       if (vq->use_dma_api) {
> > > > +       if (vq->do_unmap) {
> > > >                 vq->packed.desc_extra[id].addr = addr;
> > > >                 vq->packed.desc_extra[id].len = total_sg *
> > > >                                 sizeof(struct vring_packed_desc);
> > > > @@ -1470,7 +1481,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
> > > >                         desc[i].len = cpu_to_le32(sg->length);
> > > >                         desc[i].id = cpu_to_le16(id);
> > > >
> > > > -                       if (unlikely(vq->use_dma_api)) {
> > > > +                       if (unlikely(vq->do_unmap)) {
> > > >                                 vq->packed.desc_extra[curr].addr = addr;
> > > >                                 vq->packed.desc_extra[curr].len = sg->length;
> > > >                                 vq->packed.desc_extra[curr].flags =
> > > > @@ -1604,7 +1615,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > > >         vq->free_head = id;
> > > >         vq->vq.num_free += state->num;
> > > >
> > > > -       if (unlikely(vq->use_dma_api)) {
> > > > +       if (unlikely(vq->do_unmap)) {
> > > >                 curr = id;
> > > >                 for (i = 0; i < state->num; i++) {
> > > >                         vring_unmap_extra_packed(vq,
> > > > @@ -1621,7 +1632,7 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
> > > >                 if (!desc)
> > > >                         return;
> > > >
> > > > -               if (vq->use_dma_api) {
> > > > +               if (vq->do_unmap) {
> > > >                         len = vq->packed.desc_extra[id].len;
> > > >                         for (i = 0; i < len / sizeof(struct vring_packed_desc);
> > > >                                         i++)
> > > > @@ -2080,6 +2091,7 @@ static struct virtqueue *vring_create_virtqueue_packed(
> > > >         vq->dma_dev = dma_dev;
> > > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > >         vq->premapped = false;
> > > > +       vq->do_unmap = vq->use_dma_api;
> > > >
> > > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > > >                 !context;
> > > > @@ -2587,6 +2599,7 @@ static struct virtqueue *__vring_new_virtqueue(unsigned int index,
> > > >         vq->dma_dev = dma_dev;
> > > >         vq->use_dma_api = vring_use_dma_api(vdev);
> > > >         vq->premapped = false;
> > > > +       vq->do_unmap = vq->use_dma_api;
> > > >
> > > >         vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
> > > >                 !context;
> > > > @@ -2765,6 +2778,7 @@ int virtqueue_set_premapped(struct virtqueue *_vq)
> > > >                 return -EINVAL;
> > > >
> > > >         vq->premapped = true;
> > > > +       vq->do_unmap = false;
> > > >
> > > >         return 0;
> > > >  }
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-13  4:20     ` Jason Wang
@ 2023-07-13  5:53       ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-13  5:53 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
>
> I'd suggest to tweak the title like:
>
> "merge dma operations when refilling mergeable buffers"
>
> > Currently, the virtio core will perform a dma operation for each
> > operation.
>
> "for each buffer"?
>
> > Although, the same page may be operated multiple times.
> >
> > The driver does the dma operation and manages the dma address based the
> > feature premapped of virtio core.
> >
> > This way, we can perform only one dma operation for the same page. In
> > the case of mtu 1500, this can reduce a lot of dma operations.
> >
> > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > increased from 1893766 to 1901105. An increase of 0.4%.
>
> Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> linearized pages was missed.

This patch only affects filling and getting buffers, so I guess you mean
getting the buffers inside xdp_linearize_page().

I did handle that case; maybe you missed it.
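
To be concrete, the handling I mean is the hunk quoted further down:
xdp_linearize_page() now pulls buffers through virtnet_rq_get_buf(), so the
shared mapping reference is dropped (and the page unmapped once the last
reference goes) before the data is copied for XDP_TX/REDIRECT:

    /* Call chain, simplified from the diff below (not new code):
     *
     *   xdp_linearize_page()
     *     virtnet_rq_get_buf()          // was virtqueue_get_buf()
     *       virtnet_rq_unmap()          // --dma->ref
     *         dma_unmap_page()          // only once dma->ref reaches 0
     */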



>
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 267 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 486b5849033d..4de845d35bed 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> >
> > +/* The bufs on the same page may share this struct. */
> > +struct virtnet_rq_dma {
> > +       struct virtnet_rq_dma *next;
> > +
> > +       dma_addr_t addr;
> > +
> > +       void *buf;
> > +       u32 len;
> > +
> > +       u32 ref;
> > +};
> > +
> > +/* Record the dma and buf. */
> > +struct virtnet_rq_data {
> > +       struct virtnet_rq_data *next;
> > +
> > +       void *buf;
> > +
> > +       struct virtnet_rq_dma *dma;
> > +};
> > +
> >  /* Internal representation of a send virtqueue */
> >  struct send_queue {
> >         /* Virtqueue associated with this send _queue */
> > @@ -175,6 +196,13 @@ struct receive_queue {
> >         char name[16];
> >
> >         struct xdp_rxq_info xdp_rxq;
> > +
> > +       struct virtnet_rq_data *data_array;
> > +       struct virtnet_rq_data *data_free;
> > +
> > +       struct virtnet_rq_dma *dma_array;
> > +       struct virtnet_rq_dma *dma_free;
> > +       struct virtnet_rq_dma *last_dma;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >         return skb;
> >  }
> >
> > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > +{
> > +       struct device *dev;
> > +
> > +       --dma->ref;
> > +
> > +       if (dma->ref)
> > +               return;
> > +
> > +       dev = virtqueue_dma_dev(rq->vq);
> > +
> > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > +
> > +       dma->next = rq->dma_free;
> > +       rq->dma_free = dma;
> > +}
> > +
> > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > +                                    struct virtnet_rq_data *data)
> > +{
> > +       void *buf;
> > +
> > +       buf = data->buf;
> > +
> > +       data->next = rq->data_free;
> > +       rq->data_free = data;
> > +
> > +       return buf;
> > +}
> > +
> > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > +                                                  void *buf,
> > +                                                  struct virtnet_rq_dma *dma)
> > +{
> > +       struct virtnet_rq_data *data;
> > +
> > +       data = rq->data_free;
> > +       rq->data_free = data->next;
> > +
> > +       data->buf = buf;
> > +       data->dma = dma;
> > +
> > +       return data;
> > +}
> > +
> > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > +{
> > +       struct virtnet_rq_data *data;
> > +       void *buf;
> > +
> > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +       if (!buf || !rq->data_array)
> > +               return buf;
> > +
> > +       data = buf;
> > +
> > +       virtnet_rq_unmap(rq, data->dma);
> > +
> > +       return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > +{
> > +       struct virtnet_rq_data *data;
> > +       void *buf;
> > +
> > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > +       if (!buf || !rq->data_array)
> > +               return buf;
> > +
> > +       data = buf;
> > +
> > +       virtnet_rq_unmap(rq, data->dma);
> > +
> > +       return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > +{
> > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > +       struct device *dev;
> > +       u32 off, map_len;
> > +       dma_addr_t addr;
> > +       void *end;
> > +
> > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > +               ++dma->ref;
> > +               addr = dma->addr + (buf - dma->buf);
> > +               goto ok;
> > +       }
> > +
> > +       end = buf + len - 1;
> > +       off = offset_in_page(end);
> > +       map_len = len + PAGE_SIZE - off;
>
> This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> larger than this.

Actually, each time I only handle one or two pages; I do not map the whole
page frag. I want to avoid the case where a single buffer (1500 bytes) is
still in use while the entire 32k page frag stays mapped.

Between mapping the entire page frag and mapping one page at a time, I am not
sure which is better.
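
To make the trade-off concrete, a rough worked example (assuming 4 KiB pages
and roughly 2 KiB mergeable buffers for MTU 1500; the numbers are only
illustrative):

    /* Per-page mapping (this patch):
     *   1st buffer in a page  -> dma_map_page_attrs(), dma->ref = 1
     *   2nd buffer, same page -> reuse branch, dma->ref = 2, no new mapping
     *   the page is unmapped as soon as those two buffers are consumed.
     *
     * Whole-frag mapping (the alternative):
     *   one mapping covers the 32 KiB frag (about 16 buffers), but the frag
     *   stays mapped until the last of them is consumed, even if only a
     *   single 1500-byte buffer is still in flight.
     */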

>
> > +
> > +       dev = virtqueue_dma_dev(rq->vq);
> > +
> > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > +                                 map_len, DMA_FROM_DEVICE, 0);
> > +       if (addr == DMA_MAPPING_ERROR)
> > +               return -ENOMEM;
> > +
> > +       dma = rq->dma_free;
> > +       rq->dma_free = dma->next;
> > +
> > +       dma->ref = 1;
> > +       dma->buf = buf;
> > +       dma->addr = addr;
> > +       dma->len = map_len;
> > +
> > +       rq->last_dma = dma;
> > +
> > +ok:
> > +       sg_init_table(rq->sg, 1);
> > +       rq->sg[0].dma_address = addr;
> > +       rq->sg[0].length = len;
> > +
> > +       return 0;
> > +}
> > +
> > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > +{
> > +       struct receive_queue *rq;
> > +       int i, err, j, num;
> > +
> > +       /* disable for big mode */
> > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > +               return 0;
> > +
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > +               if (err)
> > +                       continue;
> > +
> > +               rq = &vi->rq[i];
> > +
> > +               num = virtqueue_get_vring_size(rq->vq);
> > +
> > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > +               if (!rq->data_array)
>
> Can we avoid those allocations when we don't use the DMA API?

Yes.

virtqueue_set_premapped() can only succeed when the DMA API is in use, so the
allocations are skipped otherwise.
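
For reference, the gating chain this relies on (names are from the quoted
diffs; that virtqueue_set_premapped() rejects queues without the DMA API is an
assumption based on the statement above, since the function body is not quoted
here):

    /* vring_use_dma_api(vdev)           -> vq->use_dma_api
     * virtqueue_set_premapped(vq) == 0  -> only when use_dma_api is set
     * rq->data_array != NULL            -> only allocated after success
     *
     * so virtnet_rq_merge_map_init() can simply do:
     */
    err = virtqueue_set_premapped(vi->rq[i].vq);
    if (err)
            continue;   /* no DMA API: data_array stays NULL, old path used */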


>
> > +                       goto err;
> > +
> > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > +               if (!rq->dma_array)
> > +                       goto err;
> > +
> > +               for (j = 0; j < num; ++j) {
> > +                       rq->data_array[j].next = rq->data_free;
> > +                       rq->data_free = &rq->data_array[j];
> > +
> > +                       rq->dma_array[j].next = rq->dma_free;
> > +                       rq->dma_free = &rq->dma_array[j];
> > +               }
> > +       }
> > +
> > +       return 0;
> > +
> > +err:
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               struct receive_queue *rq;
> > +
> > +               rq = &vi->rq[i];
> > +
> > +               kfree(rq->dma_array);
> > +               kfree(rq->data_array);
> > +       }
> > +
> > +       return -ENOMEM;
> > +}
> > +
> >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  {
> >         unsigned int len;
> > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> >                 void *buf;
> >                 int off;
> >
> > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> >                 if (unlikely(!buf))
> >                         goto err_buf;
> >
> > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> >                 return -EINVAL;
> >
> >         while (--*num_buf > 0) {
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, *num_buf,
> > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >         while (--num_buf) {
> >                 int num_skb_frags;
> >
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, num_buf,
> > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  err_skb:
> >         put_page(page);
> >         while (num_buf-- > 1) {
> > -               buf = virtqueue_get_buf(rq->vq, &len);
> > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers missing\n",
> >                                  dev->name, num_buf);
> > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> >         void *ctx = (void *)(unsigned long)xdp_headroom;
> >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > +       struct virtnet_rq_data *data;
> >         int err;
> >
> >         len = SKB_DATA_ALIGN(len) +
> > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> >         get_page(alloc_frag->page);
> >         alloc_frag->offset += len;
> > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +
> > +       if (rq->data_array) {
> > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +                                       vi->hdr_len + GOOD_PACKET_LEN);
>
> Thanks to the compound page. I wonder if everything could be
> simplified if we just reuse page->private for storing metadata like
> dma address and refcnt. Then we don't need extra stuff for tracking
> any other thing?


I will try.

Thanks.


>
> Thanks
>
>
>
> > +               if (err)
> > +                       goto map_err;
> > +
> > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +       } else {
> > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > +               data = (void *)buf;
> > +       }
> > +
> > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >         if (err < 0)
> > -               put_page(virt_to_head_page(buf));
> > +               goto add_err;
> > +
> > +       return err;
> > +
> > +add_err:
> > +       if (rq->data_array) {
> > +               virtnet_rq_unmap(rq, data->dma);
> > +               virtnet_rq_recycle_data(rq, data);
> > +       }
> > +
> > +map_err:
> > +       put_page(virt_to_head_page(buf));
> >         return err;
> >  }
> >
> > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >         unsigned int headroom = virtnet_get_headroom(vi);
> >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > +       struct virtnet_rq_data *data;
> >         char *buf;
> >         void *ctx;
> >         int err;
> > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >                 alloc_frag->offset += hole;
> >         }
> >
> > -       sg_init_one(rq->sg, buf, len);
> > +       if (rq->data_array) {
> > +               err = virtnet_rq_map_sg(rq, buf, len);
> > +               if (err)
> > +                       goto map_err;
> > +
> > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +       } else {
> > +               sg_init_one(rq->sg, buf, len);
> > +               data = (void *)buf;
> > +       }
> > +
> >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >         if (err < 0)
> > -               put_page(virt_to_head_page(buf));
> > +               goto add_err;
> > +
> > +       return 0;
> > +
> > +add_err:
> > +       if (rq->data_array) {
> > +               virtnet_rq_unmap(rq, data->dma);
> > +               virtnet_rq_recycle_data(rq, data);
> > +       }
> >
> > +map_err:
> > +       put_page(virt_to_head_page(buf));
> >         return err;
> >  }
> >
> > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >                 void *ctx;
> >
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> >         } else {
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> >                 __netif_napi_del(&vi->rq[i].napi);
> >                 __netif_napi_del(&vi->sq[i].napi);
> > +
> > +               kfree(vi->rq[i].data_array);
> > +               kfree(vi->rq[i].dma_array);
> >         }
> >
> >         /* We called __netif_napi_del(),
> > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >         }
> >
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               struct virtqueue *vq = vi->rq[i].vq;
> > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -                       virtnet_rq_free_unused_buf(vq, buf);
> > +               struct receive_queue *rq = &vi->rq[i];
> > +
> > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> >                 cond_resched();
> >         }
> >  }
> > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> >         if (ret)
> >                 goto err_free;
> >
> > +       ret = virtnet_rq_merge_map_init(vi);
> > +       if (ret)
> > +               goto err_free;
> > +
> >         cpus_read_lock();
> >         virtnet_set_affinity(vi);
> >         cpus_read_unlock();
> > --
> > 2.32.0.3.g01195cf9f
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-13  4:20     ` Jason Wang
@ 2023-07-13  6:51       ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-13  6:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
>
> I'd suggest to tweak the title like:
>
> "merge dma operations when refilling mergeable buffers"
>
> > Currently, the virtio core will perform a dma operation for each
> > operation.
>
> "for each buffer"?
>
> > Although, the same page may be operated multiple times.
> >
> > The driver does the dma operation and manages the dma address based the
> > feature premapped of virtio core.
> >
> > This way, we can perform only one dma operation for the same page. In
> > the case of mtu 1500, this can reduce a lot of dma operations.
> >
> > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > increased from 1893766 to 1901105. An increase of 0.4%.
>
> Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> linearized pages was missed.
>
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 267 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 486b5849033d..4de845d35bed 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> >
> > +/* The bufs on the same page may share this struct. */
> > +struct virtnet_rq_dma {
> > +       struct virtnet_rq_dma *next;
> > +
> > +       dma_addr_t addr;
> > +
> > +       void *buf;
> > +       u32 len;
> > +
> > +       u32 ref;
> > +};
> > +
> > +/* Record the dma and buf. */
> > +struct virtnet_rq_data {
> > +       struct virtnet_rq_data *next;
> > +
> > +       void *buf;
> > +
> > +       struct virtnet_rq_dma *dma;
> > +};
> > +
> >  /* Internal representation of a send virtqueue */
> >  struct send_queue {
> >         /* Virtqueue associated with this send _queue */
> > @@ -175,6 +196,13 @@ struct receive_queue {
> >         char name[16];
> >
> >         struct xdp_rxq_info xdp_rxq;
> > +
> > +       struct virtnet_rq_data *data_array;
> > +       struct virtnet_rq_data *data_free;
> > +
> > +       struct virtnet_rq_dma *dma_array;
> > +       struct virtnet_rq_dma *dma_free;
> > +       struct virtnet_rq_dma *last_dma;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >         return skb;
> >  }
> >
> > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > +{
> > +       struct device *dev;
> > +
> > +       --dma->ref;
> > +
> > +       if (dma->ref)
> > +               return;
> > +
> > +       dev = virtqueue_dma_dev(rq->vq);
> > +
> > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > +
> > +       dma->next = rq->dma_free;
> > +       rq->dma_free = dma;
> > +}
> > +
> > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > +                                    struct virtnet_rq_data *data)
> > +{
> > +       void *buf;
> > +
> > +       buf = data->buf;
> > +
> > +       data->next = rq->data_free;
> > +       rq->data_free = data;
> > +
> > +       return buf;
> > +}
> > +
> > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > +                                                  void *buf,
> > +                                                  struct virtnet_rq_dma *dma)
> > +{
> > +       struct virtnet_rq_data *data;
> > +
> > +       data = rq->data_free;
> > +       rq->data_free = data->next;
> > +
> > +       data->buf = buf;
> > +       data->dma = dma;
> > +
> > +       return data;
> > +}
> > +
> > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > +{
> > +       struct virtnet_rq_data *data;
> > +       void *buf;
> > +
> > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +       if (!buf || !rq->data_array)
> > +               return buf;
> > +
> > +       data = buf;
> > +
> > +       virtnet_rq_unmap(rq, data->dma);
> > +
> > +       return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > +{
> > +       struct virtnet_rq_data *data;
> > +       void *buf;
> > +
> > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > +       if (!buf || !rq->data_array)
> > +               return buf;
> > +
> > +       data = buf;
> > +
> > +       virtnet_rq_unmap(rq, data->dma);
> > +
> > +       return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > +{
> > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > +       struct device *dev;
> > +       u32 off, map_len;
> > +       dma_addr_t addr;
> > +       void *end;
> > +
> > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > +               ++dma->ref;
> > +               addr = dma->addr + (buf - dma->buf);
> > +               goto ok;
> > +       }
> > +
> > +       end = buf + len - 1;
> > +       off = offset_in_page(end);
> > +       map_len = len + PAGE_SIZE - off;
>
> This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> larger than this.
>
> > +
> > +       dev = virtqueue_dma_dev(rq->vq);
> > +
> > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > +                                 map_len, DMA_FROM_DEVICE, 0);
> > +       if (addr == DMA_MAPPING_ERROR)
> > +               return -ENOMEM;
> > +
> > +       dma = rq->dma_free;
> > +       rq->dma_free = dma->next;
> > +
> > +       dma->ref = 1;
> > +       dma->buf = buf;
> > +       dma->addr = addr;
> > +       dma->len = map_len;
> > +
> > +       rq->last_dma = dma;
> > +
> > +ok:
> > +       sg_init_table(rq->sg, 1);
> > +       rq->sg[0].dma_address = addr;
> > +       rq->sg[0].length = len;
> > +
> > +       return 0;
> > +}
> > +
> > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > +{
> > +       struct receive_queue *rq;
> > +       int i, err, j, num;
> > +
> > +       /* disable for big mode */
> > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > +               return 0;
> > +
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > +               if (err)
> > +                       continue;
> > +
> > +               rq = &vi->rq[i];
> > +
> > +               num = virtqueue_get_vring_size(rq->vq);
> > +
> > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > +               if (!rq->data_array)
>
> Can we avoid those allocations when we don't use the DMA API?
>
> > +                       goto err;
> > +
> > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > +               if (!rq->dma_array)
> > +                       goto err;
> > +
> > +               for (j = 0; j < num; ++j) {
> > +                       rq->data_array[j].next = rq->data_free;
> > +                       rq->data_free = &rq->data_array[j];
> > +
> > +                       rq->dma_array[j].next = rq->dma_free;
> > +                       rq->dma_free = &rq->dma_array[j];
> > +               }
> > +       }
> > +
> > +       return 0;
> > +
> > +err:
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               struct receive_queue *rq;
> > +
> > +               rq = &vi->rq[i];
> > +
> > +               kfree(rq->dma_array);
> > +               kfree(rq->data_array);
> > +       }
> > +
> > +       return -ENOMEM;
> > +}
> > +
> >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  {
> >         unsigned int len;
> > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> >                 void *buf;
> >                 int off;
> >
> > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> >                 if (unlikely(!buf))
> >                         goto err_buf;
> >
> > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> >                 return -EINVAL;
> >
> >         while (--*num_buf > 0) {
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, *num_buf,
> > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >         while (--num_buf) {
> >                 int num_skb_frags;
> >
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, num_buf,
> > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  err_skb:
> >         put_page(page);
> >         while (num_buf-- > 1) {
> > -               buf = virtqueue_get_buf(rq->vq, &len);
> > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers missing\n",
> >                                  dev->name, num_buf);
> > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> >         void *ctx = (void *)(unsigned long)xdp_headroom;
> >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > +       struct virtnet_rq_data *data;
> >         int err;
> >
> >         len = SKB_DATA_ALIGN(len) +
> > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> >         get_page(alloc_frag->page);
> >         alloc_frag->offset += len;
> > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +
> > +       if (rq->data_array) {
> > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +                                       vi->hdr_len + GOOD_PACKET_LEN);
>
> Thanks to the compound page. I wonder if everything could be
> simplified if we just reuse page->private for storing metadata like
> dma address and refcnt. Then we don't need extra stuff for tracking
> any other thing?

I didn't use page->private because once part of the page is handed off to an
skb, the driver is no longer the sole owner of that page. Can we still use
page->private safely in that case?
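
Just to make the idea concrete, here is a rough sketch of what hanging the
metadata off page->private could look like (the struct and helper names are
made up for illustration and are not part of this series):

struct virtnet_page_dma {
	dma_addr_t addr;
	u32 len;
	u32 ref;
};

/* the metadata would live behind the compound head page */
static struct virtnet_page_dma *virtnet_page_dma(void *buf)
{
	return (struct virtnet_page_dma *)page_private(virt_to_head_page(buf));
}

static void virtnet_page_dma_attach(struct page *page,
				    struct virtnet_page_dma *pd)
{
	set_page_private(page, (unsigned long)pd);
}

The ownership question is exactly the sticking point: once an skb also holds
a reference to the page, nothing guarantees page->private stays free for the
driver to use.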

Thanks.




>
> Thanks
>
>
>
> > +               if (err)
> > +                       goto map_err;
> > +
> > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +       } else {
> > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > +               data = (void *)buf;
> > +       }
> > +
> > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >         if (err < 0)
> > -               put_page(virt_to_head_page(buf));
> > +               goto add_err;
> > +
> > +       return err;
> > +
> > +add_err:
> > +       if (rq->data_array) {
> > +               virtnet_rq_unmap(rq, data->dma);
> > +               virtnet_rq_recycle_data(rq, data);
> > +       }
> > +
> > +map_err:
> > +       put_page(virt_to_head_page(buf));
> >         return err;
> >  }
> >
> > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >         unsigned int headroom = virtnet_get_headroom(vi);
> >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > +       struct virtnet_rq_data *data;
> >         char *buf;
> >         void *ctx;
> >         int err;
> > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >                 alloc_frag->offset += hole;
> >         }
> >
> > -       sg_init_one(rq->sg, buf, len);
> > +       if (rq->data_array) {
> > +               err = virtnet_rq_map_sg(rq, buf, len);
> > +               if (err)
> > +                       goto map_err;
> > +
> > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +       } else {
> > +               sg_init_one(rq->sg, buf, len);
> > +               data = (void *)buf;
> > +       }
> > +
> >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >         if (err < 0)
> > -               put_page(virt_to_head_page(buf));
> > +               goto add_err;
> > +
> > +       return 0;
> > +
> > +add_err:
> > +       if (rq->data_array) {
> > +               virtnet_rq_unmap(rq, data->dma);
> > +               virtnet_rq_recycle_data(rq, data);
> > +       }
> >
> > +map_err:
> > +       put_page(virt_to_head_page(buf));
> >         return err;
> >  }
> >
> > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >                 void *ctx;
> >
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> >         } else {
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> >                 __netif_napi_del(&vi->rq[i].napi);
> >                 __netif_napi_del(&vi->sq[i].napi);
> > +
> > +               kfree(vi->rq[i].data_array);
> > +               kfree(vi->rq[i].dma_array);
> >         }
> >
> >         /* We called __netif_napi_del(),
> > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >         }
> >
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               struct virtqueue *vq = vi->rq[i].vq;
> > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -                       virtnet_rq_free_unused_buf(vq, buf);
> > +               struct receive_queue *rq = &vi->rq[i];
> > +
> > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> >                 cond_resched();
> >         }
> >  }
> > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> >         if (ret)
> >                 goto err_free;
> >
> > +       ret = virtnet_rq_merge_map_init(vi);
> > +       if (ret)
> > +               goto err_free;
> > +
> >         cpus_read_lock();
> >         virtnet_set_affinity(vi);
> >         cpus_read_unlock();
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-13  4:20     ` Jason Wang
@ 2023-07-13  7:00       ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-13  7:00 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
>
> I'd suggest to tweak the title like:
>
> "merge dma operations when refilling mergeable buffers"
>
> > Currently, the virtio core will perform a dma operation for each
> > operation.
>
> "for each buffer"?
>
> > Although, the same page may be operated multiple times.
> >
> > The driver does the dma operation and manages the dma address based the
> > feature premapped of virtio core.
> >
> > This way, we can perform only one dma operation for the same page. In
> > the case of mtu 1500, this can reduce a lot of dma operations.
> >
> > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > increased from 1893766 to 1901105. An increase of 0.4%.
>
> Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> linearized pages was missed.
>
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 267 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 486b5849033d..4de845d35bed 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> >
> > +/* The bufs on the same page may share this struct. */
> > +struct virtnet_rq_dma {
> > +       struct virtnet_rq_dma *next;
> > +
> > +       dma_addr_t addr;
> > +
> > +       void *buf;
> > +       u32 len;
> > +
> > +       u32 ref;
> > +};
> > +
> > +/* Record the dma and buf. */
> > +struct virtnet_rq_data {
> > +       struct virtnet_rq_data *next;
> > +
> > +       void *buf;
> > +
> > +       struct virtnet_rq_dma *dma;
> > +};
> > +
> >  /* Internal representation of a send virtqueue */
> >  struct send_queue {
> >         /* Virtqueue associated with this send _queue */
> > @@ -175,6 +196,13 @@ struct receive_queue {
> >         char name[16];
> >
> >         struct xdp_rxq_info xdp_rxq;
> > +
> > +       struct virtnet_rq_data *data_array;
> > +       struct virtnet_rq_data *data_free;
> > +
> > +       struct virtnet_rq_dma *dma_array;
> > +       struct virtnet_rq_dma *dma_free;
> > +       struct virtnet_rq_dma *last_dma;
> >  };
> >
> >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >         return skb;
> >  }
> >
> > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > +{
> > +       struct device *dev;
> > +
> > +       --dma->ref;
> > +
> > +       if (dma->ref)
> > +               return;
> > +
> > +       dev = virtqueue_dma_dev(rq->vq);
> > +
> > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > +
> > +       dma->next = rq->dma_free;
> > +       rq->dma_free = dma;
> > +}
> > +
> > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > +                                    struct virtnet_rq_data *data)
> > +{
> > +       void *buf;
> > +
> > +       buf = data->buf;
> > +
> > +       data->next = rq->data_free;
> > +       rq->data_free = data;
> > +
> > +       return buf;
> > +}
> > +
> > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > +                                                  void *buf,
> > +                                                  struct virtnet_rq_dma *dma)
> > +{
> > +       struct virtnet_rq_data *data;
> > +
> > +       data = rq->data_free;
> > +       rq->data_free = data->next;
> > +
> > +       data->buf = buf;
> > +       data->dma = dma;
> > +
> > +       return data;
> > +}
> > +
> > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > +{
> > +       struct virtnet_rq_data *data;
> > +       void *buf;
> > +
> > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > +       if (!buf || !rq->data_array)
> > +               return buf;
> > +
> > +       data = buf;
> > +
> > +       virtnet_rq_unmap(rq, data->dma);
> > +
> > +       return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > +{
> > +       struct virtnet_rq_data *data;
> > +       void *buf;
> > +
> > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > +       if (!buf || !rq->data_array)
> > +               return buf;
> > +
> > +       data = buf;
> > +
> > +       virtnet_rq_unmap(rq, data->dma);
> > +
> > +       return virtnet_rq_recycle_data(rq, data);
> > +}
> > +
> > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > +{
> > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > +       struct device *dev;
> > +       u32 off, map_len;
> > +       dma_addr_t addr;
> > +       void *end;
> > +
> > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > +               ++dma->ref;
> > +               addr = dma->addr + (buf - dma->buf);
> > +               goto ok;
> > +       }
> > +
> > +       end = buf + len - 1;
> > +       off = offset_in_page(end);
> > +       map_len = len + PAGE_SIZE - off;
>
> This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> larger than this.
>
> > +
> > +       dev = virtqueue_dma_dev(rq->vq);
> > +
> > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > +                                 map_len, DMA_FROM_DEVICE, 0);
> > +       if (addr == DMA_MAPPING_ERROR)
> > +               return -ENOMEM;
> > +
> > +       dma = rq->dma_free;
> > +       rq->dma_free = dma->next;
> > +
> > +       dma->ref = 1;
> > +       dma->buf = buf;
> > +       dma->addr = addr;
> > +       dma->len = map_len;
> > +
> > +       rq->last_dma = dma;
> > +
> > +ok:
> > +       sg_init_table(rq->sg, 1);
> > +       rq->sg[0].dma_address = addr;
> > +       rq->sg[0].length = len;
> > +
> > +       return 0;
> > +}
> > +
> > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > +{
> > +       struct receive_queue *rq;
> > +       int i, err, j, num;
> > +
> > +       /* disable for big mode */
> > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > +               return 0;
> > +
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > +               if (err)
> > +                       continue;
> > +
> > +               rq = &vi->rq[i];
> > +
> > +               num = virtqueue_get_vring_size(rq->vq);
> > +
> > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > +               if (!rq->data_array)
>
> Can we avoid those allocations when we don't use the DMA API?
>
> > +                       goto err;
> > +
> > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > +               if (!rq->dma_array)
> > +                       goto err;
> > +
> > +               for (j = 0; j < num; ++j) {
> > +                       rq->data_array[j].next = rq->data_free;
> > +                       rq->data_free = &rq->data_array[j];
> > +
> > +                       rq->dma_array[j].next = rq->dma_free;
> > +                       rq->dma_free = &rq->dma_array[j];
> > +               }
> > +       }
> > +
> > +       return 0;
> > +
> > +err:
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               struct receive_queue *rq;
> > +
> > +               rq = &vi->rq[i];
> > +
> > +               kfree(rq->dma_array);
> > +               kfree(rq->data_array);
> > +       }
> > +
> > +       return -ENOMEM;
> > +}
> > +
> >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> >  {
> >         unsigned int len;
> > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> >                 void *buf;
> >                 int off;
> >
> > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> >                 if (unlikely(!buf))
> >                         goto err_buf;
> >
> > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> >                 return -EINVAL;
> >
> >         while (--*num_buf > 0) {
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, *num_buf,
> > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >         while (--num_buf) {
> >                 int num_skb_frags;
> >
> > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> >                                  dev->name, num_buf,
> > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  err_skb:
> >         put_page(page);
> >         while (num_buf-- > 1) {
> > -               buf = virtqueue_get_buf(rq->vq, &len);
> > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> >                 if (unlikely(!buf)) {
> >                         pr_debug("%s: rx error: %d buffers missing\n",
> >                                  dev->name, num_buf);
> > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> >         void *ctx = (void *)(unsigned long)xdp_headroom;
> >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > +       struct virtnet_rq_data *data;
> >         int err;
> >
> >         len = SKB_DATA_ALIGN(len) +
> > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> >         get_page(alloc_frag->page);
> >         alloc_frag->offset += len;
> > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +
> > +       if (rq->data_array) {
> > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +                                       vi->hdr_len + GOOD_PACKET_LEN);
>
> Thanks to the compound page. I wonder if everything could be
> simplified if we just reuse page->private for storing metadata like
> dma address and refcnt. Then we don't need extra stuff for tracking
> any other thing?

Maybe we can try allocating one small buffer from the page_frag to store the
dma info when page_frag.offset == 0.
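
Something like this, as a rough sketch only (it reuses struct virtnet_rq_dma
from this patch, the helper name is made up, and it is untested):

static struct virtnet_rq_dma *virtnet_rq_frag_dma(struct page_frag *alloc_frag)
{
	struct virtnet_rq_dma *dma = page_address(alloc_frag->page);

	if (!alloc_frag->offset) {
		/* fresh page: reserve its head for the shared dma info */
		dma->ref = 0;
		alloc_frag->offset = sizeof(*dma);
	}

	return dma;
}

Every buffer carved out of that page could then find its dma info from the
page itself, which would also avoid the per-queue dma_array/data_array
allocations.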

Thanks.


>
> Thanks
>
>
>
> > +               if (err)
> > +                       goto map_err;
> > +
> > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +       } else {
> > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > +               data = (void *)buf;
> > +       }
> > +
> > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >         if (err < 0)
> > -               put_page(virt_to_head_page(buf));
> > +               goto add_err;
> > +
> > +       return err;
> > +
> > +add_err:
> > +       if (rq->data_array) {
> > +               virtnet_rq_unmap(rq, data->dma);
> > +               virtnet_rq_recycle_data(rq, data);
> > +       }
> > +
> > +map_err:
> > +       put_page(virt_to_head_page(buf));
> >         return err;
> >  }
> >
> > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >         unsigned int headroom = virtnet_get_headroom(vi);
> >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > +       struct virtnet_rq_data *data;
> >         char *buf;
> >         void *ctx;
> >         int err;
> > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >                 alloc_frag->offset += hole;
> >         }
> >
> > -       sg_init_one(rq->sg, buf, len);
> > +       if (rq->data_array) {
> > +               err = virtnet_rq_map_sg(rq, buf, len);
> > +               if (err)
> > +                       goto map_err;
> > +
> > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > +       } else {
> > +               sg_init_one(rq->sg, buf, len);
> > +               data = (void *)buf;
> > +       }
> > +
> >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> >         if (err < 0)
> > -               put_page(virt_to_head_page(buf));
> > +               goto add_err;
> > +
> > +       return 0;
> > +
> > +add_err:
> > +       if (rq->data_array) {
> > +               virtnet_rq_unmap(rq, data->dma);
> > +               virtnet_rq_recycle_data(rq, data);
> > +       }
> >
> > +map_err:
> > +       put_page(virt_to_head_page(buf));
> >         return err;
> >  }
> >
> > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> >                 void *ctx;
> >
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> >         } else {
> >                 while (stats.packets < budget &&
> > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> >                         stats.packets++;
> >                 }
> > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> >                 __netif_napi_del(&vi->rq[i].napi);
> >                 __netif_napi_del(&vi->sq[i].napi);
> > +
> > +               kfree(vi->rq[i].data_array);
> > +               kfree(vi->rq[i].dma_array);
> >         }
> >
> >         /* We called __netif_napi_del(),
> > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> >         }
> >
> >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               struct virtqueue *vq = vi->rq[i].vq;
> > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > -                       virtnet_rq_free_unused_buf(vq, buf);
> > +               struct receive_queue *rq = &vi->rq[i];
> > +
> > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> >                 cond_resched();
> >         }
> >  }
> > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> >         if (ret)
> >                 goto err_free;
> >
> > +       ret = virtnet_rq_merge_map_init(vi);
> > +       if (ret)
> > +               goto err_free;
> > +
> >         cpus_read_lock();
> >         virtnet_set_affinity(vi);
> >         cpus_read_unlock();
> > --
> > 2.32.0.3.g01195cf9f
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-13 11:14     ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-13 11:14 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Jul 10, 2023 at 11:42:30AM +0800, Xuan Zhuo wrote:
> This helper allows the driver change the dma mode to premapped mode.
> Under the premapped mode, the virtio core do not do dma mapping
> internally.
> 
> This just work when the use_dma_api is true. If the use_dma_api is false,
> the dma options is not through the DMA APIs, that is not the standard
> way of the linux kernel.

I have a hard time parsing this.

More importantly having two modes seems very error prone going down
the route.  If the premapping is so important, why don't we do it
always?

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-10  3:42   ` Xuan Zhuo
@ 2023-07-13 11:15     ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-13 11:15 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> caller can do dma operation in advance. The purpose is to keep memory
> mapped across multiple add/get buf operations.

This is just poking holes into the abstraction..


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-13 11:14     ` Christoph Hellwig
@ 2023-07-13 14:47       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-13 14:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization, Eric Dumazet,
	Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 04:14:45AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 10, 2023 at 11:42:30AM +0800, Xuan Zhuo wrote:
> > This helper allows the driver change the dma mode to premapped mode.
> > Under the premapped mode, the virtio core do not do dma mapping
> > internally.
> > 
> > This just work when the use_dma_api is true. If the use_dma_api is false,
> > the dma options is not through the DMA APIs, that is not the standard
> > way of the linux kernel.
> 
> I have a hard time parsing this.
> 
> More importantly having two modes seems very error prone going down
> the route.  If the premapping is so important, why don't we do it
> always?

There are a gazillion virtio drivers and most of them just use the
virtio API, without bothering with these micro-optimizations.  virtio
already tracks addresses so mapping/unmapping them for DMA is easier
done in the core.  It's only networking and only with XDP where the
difference becomes measurable.
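
To make that concrete, the driver-side flow this series enables looks
roughly like the following.  It is condensed from the virtio_net patch
quoted elsewhere in the thread; the real code keeps a refcounted
per-page mapping and recycles the metadata, all of which is omitted
here:

        /* sketch only: switch the vq to premapped mode once at setup;
         * if this fails the core keeps doing the mapping itself */
        premapped = !virtqueue_set_premapped(vq);

        /* later, when refilling a premapped vq: map the buffer and hand
         * the core a ready-to-use dma address via sg->dma_address */
        dev = virtqueue_dma_dev(vq);
        addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
                                  len, DMA_FROM_DEVICE, 0);
        if (addr != DMA_MAPPING_ERROR) {
                sg_init_table(rq->sg, 1);
                rq->sg[0].dma_address = addr;
                rq->sg[0].length = len;
                err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
        }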

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-13 11:15     ` Christoph Hellwig
@ 2023-07-13 14:51       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-13 14:51 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization, Eric Dumazet,
	Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > caller can do dma operation in advance. The purpose is to keep memory
> > mapped across multiple add/get buf operations.
> 
> This is just poking holes into the abstraction..

More specifically?

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-13 11:14     ` Christoph Hellwig
@ 2023-07-13 14:52       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-13 14:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization, Eric Dumazet,
	Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 04:14:45AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 10, 2023 at 11:42:30AM +0800, Xuan Zhuo wrote:
> > This helper allows the driver change the dma mode to premapped mode.
> > Under the premapped mode, the virtio core do not do dma mapping
> > internally.
> > 
> > This just work when the use_dma_api is true. If the use_dma_api is false,
> > the dma options is not through the DMA APIs, that is not the standard
> > way of the linux kernel.
> 
> I have a hard time parsing this.

Me too unfortunately.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-13  6:51       ` Xuan Zhuo
@ 2023-07-14  3:56         ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-14  3:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Thu, Jul 13, 2023 at 2:54 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> >
> > I'd suggest to tweak the title like:
> >
> > "merge dma operations when refilling mergeable buffers"
> >
> > > Currently, the virtio core will perform a dma operation for each
> > > operation.
> >
> > "for each buffer"?
> >
> > > Although, the same page may be operated multiple times.
> > >
> > > The driver does the dma operation and manages the dma address based the
> > > feature premapped of virtio core.
> > >
> > > This way, we can perform only one dma operation for the same page. In
> > > the case of mtu 1500, this can reduce a lot of dma operations.
> > >
> > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > increased from 1893766 to 1901105. An increase of 0.4%.
> >
> > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > linearized pages was missed.
> >
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 486b5849033d..4de845d35bed 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > >
> > > +/* The bufs on the same page may share this struct. */
> > > +struct virtnet_rq_dma {
> > > +       struct virtnet_rq_dma *next;
> > > +
> > > +       dma_addr_t addr;
> > > +
> > > +       void *buf;
> > > +       u32 len;
> > > +
> > > +       u32 ref;
> > > +};
> > > +
> > > +/* Record the dma and buf. */
> > > +struct virtnet_rq_data {
> > > +       struct virtnet_rq_data *next;
> > > +
> > > +       void *buf;
> > > +
> > > +       struct virtnet_rq_dma *dma;
> > > +};
> > > +
> > >  /* Internal representation of a send virtqueue */
> > >  struct send_queue {
> > >         /* Virtqueue associated with this send _queue */
> > > @@ -175,6 +196,13 @@ struct receive_queue {
> > >         char name[16];
> > >
> > >         struct xdp_rxq_info xdp_rxq;
> > > +
> > > +       struct virtnet_rq_data *data_array;
> > > +       struct virtnet_rq_data *data_free;
> > > +
> > > +       struct virtnet_rq_dma *dma_array;
> > > +       struct virtnet_rq_dma *dma_free;
> > > +       struct virtnet_rq_dma *last_dma;
> > >  };
> > >
> > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >         return skb;
> > >  }
> > >
> > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > +{
> > > +       struct device *dev;
> > > +
> > > +       --dma->ref;
> > > +
> > > +       if (dma->ref)
> > > +               return;
> > > +
> > > +       dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > +
> > > +       dma->next = rq->dma_free;
> > > +       rq->dma_free = dma;
> > > +}
> > > +
> > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > +                                    struct virtnet_rq_data *data)
> > > +{
> > > +       void *buf;
> > > +
> > > +       buf = data->buf;
> > > +
> > > +       data->next = rq->data_free;
> > > +       rq->data_free = data;
> > > +
> > > +       return buf;
> > > +}
> > > +
> > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > +                                                  void *buf,
> > > +                                                  struct virtnet_rq_dma *dma)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +
> > > +       data = rq->data_free;
> > > +       rq->data_free = data->next;
> > > +
> > > +       data->buf = buf;
> > > +       data->dma = dma;
> > > +
> > > +       return data;
> > > +}
> > > +
> > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +       void *buf;
> > > +
> > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > +       if (!buf || !rq->data_array)
> > > +               return buf;
> > > +
> > > +       data = buf;
> > > +
> > > +       virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +       return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +       void *buf;
> > > +
> > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > +       if (!buf || !rq->data_array)
> > > +               return buf;
> > > +
> > > +       data = buf;
> > > +
> > > +       virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +       return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > +{
> > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > +       struct device *dev;
> > > +       u32 off, map_len;
> > > +       dma_addr_t addr;
> > > +       void *end;
> > > +
> > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > +               ++dma->ref;
> > > +               addr = dma->addr + (buf - dma->buf);
> > > +               goto ok;
> > > +       }
> > > +
> > > +       end = buf + len - 1;
> > > +       off = offset_in_page(end);
> > > +       map_len = len + PAGE_SIZE - off;
> >
> > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > larger than this.
> >
> > > +
> > > +       dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > +       if (addr == DMA_MAPPING_ERROR)
> > > +               return -ENOMEM;
> > > +
> > > +       dma = rq->dma_free;
> > > +       rq->dma_free = dma->next;
> > > +
> > > +       dma->ref = 1;
> > > +       dma->buf = buf;
> > > +       dma->addr = addr;
> > > +       dma->len = map_len;
> > > +
> > > +       rq->last_dma = dma;
> > > +
> > > +ok:
> > > +       sg_init_table(rq->sg, 1);
> > > +       rq->sg[0].dma_address = addr;
> > > +       rq->sg[0].length = len;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > +{
> > > +       struct receive_queue *rq;
> > > +       int i, err, j, num;
> > > +
> > > +       /* disable for big mode */
> > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > +               return 0;
> > > +
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > +               if (err)
> > > +                       continue;
> > > +
> > > +               rq = &vi->rq[i];
> > > +
> > > +               num = virtqueue_get_vring_size(rq->vq);
> > > +
> > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > +               if (!rq->data_array)
> >
> > Can we avoid those allocations when we don't use the DMA API?
> >
> > > +                       goto err;
> > > +
> > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > +               if (!rq->dma_array)
> > > +                       goto err;
> > > +
> > > +               for (j = 0; j < num; ++j) {
> > > +                       rq->data_array[j].next = rq->data_free;
> > > +                       rq->data_free = &rq->data_array[j];
> > > +
> > > +                       rq->dma_array[j].next = rq->dma_free;
> > > +                       rq->dma_free = &rq->dma_array[j];
> > > +               }
> > > +       }
> > > +
> > > +       return 0;
> > > +
> > > +err:
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               struct receive_queue *rq;
> > > +
> > > +               rq = &vi->rq[i];
> > > +
> > > +               kfree(rq->dma_array);
> > > +               kfree(rq->data_array);
> > > +       }
> > > +
> > > +       return -ENOMEM;
> > > +}
> > > +
> > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > >  {
> > >         unsigned int len;
> > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > >                 void *buf;
> > >                 int off;
> > >
> > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > >                 if (unlikely(!buf))
> > >                         goto err_buf;
> > >
> > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > >                 return -EINVAL;
> > >
> > >         while (--*num_buf > 0) {
> > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >                                  dev->name, *num_buf,
> > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >         while (--num_buf) {
> > >                 int num_skb_frags;
> > >
> > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >                                  dev->name, num_buf,
> > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >  err_skb:
> > >         put_page(page);
> > >         while (num_buf-- > 1) {
> > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > >                                  dev->name, num_buf);
> > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > +       struct virtnet_rq_data *data;
> > >         int err;
> > >
> > >         len = SKB_DATA_ALIGN(len) +
> > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > >         get_page(alloc_frag->page);
> > >         alloc_frag->offset += len;
> > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +
> > > +       if (rq->data_array) {
> > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> >
> > Thanks to the compound page. I wonder if everything could be
> > simplified if we just reuse page->private for storing metadata like
> > dma address and refcnt. Then we don't need extra stuff for tracking
> > any other thing?
>
> I didn't use page->private because if part of the page is used by one skb then
> the driver is not the only owner. Can we still use page->private?

You are right, we can't since there's no guarantee that a skb will
occupy a full page.

Thanks

>
> Thanks.
>
>
>
>
> >
> > Thanks
> >
> >
> >
> > > +               if (err)
> > > +                       goto map_err;
> > > +
> > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +       } else {
> > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > +               data = (void *)buf;
> > > +       }
> > > +
> > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >         if (err < 0)
> > > -               put_page(virt_to_head_page(buf));
> > > +               goto add_err;
> > > +
> > > +       return err;
> > > +
> > > +add_err:
> > > +       if (rq->data_array) {
> > > +               virtnet_rq_unmap(rq, data->dma);
> > > +               virtnet_rq_recycle_data(rq, data);
> > > +       }
> > > +
> > > +map_err:
> > > +       put_page(virt_to_head_page(buf));
> > >         return err;
> > >  }
> > >
> > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >         unsigned int headroom = virtnet_get_headroom(vi);
> > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > +       struct virtnet_rq_data *data;
> > >         char *buf;
> > >         void *ctx;
> > >         int err;
> > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >                 alloc_frag->offset += hole;
> > >         }
> > >
> > > -       sg_init_one(rq->sg, buf, len);
> > > +       if (rq->data_array) {
> > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > +               if (err)
> > > +                       goto map_err;
> > > +
> > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +       } else {
> > > +               sg_init_one(rq->sg, buf, len);
> > > +               data = (void *)buf;
> > > +       }
> > > +
> > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >         if (err < 0)
> > > -               put_page(virt_to_head_page(buf));
> > > +               goto add_err;
> > > +
> > > +       return 0;
> > > +
> > > +add_err:
> > > +       if (rq->data_array) {
> > > +               virtnet_rq_unmap(rq, data->dma);
> > > +               virtnet_rq_recycle_data(rq, data);
> > > +       }
> > >
> > > +map_err:
> > > +       put_page(virt_to_head_page(buf));
> > >         return err;
> > >  }
> > >
> > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > >                 void *ctx;
> > >
> > >                 while (stats.packets < budget &&
> > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > >                         stats.packets++;
> > >                 }
> > >         } else {
> > >                 while (stats.packets < budget &&
> > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > >                         stats.packets++;
> > >                 }
> > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > >                 __netif_napi_del(&vi->rq[i].napi);
> > >                 __netif_napi_del(&vi->sq[i].napi);
> > > +
> > > +               kfree(vi->rq[i].data_array);
> > > +               kfree(vi->rq[i].dma_array);
> > >         }
> > >
> > >         /* We called __netif_napi_del(),
> > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > >         }
> > >
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > +               struct receive_queue *rq = &vi->rq[i];
> > > +
> > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > >                 cond_resched();
> > >         }
> > >  }
> > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > >         if (ret)
> > >                 goto err_free;
> > >
> > > +       ret = virtnet_rq_merge_map_init(vi);
> > > +       if (ret)
> > > +               goto err_free;
> > > +
> > >         cpus_read_lock();
> > >         virtnet_set_affinity(vi);
> > >         cpus_read_unlock();
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-13  7:00       ` Xuan Zhuo
@ 2023-07-14  3:57         ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-14  3:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Thu, Jul 13, 2023 at 3:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> >
> > I'd suggest to tweak the title like:
> >
> > "merge dma operations when refilling mergeable buffers"
> >
> > > Currently, the virtio core will perform a dma operation for each
> > > operation.
> >
> > "for each buffer"?
> >
> > > Although, the same page may be operated multiple times.
> > >
> > > The driver does the dma operation and manages the dma address based the
> > > feature premapped of virtio core.
> > >
> > > This way, we can perform only one dma operation for the same page. In
> > > the case of mtu 1500, this can reduce a lot of dma operations.
> > >
> > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > increased from 1893766 to 1901105. An increase of 0.4%.
> >
> > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > linearized pages was missed.
> >
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 486b5849033d..4de845d35bed 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > >
> > > +/* The bufs on the same page may share this struct. */
> > > +struct virtnet_rq_dma {
> > > +       struct virtnet_rq_dma *next;
> > > +
> > > +       dma_addr_t addr;
> > > +
> > > +       void *buf;
> > > +       u32 len;
> > > +
> > > +       u32 ref;
> > > +};
> > > +
> > > +/* Record the dma and buf. */
> > > +struct virtnet_rq_data {
> > > +       struct virtnet_rq_data *next;
> > > +
> > > +       void *buf;
> > > +
> > > +       struct virtnet_rq_dma *dma;
> > > +};
> > > +
> > >  /* Internal representation of a send virtqueue */
> > >  struct send_queue {
> > >         /* Virtqueue associated with this send _queue */
> > > @@ -175,6 +196,13 @@ struct receive_queue {
> > >         char name[16];
> > >
> > >         struct xdp_rxq_info xdp_rxq;
> > > +
> > > +       struct virtnet_rq_data *data_array;
> > > +       struct virtnet_rq_data *data_free;
> > > +
> > > +       struct virtnet_rq_dma *dma_array;
> > > +       struct virtnet_rq_dma *dma_free;
> > > +       struct virtnet_rq_dma *last_dma;
> > >  };
> > >
> > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >         return skb;
> > >  }
> > >
> > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > +{
> > > +       struct device *dev;
> > > +
> > > +       --dma->ref;
> > > +
> > > +       if (dma->ref)
> > > +               return;
> > > +
> > > +       dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > +
> > > +       dma->next = rq->dma_free;
> > > +       rq->dma_free = dma;
> > > +}
> > > +
> > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > +                                    struct virtnet_rq_data *data)
> > > +{
> > > +       void *buf;
> > > +
> > > +       buf = data->buf;
> > > +
> > > +       data->next = rq->data_free;
> > > +       rq->data_free = data;
> > > +
> > > +       return buf;
> > > +}
> > > +
> > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > +                                                  void *buf,
> > > +                                                  struct virtnet_rq_dma *dma)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +
> > > +       data = rq->data_free;
> > > +       rq->data_free = data->next;
> > > +
> > > +       data->buf = buf;
> > > +       data->dma = dma;
> > > +
> > > +       return data;
> > > +}
> > > +
> > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +       void *buf;
> > > +
> > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > +       if (!buf || !rq->data_array)
> > > +               return buf;
> > > +
> > > +       data = buf;
> > > +
> > > +       virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +       return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +       void *buf;
> > > +
> > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > +       if (!buf || !rq->data_array)
> > > +               return buf;
> > > +
> > > +       data = buf;
> > > +
> > > +       virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +       return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > +{
> > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > +       struct device *dev;
> > > +       u32 off, map_len;
> > > +       dma_addr_t addr;
> > > +       void *end;
> > > +
> > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > +               ++dma->ref;
> > > +               addr = dma->addr + (buf - dma->buf);
> > > +               goto ok;
> > > +       }
> > > +
> > > +       end = buf + len - 1;
> > > +       off = offset_in_page(end);
> > > +       map_len = len + PAGE_SIZE - off;
> >
> > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > larger than this.
> >
> > > +
> > > +       dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > +       if (addr == DMA_MAPPING_ERROR)
> > > +               return -ENOMEM;
> > > +
> > > +       dma = rq->dma_free;
> > > +       rq->dma_free = dma->next;
> > > +
> > > +       dma->ref = 1;
> > > +       dma->buf = buf;
> > > +       dma->addr = addr;
> > > +       dma->len = map_len;
> > > +
> > > +       rq->last_dma = dma;
> > > +
> > > +ok:
> > > +       sg_init_table(rq->sg, 1);
> > > +       rq->sg[0].dma_address = addr;
> > > +       rq->sg[0].length = len;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > +{
> > > +       struct receive_queue *rq;
> > > +       int i, err, j, num;
> > > +
> > > +       /* disable for big mode */
> > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > +               return 0;
> > > +
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > +               if (err)
> > > +                       continue;
> > > +
> > > +               rq = &vi->rq[i];
> > > +
> > > +               num = virtqueue_get_vring_size(rq->vq);
> > > +
> > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > +               if (!rq->data_array)
> >
> > Can we avoid those allocations when we don't use the DMA API?
> >
> > > +                       goto err;
> > > +
> > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > +               if (!rq->dma_array)
> > > +                       goto err;
> > > +
> > > +               for (j = 0; j < num; ++j) {
> > > +                       rq->data_array[j].next = rq->data_free;
> > > +                       rq->data_free = &rq->data_array[j];
> > > +
> > > +                       rq->dma_array[j].next = rq->dma_free;
> > > +                       rq->dma_free = &rq->dma_array[j];
> > > +               }
> > > +       }
> > > +
> > > +       return 0;
> > > +
> > > +err:
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               struct receive_queue *rq;
> > > +
> > > +               rq = &vi->rq[i];
> > > +
> > > +               kfree(rq->dma_array);
> > > +               kfree(rq->data_array);
> > > +       }
> > > +
> > > +       return -ENOMEM;
> > > +}
> > > +
> > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > >  {
> > >         unsigned int len;
> > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > >                 void *buf;
> > >                 int off;
> > >
> > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > >                 if (unlikely(!buf))
> > >                         goto err_buf;
> > >
> > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > >                 return -EINVAL;
> > >
> > >         while (--*num_buf > 0) {
> > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >                                  dev->name, *num_buf,
> > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >         while (--num_buf) {
> > >                 int num_skb_frags;
> > >
> > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >                                  dev->name, num_buf,
> > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >  err_skb:
> > >         put_page(page);
> > >         while (num_buf-- > 1) {
> > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > >                                  dev->name, num_buf);
> > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > +       struct virtnet_rq_data *data;
> > >         int err;
> > >
> > >         len = SKB_DATA_ALIGN(len) +
> > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > >         get_page(alloc_frag->page);
> > >         alloc_frag->offset += len;
> > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +
> > > +       if (rq->data_array) {
> > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> >
> > Thanks to the compound page. I wonder if everything could be
> > simplified if we just reuse page->private for storing metadata like
> > dma address and refcnt. Then we don't need extra stuff for tracking
> > any other thing?
>
> Maybe we can try alloc one small buffer from the page_frag to store the dma info
> when page_frag.offset == 0.

And store it in the ctx? I think it should work.
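
A minimal sketch of that idea (all names below are hypothetical, purely for
illustration, not part of the patch): when page_frag.offset == 0, carve a small
header out of the fresh page and map the rest of the frag once, so every buf
later carved from the same frag shares that single mapping:

/* Hypothetical sketch only: keep the dma info in a small header at the
 * start of a freshly refilled page_frag page. Refill, get_page() and
 * unmapping when the last buf comes back are omitted here.
 */
struct virtnet_rq_dma_hdr {
        dma_addr_t addr;        /* dma address of the mapped area */
        u32 len;                /* length of the mapped area */
        u32 ref;                /* bufs still using this mapping */
};

static void *virtnet_rq_frag_alloc_sketch(struct receive_queue *rq,
                                          struct page_frag *alloc_frag,
                                          u32 len)
{
        struct virtnet_rq_dma_hdr *hdr = page_address(alloc_frag->page);
        struct device *dev = virtqueue_dma_dev(rq->vq);
        dma_addr_t addr;
        void *buf;

        if (alloc_frag->offset == 0) {
                /* First buf from this page: reserve the header and map
                 * the remaining part of the frag in one go.
                 */
                alloc_frag->offset = sizeof(*hdr);

                addr = dma_map_page_attrs(dev, alloc_frag->page,
                                          alloc_frag->offset,
                                          alloc_frag->size - alloc_frag->offset,
                                          DMA_FROM_DEVICE, 0);
                if (addr == DMA_MAPPING_ERROR)
                        return NULL;

                hdr->addr = addr;
                hdr->len = alloc_frag->size - alloc_frag->offset;
                hdr->ref = 0;
        }

        buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
        alloc_frag->offset += len;
        hdr->ref++;

        return buf;
}

The hdr pointer (or something derived from it) could then be carried in the
ctx, or looked up again from the buf itself.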

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> >
> >
> > > +               if (err)
> > > +                       goto map_err;
> > > +
> > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +       } else {
> > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > +               data = (void *)buf;
> > > +       }
> > > +
> > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >         if (err < 0)
> > > -               put_page(virt_to_head_page(buf));
> > > +               goto add_err;
> > > +
> > > +       return err;
> > > +
> > > +add_err:
> > > +       if (rq->data_array) {
> > > +               virtnet_rq_unmap(rq, data->dma);
> > > +               virtnet_rq_recycle_data(rq, data);
> > > +       }
> > > +
> > > +map_err:
> > > +       put_page(virt_to_head_page(buf));
> > >         return err;
> > >  }
> > >
> > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >         unsigned int headroom = virtnet_get_headroom(vi);
> > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > +       struct virtnet_rq_data *data;
> > >         char *buf;
> > >         void *ctx;
> > >         int err;
> > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >                 alloc_frag->offset += hole;
> > >         }
> > >
> > > -       sg_init_one(rq->sg, buf, len);
> > > +       if (rq->data_array) {
> > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > +               if (err)
> > > +                       goto map_err;
> > > +
> > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +       } else {
> > > +               sg_init_one(rq->sg, buf, len);
> > > +               data = (void *)buf;
> > > +       }
> > > +
> > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >         if (err < 0)
> > > -               put_page(virt_to_head_page(buf));
> > > +               goto add_err;
> > > +
> > > +       return 0;
> > > +
> > > +add_err:
> > > +       if (rq->data_array) {
> > > +               virtnet_rq_unmap(rq, data->dma);
> > > +               virtnet_rq_recycle_data(rq, data);
> > > +       }
> > >
> > > +map_err:
> > > +       put_page(virt_to_head_page(buf));
> > >         return err;
> > >  }
> > >
> > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > >                 void *ctx;
> > >
> > >                 while (stats.packets < budget &&
> > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > >                         stats.packets++;
> > >                 }
> > >         } else {
> > >                 while (stats.packets < budget &&
> > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > >                         stats.packets++;
> > >                 }
> > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > >                 __netif_napi_del(&vi->rq[i].napi);
> > >                 __netif_napi_del(&vi->sq[i].napi);
> > > +
> > > +               kfree(vi->rq[i].data_array);
> > > +               kfree(vi->rq[i].dma_array);
> > >         }
> > >
> > >         /* We called __netif_napi_del(),
> > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > >         }
> > >
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > +               struct receive_queue *rq = &vi->rq[i];
> > > +
> > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > >                 cond_resched();
> > >         }
> > >  }
> > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > >         if (ret)
> > >                 goto err_free;
> > >
> > > +       ret = virtnet_rq_merge_map_init(vi);
> > > +       if (ret)
> > > +               goto err_free;
> > > +
> > >         cpus_read_lock();
> > >         virtnet_set_affinity(vi);
> > >         cpus_read_unlock();
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-14  3:57         ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-14  3:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 3:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> >
> > I'd suggest to tweak the title like:
> >
> > "merge dma operations when refilling mergeable buffers"
> >
> > > Currently, the virtio core will perform a dma operation for each
> > > operation.
> >
> > "for each buffer"?
> >
> > > Although, the same page may be operated multiple times.
> > >
> > > The driver does the dma operation and manages the dma address based the
> > > feature premapped of virtio core.
> > >
> > > This way, we can perform only one dma operation for the same page. In
> > > the case of mtu 1500, this can reduce a lot of dma operations.
> > >
> > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > increased from 1893766 to 1901105. An increase of 0.4%.
> >
> > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > linearized pages was missed.
> >
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 486b5849033d..4de845d35bed 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > >
> > > +/* The bufs on the same page may share this struct. */
> > > +struct virtnet_rq_dma {
> > > +       struct virtnet_rq_dma *next;
> > > +
> > > +       dma_addr_t addr;
> > > +
> > > +       void *buf;
> > > +       u32 len;
> > > +
> > > +       u32 ref;
> > > +};
> > > +
> > > +/* Record the dma and buf. */
> > > +struct virtnet_rq_data {
> > > +       struct virtnet_rq_data *next;
> > > +
> > > +       void *buf;
> > > +
> > > +       struct virtnet_rq_dma *dma;
> > > +};
> > > +
> > >  /* Internal representation of a send virtqueue */
> > >  struct send_queue {
> > >         /* Virtqueue associated with this send _queue */
> > > @@ -175,6 +196,13 @@ struct receive_queue {
> > >         char name[16];
> > >
> > >         struct xdp_rxq_info xdp_rxq;
> > > +
> > > +       struct virtnet_rq_data *data_array;
> > > +       struct virtnet_rq_data *data_free;
> > > +
> > > +       struct virtnet_rq_dma *dma_array;
> > > +       struct virtnet_rq_dma *dma_free;
> > > +       struct virtnet_rq_dma *last_dma;
> > >  };
> > >
> > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > >         return skb;
> > >  }
> > >
> > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > +{
> > > +       struct device *dev;
> > > +
> > > +       --dma->ref;
> > > +
> > > +       if (dma->ref)
> > > +               return;
> > > +
> > > +       dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > +
> > > +       dma->next = rq->dma_free;
> > > +       rq->dma_free = dma;
> > > +}
> > > +
> > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > +                                    struct virtnet_rq_data *data)
> > > +{
> > > +       void *buf;
> > > +
> > > +       buf = data->buf;
> > > +
> > > +       data->next = rq->data_free;
> > > +       rq->data_free = data;
> > > +
> > > +       return buf;
> > > +}
> > > +
> > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > +                                                  void *buf,
> > > +                                                  struct virtnet_rq_dma *dma)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +
> > > +       data = rq->data_free;
> > > +       rq->data_free = data->next;
> > > +
> > > +       data->buf = buf;
> > > +       data->dma = dma;
> > > +
> > > +       return data;
> > > +}
> > > +
> > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +       void *buf;
> > > +
> > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > +       if (!buf || !rq->data_array)
> > > +               return buf;
> > > +
> > > +       data = buf;
> > > +
> > > +       virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +       return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > +{
> > > +       struct virtnet_rq_data *data;
> > > +       void *buf;
> > > +
> > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > +       if (!buf || !rq->data_array)
> > > +               return buf;
> > > +
> > > +       data = buf;
> > > +
> > > +       virtnet_rq_unmap(rq, data->dma);
> > > +
> > > +       return virtnet_rq_recycle_data(rq, data);
> > > +}
> > > +
> > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > +{
> > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > +       struct device *dev;
> > > +       u32 off, map_len;
> > > +       dma_addr_t addr;
> > > +       void *end;
> > > +
> > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > +               ++dma->ref;
> > > +               addr = dma->addr + (buf - dma->buf);
> > > +               goto ok;
> > > +       }
> > > +
> > > +       end = buf + len - 1;
> > > +       off = offset_in_page(end);
> > > +       map_len = len + PAGE_SIZE - off;
> >
> > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > larger than this.
> >
> > > +
> > > +       dev = virtqueue_dma_dev(rq->vq);
> > > +
> > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > +       if (addr == DMA_MAPPING_ERROR)
> > > +               return -ENOMEM;
> > > +
> > > +       dma = rq->dma_free;
> > > +       rq->dma_free = dma->next;
> > > +
> > > +       dma->ref = 1;
> > > +       dma->buf = buf;
> > > +       dma->addr = addr;
> > > +       dma->len = map_len;
> > > +
> > > +       rq->last_dma = dma;
> > > +
> > > +ok:
> > > +       sg_init_table(rq->sg, 1);
> > > +       rq->sg[0].dma_address = addr;
> > > +       rq->sg[0].length = len;
> > > +
> > > +       return 0;
> > > +}
> > > +
> > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > +{
> > > +       struct receive_queue *rq;
> > > +       int i, err, j, num;
> > > +
> > > +       /* disable for big mode */
> > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > +               return 0;
> > > +
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > +               if (err)
> > > +                       continue;
> > > +
> > > +               rq = &vi->rq[i];
> > > +
> > > +               num = virtqueue_get_vring_size(rq->vq);
> > > +
> > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > +               if (!rq->data_array)
> >
> > Can we avoid those allocations when we don't use the DMA API?
> >
> > > +                       goto err;
> > > +
> > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > +               if (!rq->dma_array)
> > > +                       goto err;
> > > +
> > > +               for (j = 0; j < num; ++j) {
> > > +                       rq->data_array[j].next = rq->data_free;
> > > +                       rq->data_free = &rq->data_array[j];
> > > +
> > > +                       rq->dma_array[j].next = rq->dma_free;
> > > +                       rq->dma_free = &rq->dma_array[j];
> > > +               }
> > > +       }
> > > +
> > > +       return 0;
> > > +
> > > +err:
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               struct receive_queue *rq;
> > > +
> > > +               rq = &vi->rq[i];
> > > +
> > > +               kfree(rq->dma_array);
> > > +               kfree(rq->data_array);
> > > +       }
> > > +
> > > +       return -ENOMEM;
> > > +}
> > > +
> > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > >  {
> > >         unsigned int len;
> > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > >                 void *buf;
> > >                 int off;
> > >
> > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > >                 if (unlikely(!buf))
> > >                         goto err_buf;
> > >
> > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > >                 return -EINVAL;
> > >
> > >         while (--*num_buf > 0) {
> > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >                                  dev->name, *num_buf,
> > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >         while (--num_buf) {
> > >                 int num_skb_frags;
> > >
> > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > >                                  dev->name, num_buf,
> > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >  err_skb:
> > >         put_page(page);
> > >         while (num_buf-- > 1) {
> > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > >                 if (unlikely(!buf)) {
> > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > >                                  dev->name, num_buf);
> > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > +       struct virtnet_rq_data *data;
> > >         int err;
> > >
> > >         len = SKB_DATA_ALIGN(len) +
> > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > >         get_page(alloc_frag->page);
> > >         alloc_frag->offset += len;
> > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +
> > > +       if (rq->data_array) {
> > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> >
> > Thanks to the compound page. I wonder if everything could be
> > simplified if we just reuse page->private for storing metadata like
> > dma address and refcnt. Then we don't need extra stuff for tracking
> > any other thing?
>
> Maybe we can try alloc one small buffer from the page_frag to store the dma info
> when page_frag.offset == 0.

And store it in the ctx? I think it should work.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> >
> >
> > > +               if (err)
> > > +                       goto map_err;
> > > +
> > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +       } else {
> > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > +               data = (void *)buf;
> > > +       }
> > > +
> > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >         if (err < 0)
> > > -               put_page(virt_to_head_page(buf));
> > > +               goto add_err;
> > > +
> > > +       return err;
> > > +
> > > +add_err:
> > > +       if (rq->data_array) {
> > > +               virtnet_rq_unmap(rq, data->dma);
> > > +               virtnet_rq_recycle_data(rq, data);
> > > +       }
> > > +
> > > +map_err:
> > > +       put_page(virt_to_head_page(buf));
> > >         return err;
> > >  }
> > >
> > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >         unsigned int headroom = virtnet_get_headroom(vi);
> > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > +       struct virtnet_rq_data *data;
> > >         char *buf;
> > >         void *ctx;
> > >         int err;
> > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >                 alloc_frag->offset += hole;
> > >         }
> > >
> > > -       sg_init_one(rq->sg, buf, len);
> > > +       if (rq->data_array) {
> > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > +               if (err)
> > > +                       goto map_err;
> > > +
> > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > +       } else {
> > > +               sg_init_one(rq->sg, buf, len);
> > > +               data = (void *)buf;
> > > +       }
> > > +
> > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > >         if (err < 0)
> > > -               put_page(virt_to_head_page(buf));
> > > +               goto add_err;
> > > +
> > > +       return 0;
> > > +
> > > +add_err:
> > > +       if (rq->data_array) {
> > > +               virtnet_rq_unmap(rq, data->dma);
> > > +               virtnet_rq_recycle_data(rq, data);
> > > +       }
> > >
> > > +map_err:
> > > +       put_page(virt_to_head_page(buf));
> > >         return err;
> > >  }
> > >
> > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > >                 void *ctx;
> > >
> > >                 while (stats.packets < budget &&
> > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > >                         stats.packets++;
> > >                 }
> > >         } else {
> > >                 while (stats.packets < budget &&
> > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > >                         stats.packets++;
> > >                 }
> > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > >                 __netif_napi_del(&vi->rq[i].napi);
> > >                 __netif_napi_del(&vi->sq[i].napi);
> > > +
> > > +               kfree(vi->rq[i].data_array);
> > > +               kfree(vi->rq[i].dma_array);
> > >         }
> > >
> > >         /* We called __netif_napi_del(),
> > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > >         }
> > >
> > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > +               struct receive_queue *rq = &vi->rq[i];
> > > +
> > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > >                 cond_resched();
> > >         }
> > >  }
> > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > >         if (ret)
> > >                 goto err_free;
> > >
> > > +       ret = virtnet_rq_merge_map_init(vi);
> > > +       if (ret)
> > > +               goto err_free;
> > > +
> > >         cpus_read_lock();
> > >         virtnet_set_affinity(vi);
> > >         cpus_read_unlock();
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-14  3:57         ` Jason Wang
@ 2023-07-14  3:58           ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-14  3:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Fri, 14 Jul 2023 11:57:05 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jul 13, 2023 at 3:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > >
> > > I'd suggest to tweak the title like:
> > >
> > > "merge dma operations when refilling mergeable buffers"
> > >
> > > > Currently, the virtio core will perform a dma operation for each
> > > > operation.
> > >
> > > "for each buffer"?
> > >
> > > > Although, the same page may be operated multiple times.
> > > >
> > > > The driver does the dma operation and manages the dma address based the
> > > > feature premapped of virtio core.
> > > >
> > > > This way, we can perform only one dma operation for the same page. In
> > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > >
> > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > >
> > > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > > linearized pages was missed.
> > >
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 486b5849033d..4de845d35bed 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > >
> > > > +/* The bufs on the same page may share this struct. */
> > > > +struct virtnet_rq_dma {
> > > > +       struct virtnet_rq_dma *next;
> > > > +
> > > > +       dma_addr_t addr;
> > > > +
> > > > +       void *buf;
> > > > +       u32 len;
> > > > +
> > > > +       u32 ref;
> > > > +};
> > > > +
> > > > +/* Record the dma and buf. */
> > > > +struct virtnet_rq_data {
> > > > +       struct virtnet_rq_data *next;
> > > > +
> > > > +       void *buf;
> > > > +
> > > > +       struct virtnet_rq_dma *dma;
> > > > +};
> > > > +
> > > >  /* Internal representation of a send virtqueue */
> > > >  struct send_queue {
> > > >         /* Virtqueue associated with this send _queue */
> > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > >         char name[16];
> > > >
> > > >         struct xdp_rxq_info xdp_rxq;
> > > > +
> > > > +       struct virtnet_rq_data *data_array;
> > > > +       struct virtnet_rq_data *data_free;
> > > > +
> > > > +       struct virtnet_rq_dma *dma_array;
> > > > +       struct virtnet_rq_dma *dma_free;
> > > > +       struct virtnet_rq_dma *last_dma;
> > > >  };
> > > >
> > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >         return skb;
> > > >  }
> > > >
> > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > +{
> > > > +       struct device *dev;
> > > > +
> > > > +       --dma->ref;
> > > > +
> > > > +       if (dma->ref)
> > > > +               return;
> > > > +
> > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > +
> > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > +
> > > > +       dma->next = rq->dma_free;
> > > > +       rq->dma_free = dma;
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > +                                    struct virtnet_rq_data *data)
> > > > +{
> > > > +       void *buf;
> > > > +
> > > > +       buf = data->buf;
> > > > +
> > > > +       data->next = rq->data_free;
> > > > +       rq->data_free = data;
> > > > +
> > > > +       return buf;
> > > > +}
> > > > +
> > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > +                                                  void *buf,
> > > > +                                                  struct virtnet_rq_dma *dma)
> > > > +{
> > > > +       struct virtnet_rq_data *data;
> > > > +
> > > > +       data = rq->data_free;
> > > > +       rq->data_free = data->next;
> > > > +
> > > > +       data->buf = buf;
> > > > +       data->dma = dma;
> > > > +
> > > > +       return data;
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > +{
> > > > +       struct virtnet_rq_data *data;
> > > > +       void *buf;
> > > > +
> > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > +       if (!buf || !rq->data_array)
> > > > +               return buf;
> > > > +
> > > > +       data = buf;
> > > > +
> > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > +
> > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > +{
> > > > +       struct virtnet_rq_data *data;
> > > > +       void *buf;
> > > > +
> > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > +       if (!buf || !rq->data_array)
> > > > +               return buf;
> > > > +
> > > > +       data = buf;
> > > > +
> > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > +
> > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > +}
> > > > +
> > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > +{
> > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > +       struct device *dev;
> > > > +       u32 off, map_len;
> > > > +       dma_addr_t addr;
> > > > +       void *end;
> > > > +
> > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > +               ++dma->ref;
> > > > +               addr = dma->addr + (buf - dma->buf);
> > > > +               goto ok;
> > > > +       }
> > > > +
> > > > +       end = buf + len - 1;
> > > > +       off = offset_in_page(end);
> > > > +       map_len = len + PAGE_SIZE - off;
> > >
> > > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > > larger than this.
> > >
> > > > +
> > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > +
> > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       dma = rq->dma_free;
> > > > +       rq->dma_free = dma->next;
> > > > +
> > > > +       dma->ref = 1;
> > > > +       dma->buf = buf;
> > > > +       dma->addr = addr;
> > > > +       dma->len = map_len;
> > > > +
> > > > +       rq->last_dma = dma;
> > > > +
> > > > +ok:
> > > > +       sg_init_table(rq->sg, 1);
> > > > +       rq->sg[0].dma_address = addr;
> > > > +       rq->sg[0].length = len;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > +{
> > > > +       struct receive_queue *rq;
> > > > +       int i, err, j, num;
> > > > +
> > > > +       /* disable for big mode */
> > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > +               return 0;
> > > > +
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > +               if (err)
> > > > +                       continue;
> > > > +
> > > > +               rq = &vi->rq[i];
> > > > +
> > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > +
> > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > +               if (!rq->data_array)
> > >
> > > Can we avoid those allocations when we don't use the DMA API?
> > >
> > > > +                       goto err;
> > > > +
> > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > +               if (!rq->dma_array)
> > > > +                       goto err;
> > > > +
> > > > +               for (j = 0; j < num; ++j) {
> > > > +                       rq->data_array[j].next = rq->data_free;
> > > > +                       rq->data_free = &rq->data_array[j];
> > > > +
> > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > +               }
> > > > +       }
> > > > +
> > > > +       return 0;
> > > > +
> > > > +err:
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +               struct receive_queue *rq;
> > > > +
> > > > +               rq = &vi->rq[i];
> > > > +
> > > > +               kfree(rq->dma_array);
> > > > +               kfree(rq->data_array);
> > > > +       }
> > > > +
> > > > +       return -ENOMEM;
> > > > +}
> > > > +
> > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > >  {
> > > >         unsigned int len;
> > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > >                 void *buf;
> > > >                 int off;
> > > >
> > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > >                 if (unlikely(!buf))
> > > >                         goto err_buf;
> > > >
> > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > >                 return -EINVAL;
> > > >
> > > >         while (--*num_buf > 0) {
> > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > >                 if (unlikely(!buf)) {
> > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > >                                  dev->name, *num_buf,
> > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >         while (--num_buf) {
> > > >                 int num_skb_frags;
> > > >
> > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > >                 if (unlikely(!buf)) {
> > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > >                                  dev->name, num_buf,
> > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >  err_skb:
> > > >         put_page(page);
> > > >         while (num_buf-- > 1) {
> > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > >                 if (unlikely(!buf)) {
> > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > >                                  dev->name, num_buf);
> > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > +       struct virtnet_rq_data *data;
> > > >         int err;
> > > >
> > > >         len = SKB_DATA_ALIGN(len) +
> > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > >         get_page(alloc_frag->page);
> > > >         alloc_frag->offset += len;
> > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > +
> > > > +       if (rq->data_array) {
> > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > >
> > > Thanks to the compound page. I wonder if everything could be
> > > simplified if we just reuse page->private for storing metadata like
> > > dma address and refcnt. Then we don't need extra stuff for tracking
> > > any other thing?
> >
> > Maybe we can try alloc one small buffer from the page_frag to store the dma info
> > when page_frag.offset == 0.
>
> And store it in the ctx? I think it should work.


Since the dma information is located on the first page of the compound page, we
can get the dma information directly from buf.

No need to modify the ctx.
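
For illustration, a rough sketch of that lookup (reusing a hypothetical
virtnet_rq_dma_hdr placed at offset 0 of the head page; the names are made up
and not part of the patch):

/* Hypothetical sketch: the dma info sits at the start of the first (head)
 * page of the compound page, so any buf inside it can find the info via
 * its head page -- no ctx changes needed.
 */
struct virtnet_rq_dma_hdr {
        dma_addr_t addr;
        u32 len;
        u32 ref;
};

static struct virtnet_rq_dma_hdr *virtnet_rq_buf_to_hdr_sketch(void *buf)
{
        struct page *head = virt_to_head_page(buf);

        /* Written at offset 0 when the frag was refilled. */
        return page_address(head);
}

static dma_addr_t virtnet_rq_buf_dma_addr_sketch(void *buf)
{
        struct virtnet_rq_dma_hdr *hdr = virtnet_rq_buf_to_hdr_sketch(buf);

        /* The mapped area starts right after the header. */
        return hdr->addr + ((char *)buf - (char *)(hdr + 1));
}

The unmap path would then just decrement hdr->ref and dma_unmap_page() once it
reaches zero.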

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > >
> > >
> > > > +               if (err)
> > > > +                       goto map_err;
> > > > +
> > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > +       } else {
> > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > +               data = (void *)buf;
> > > > +       }
> > > > +
> > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > >         if (err < 0)
> > > > -               put_page(virt_to_head_page(buf));
> > > > +               goto add_err;
> > > > +
> > > > +       return err;
> > > > +
> > > > +add_err:
> > > > +       if (rq->data_array) {
> > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > +               virtnet_rq_recycle_data(rq, data);
> > > > +       }
> > > > +
> > > > +map_err:
> > > > +       put_page(virt_to_head_page(buf));
> > > >         return err;
> > > >  }
> > > >
> > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > +       struct virtnet_rq_data *data;
> > > >         char *buf;
> > > >         void *ctx;
> > > >         int err;
> > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >                 alloc_frag->offset += hole;
> > > >         }
> > > >
> > > > -       sg_init_one(rq->sg, buf, len);
> > > > +       if (rq->data_array) {
> > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > +               if (err)
> > > > +                       goto map_err;
> > > > +
> > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > +       } else {
> > > > +               sg_init_one(rq->sg, buf, len);
> > > > +               data = (void *)buf;
> > > > +       }
> > > > +
> > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > >         if (err < 0)
> > > > -               put_page(virt_to_head_page(buf));
> > > > +               goto add_err;
> > > > +
> > > > +       return 0;
> > > > +
> > > > +add_err:
> > > > +       if (rq->data_array) {
> > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > +               virtnet_rq_recycle_data(rq, data);
> > > > +       }
> > > >
> > > > +map_err:
> > > > +       put_page(virt_to_head_page(buf));
> > > >         return err;
> > > >  }
> > > >
> > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > >                 void *ctx;
> > > >
> > > >                 while (stats.packets < budget &&
> > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > >                         stats.packets++;
> > > >                 }
> > > >         } else {
> > > >                 while (stats.packets < budget &&
> > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > >                         stats.packets++;
> > > >                 }
> > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > +
> > > > +               kfree(vi->rq[i].data_array);
> > > > +               kfree(vi->rq[i].dma_array);
> > > >         }
> > > >
> > > >         /* We called __netif_napi_del(),
> > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > >         }
> > > >
> > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > +
> > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > >                 cond_resched();
> > > >         }
> > > >  }
> > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > >         if (ret)
> > > >                 goto err_free;
> > > >
> > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > +       if (ret)
> > > > +               goto err_free;
> > > > +
> > > >         cpus_read_lock();
> > > >         virtnet_set_affinity(vi);
> > > >         cpus_read_unlock();
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-14  3:58           ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-14  3:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Fri, 14 Jul 2023 11:57:05 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jul 13, 2023 at 3:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > >
> > > I'd suggest to tweak the title like:
> > >
> > > "merge dma operations when refilling mergeable buffers"
> > >
> > > > Currently, the virtio core will perform a dma operation for each
> > > > operation.
> > >
> > > "for each buffer"?
> > >
> > > > Although, the same page may be operated multiple times.
> > > >
> > > > The driver does the dma operation and manages the dma address based the
> > > > feature premapped of virtio core.
> > > >
> > > > This way, we can perform only one dma operation for the same page. In
> > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > >
> > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > >
> > > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > > linearized pages was missed.
> > >
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 486b5849033d..4de845d35bed 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > >
> > > > +/* The bufs on the same page may share this struct. */
> > > > +struct virtnet_rq_dma {
> > > > +       struct virtnet_rq_dma *next;
> > > > +
> > > > +       dma_addr_t addr;
> > > > +
> > > > +       void *buf;
> > > > +       u32 len;
> > > > +
> > > > +       u32 ref;
> > > > +};
> > > > +
> > > > +/* Record the dma and buf. */
> > > > +struct virtnet_rq_data {
> > > > +       struct virtnet_rq_data *next;
> > > > +
> > > > +       void *buf;
> > > > +
> > > > +       struct virtnet_rq_dma *dma;
> > > > +};
> > > > +
> > > >  /* Internal representation of a send virtqueue */
> > > >  struct send_queue {
> > > >         /* Virtqueue associated with this send _queue */
> > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > >         char name[16];
> > > >
> > > >         struct xdp_rxq_info xdp_rxq;
> > > > +
> > > > +       struct virtnet_rq_data *data_array;
> > > > +       struct virtnet_rq_data *data_free;
> > > > +
> > > > +       struct virtnet_rq_dma *dma_array;
> > > > +       struct virtnet_rq_dma *dma_free;
> > > > +       struct virtnet_rq_dma *last_dma;
> > > >  };
> > > >
> > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > >         return skb;
> > > >  }
> > > >
> > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > +{
> > > > +       struct device *dev;
> > > > +
> > > > +       --dma->ref;
> > > > +
> > > > +       if (dma->ref)
> > > > +               return;
> > > > +
> > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > +
> > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > +
> > > > +       dma->next = rq->dma_free;
> > > > +       rq->dma_free = dma;
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > +                                    struct virtnet_rq_data *data)
> > > > +{
> > > > +       void *buf;
> > > > +
> > > > +       buf = data->buf;
> > > > +
> > > > +       data->next = rq->data_free;
> > > > +       rq->data_free = data;
> > > > +
> > > > +       return buf;
> > > > +}
> > > > +
> > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > +                                                  void *buf,
> > > > +                                                  struct virtnet_rq_dma *dma)
> > > > +{
> > > > +       struct virtnet_rq_data *data;
> > > > +
> > > > +       data = rq->data_free;
> > > > +       rq->data_free = data->next;
> > > > +
> > > > +       data->buf = buf;
> > > > +       data->dma = dma;
> > > > +
> > > > +       return data;
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > +{
> > > > +       struct virtnet_rq_data *data;
> > > > +       void *buf;
> > > > +
> > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > +       if (!buf || !rq->data_array)
> > > > +               return buf;
> > > > +
> > > > +       data = buf;
> > > > +
> > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > +
> > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > +}
> > > > +
> > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > +{
> > > > +       struct virtnet_rq_data *data;
> > > > +       void *buf;
> > > > +
> > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > +       if (!buf || !rq->data_array)
> > > > +               return buf;
> > > > +
> > > > +       data = buf;
> > > > +
> > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > +
> > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > +}
> > > > +
> > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > +{
> > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > +       struct device *dev;
> > > > +       u32 off, map_len;
> > > > +       dma_addr_t addr;
> > > > +       void *end;
> > > > +
> > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > +               ++dma->ref;
> > > > +               addr = dma->addr + (buf - dma->buf);
> > > > +               goto ok;
> > > > +       }
> > > > +
> > > > +       end = buf + len - 1;
> > > > +       off = offset_in_page(end);
> > > > +       map_len = len + PAGE_SIZE - off;
> > >
> > > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > > larger than this.
> > >
> > > > +
> > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > +
> > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       dma = rq->dma_free;
> > > > +       rq->dma_free = dma->next;
> > > > +
> > > > +       dma->ref = 1;
> > > > +       dma->buf = buf;
> > > > +       dma->addr = addr;
> > > > +       dma->len = map_len;
> > > > +
> > > > +       rq->last_dma = dma;
> > > > +
> > > > +ok:
> > > > +       sg_init_table(rq->sg, 1);
> > > > +       rq->sg[0].dma_address = addr;
> > > > +       rq->sg[0].length = len;
> > > > +
> > > > +       return 0;
> > > > +}
> > > > +
> > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > +{
> > > > +       struct receive_queue *rq;
> > > > +       int i, err, j, num;
> > > > +
> > > > +       /* disable for big mode */
> > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > +               return 0;
> > > > +
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > +               if (err)
> > > > +                       continue;
> > > > +
> > > > +               rq = &vi->rq[i];
> > > > +
> > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > +
> > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > +               if (!rq->data_array)
> > >
> > > Can we avoid those allocations when we don't use the DMA API?
> > >
> > > > +                       goto err;
> > > > +
> > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > +               if (!rq->dma_array)
> > > > +                       goto err;
> > > > +
> > > > +               for (j = 0; j < num; ++j) {
> > > > +                       rq->data_array[j].next = rq->data_free;
> > > > +                       rq->data_free = &rq->data_array[j];
> > > > +
> > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > +               }
> > > > +       }
> > > > +
> > > > +       return 0;
> > > > +
> > > > +err:
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +               struct receive_queue *rq;
> > > > +
> > > > +               rq = &vi->rq[i];
> > > > +
> > > > +               kfree(rq->dma_array);
> > > > +               kfree(rq->data_array);
> > > > +       }
> > > > +
> > > > +       return -ENOMEM;
> > > > +}
> > > > +
> > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > >  {
> > > >         unsigned int len;
> > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > >                 void *buf;
> > > >                 int off;
> > > >
> > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > >                 if (unlikely(!buf))
> > > >                         goto err_buf;
> > > >
> > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > >                 return -EINVAL;
> > > >
> > > >         while (--*num_buf > 0) {
> > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > >                 if (unlikely(!buf)) {
> > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > >                                  dev->name, *num_buf,
> > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >         while (--num_buf) {
> > > >                 int num_skb_frags;
> > > >
> > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > >                 if (unlikely(!buf)) {
> > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > >                                  dev->name, num_buf,
> > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >  err_skb:
> > > >         put_page(page);
> > > >         while (num_buf-- > 1) {
> > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > >                 if (unlikely(!buf)) {
> > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > >                                  dev->name, num_buf);
> > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > +       struct virtnet_rq_data *data;
> > > >         int err;
> > > >
> > > >         len = SKB_DATA_ALIGN(len) +
> > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > >         get_page(alloc_frag->page);
> > > >         alloc_frag->offset += len;
> > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > +
> > > > +       if (rq->data_array) {
> > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > >
> > > Thanks to the compound page. I wonder if everything could be
> > > simplified if we just reuse page->private for storing metadata like
> > > dma address and refcnt. Then we don't need extra stuff for tracking
> > > any other thing?
> >
> > Maybe we can try alloc one small buffer from the page_frag to store the dma info
> > when page_frag.offset == 0.
>
> And store it in the ctx? I think it should work.


Since the dma information is stored on the first page of the compound page, we
can get the dma information through buf.

No need to modify ctx.
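
A minimal sketch of the idea (illustrative only; the names below are not
from the posted patch): the dma metadata is written at offset 0 of the
compound page when the page is first mapped, so any buf carved out of that
page can find it again via virt_to_head_page():

/* Hypothetical layout: the first bytes of the compound page hold the
 * per-page dma state shared by all bufs allocated from that page.
 */
struct virtnet_rq_dma_info {
	dma_addr_t addr;	/* dma address of the mapped page */
	u32 len;		/* mapped length */
	u32 ref;		/* bufs still outstanding on this page */
};

static struct virtnet_rq_dma_info *virtnet_buf_to_dma_info(void *buf)
{
	/* buf lives somewhere inside the compound page; the metadata was
	 * placed at the very start of that page when it was mapped.
	 */
	return page_address(virt_to_head_page(buf));
}

As in the posted patch, once the last buf of the page is consumed (ref drops
to zero) the page is unmapped and the metadata goes away with the page.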

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > >
> > >
> > > > +               if (err)
> > > > +                       goto map_err;
> > > > +
> > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > +       } else {
> > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > +               data = (void *)buf;
> > > > +       }
> > > > +
> > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > >         if (err < 0)
> > > > -               put_page(virt_to_head_page(buf));
> > > > +               goto add_err;
> > > > +
> > > > +       return err;
> > > > +
> > > > +add_err:
> > > > +       if (rq->data_array) {
> > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > +               virtnet_rq_recycle_data(rq, data);
> > > > +       }
> > > > +
> > > > +map_err:
> > > > +       put_page(virt_to_head_page(buf));
> > > >         return err;
> > > >  }
> > > >
> > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > +       struct virtnet_rq_data *data;
> > > >         char *buf;
> > > >         void *ctx;
> > > >         int err;
> > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >                 alloc_frag->offset += hole;
> > > >         }
> > > >
> > > > -       sg_init_one(rq->sg, buf, len);
> > > > +       if (rq->data_array) {
> > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > +               if (err)
> > > > +                       goto map_err;
> > > > +
> > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > +       } else {
> > > > +               sg_init_one(rq->sg, buf, len);
> > > > +               data = (void *)buf;
> > > > +       }
> > > > +
> > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > >         if (err < 0)
> > > > -               put_page(virt_to_head_page(buf));
> > > > +               goto add_err;
> > > > +
> > > > +       return 0;
> > > > +
> > > > +add_err:
> > > > +       if (rq->data_array) {
> > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > +               virtnet_rq_recycle_data(rq, data);
> > > > +       }
> > > >
> > > > +map_err:
> > > > +       put_page(virt_to_head_page(buf));
> > > >         return err;
> > > >  }
> > > >
> > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > >                 void *ctx;
> > > >
> > > >                 while (stats.packets < budget &&
> > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > >                         stats.packets++;
> > > >                 }
> > > >         } else {
> > > >                 while (stats.packets < budget &&
> > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > >                         stats.packets++;
> > > >                 }
> > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > +
> > > > +               kfree(vi->rq[i].data_array);
> > > > +               kfree(vi->rq[i].dma_array);
> > > >         }
> > > >
> > > >         /* We called __netif_napi_del(),
> > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > >         }
> > > >
> > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > +
> > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > >                 cond_resched();
> > > >         }
> > > >  }
> > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > >         if (ret)
> > > >                 goto err_free;
> > > >
> > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > +       if (ret)
> > > > +               goto err_free;
> > > > +
> > > >         cpus_read_lock();
> > > >         virtnet_set_affinity(vi);
> > > >         cpus_read_unlock();
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>


* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-14  3:58           ` Xuan Zhuo
@ 2023-07-14  5:45             ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-14  5:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Fri, Jul 14, 2023 at 12:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 14 Jul 2023 11:57:05 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jul 13, 2023 at 3:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > >
> > > > I'd suggest to tweak the title like:
> > > >
> > > > "merge dma operations when refilling mergeable buffers"
> > > >
> > > > > Currently, the virtio core will perform a dma operation for each
> > > > > operation.
> > > >
> > > > "for each buffer"?
> > > >
> > > > > Although, the same page may be operated multiple times.
> > > > >
> > > > > The driver does the dma operation and manages the dma address based the
> > > > > feature premapped of virtio core.
> > > > >
> > > > > This way, we can perform only one dma operation for the same page. In
> > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > >
> > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > >
> > > > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > > > linearized pages was missed.
> > > >
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 486b5849033d..4de845d35bed 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > >
> > > > > +/* The bufs on the same page may share this struct. */
> > > > > +struct virtnet_rq_dma {
> > > > > +       struct virtnet_rq_dma *next;
> > > > > +
> > > > > +       dma_addr_t addr;
> > > > > +
> > > > > +       void *buf;
> > > > > +       u32 len;
> > > > > +
> > > > > +       u32 ref;
> > > > > +};
> > > > > +
> > > > > +/* Record the dma and buf. */
> > > > > +struct virtnet_rq_data {
> > > > > +       struct virtnet_rq_data *next;
> > > > > +
> > > > > +       void *buf;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma;
> > > > > +};
> > > > > +
> > > > >  /* Internal representation of a send virtqueue */
> > > > >  struct send_queue {
> > > > >         /* Virtqueue associated with this send _queue */
> > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > >         char name[16];
> > > > >
> > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > +
> > > > > +       struct virtnet_rq_data *data_array;
> > > > > +       struct virtnet_rq_data *data_free;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > +       struct virtnet_rq_dma *last_dma;
> > > > >  };
> > > > >
> > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >         return skb;
> > > > >  }
> > > > >
> > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct device *dev;
> > > > > +
> > > > > +       --dma->ref;
> > > > > +
> > > > > +       if (dma->ref)
> > > > > +               return;
> > > > > +
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > +
> > > > > +       dma->next = rq->dma_free;
> > > > > +       rq->dma_free = dma;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > +                                    struct virtnet_rq_data *data)
> > > > > +{
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = data->buf;
> > > > > +
> > > > > +       data->next = rq->data_free;
> > > > > +       rq->data_free = data;
> > > > > +
> > > > > +       return buf;
> > > > > +}
> > > > > +
> > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > +                                                  void *buf,
> > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +
> > > > > +       data = rq->data_free;
> > > > > +       rq->data_free = data->next;
> > > > > +
> > > > > +       data->buf = buf;
> > > > > +       data->dma = dma;
> > > > > +
> > > > > +       return data;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > +{
> > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > +       struct device *dev;
> > > > > +       u32 off, map_len;
> > > > > +       dma_addr_t addr;
> > > > > +       void *end;
> > > > > +
> > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > +               ++dma->ref;
> > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > +               goto ok;
> > > > > +       }
> > > > > +
> > > > > +       end = buf + len - 1;
> > > > > +       off = offset_in_page(end);
> > > > > +       map_len = len + PAGE_SIZE - off;
> > > >
> > > > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > > > larger than this.
> > > >
> > > > > +
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       dma = rq->dma_free;
> > > > > +       rq->dma_free = dma->next;
> > > > > +
> > > > > +       dma->ref = 1;
> > > > > +       dma->buf = buf;
> > > > > +       dma->addr = addr;
> > > > > +       dma->len = map_len;
> > > > > +
> > > > > +       rq->last_dma = dma;
> > > > > +
> > > > > +ok:
> > > > > +       sg_init_table(rq->sg, 1);
> > > > > +       rq->sg[0].dma_address = addr;
> > > > > +       rq->sg[0].length = len;
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > +{
> > > > > +       struct receive_queue *rq;
> > > > > +       int i, err, j, num;
> > > > > +
> > > > > +       /* disable for big mode */
> > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > +               return 0;
> > > > > +
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > +               if (err)
> > > > > +                       continue;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > +
> > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > +               if (!rq->data_array)
> > > >
> > > > Can we avoid those allocations when we don't use the DMA API?
> > > >
> > > > > +                       goto err;
> > > > > +
> > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > +               if (!rq->dma_array)
> > > > > +                       goto err;
> > > > > +
> > > > > +               for (j = 0; j < num; ++j) {
> > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > +
> > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +err:
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               struct receive_queue *rq;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               kfree(rq->dma_array);
> > > > > +               kfree(rq->data_array);
> > > > > +       }
> > > > > +
> > > > > +       return -ENOMEM;
> > > > > +}
> > > > > +
> > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > >  {
> > > > >         unsigned int len;
> > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > >                 void *buf;
> > > > >                 int off;
> > > > >
> > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > >                 if (unlikely(!buf))
> > > > >                         goto err_buf;
> > > > >
> > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > >                 return -EINVAL;
> > > > >
> > > > >         while (--*num_buf > 0) {
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, *num_buf,
> > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >         while (--num_buf) {
> > > > >                 int num_skb_frags;
> > > > >
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, num_buf,
> > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >  err_skb:
> > > > >         put_page(page);
> > > > >         while (num_buf-- > 1) {
> > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > >                                  dev->name, num_buf);
> > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > +       struct virtnet_rq_data *data;
> > > > >         int err;
> > > > >
> > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > >         get_page(alloc_frag->page);
> > > > >         alloc_frag->offset += len;
> > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > >
> > > > Thanks to the compound page. I wonder if everything could be
> > > > simplified if we just reuse page->private for storing metadata like
> > > > dma address and refcnt. Then we don't need extra stuff for tracking
> > > > any other thing?
> > >
> > > Maybe we can try alloc one small buffer from the page_frag to store the dma info
> > > when page_frag.offset == 0.
> >
> > And store it in the ctx? I think it should work.
>
>
> Since the dma information is stored on the first page of the compound page, we
> can get the dma information through buf.
>
> No need to modify ctx.

Ok, I'm not sure I fully get this; maybe you can post another version so we can see.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return err;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > > +
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > +       struct virtnet_rq_data *data;
> > > > >         char *buf;
> > > > >         void *ctx;
> > > > >         int err;
> > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >                 alloc_frag->offset += hole;
> > > > >         }
> > > > >
> > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > >
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > >                 void *ctx;
> > > > >
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > >         } else {
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > +
> > > > > +               kfree(vi->rq[i].data_array);
> > > > > +               kfree(vi->rq[i].dma_array);
> > > > >         }
> > > > >
> > > > >         /* We called __netif_napi_del(),
> > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > >         }
> > > > >
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > +
> > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > >                 cond_resched();
> > > > >         }
> > > > >  }
> > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > >         if (ret)
> > > > >                 goto err_free;
> > > > >
> > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > +       if (ret)
> > > > > +               goto err_free;
> > > > > +
> > > > >         cpus_read_lock();
> > > > >         virtnet_set_affinity(vi);
> > > > >         cpus_read_unlock();
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > >
> > >
> >
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-14  5:45             ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-14  5:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

On Fri, Jul 14, 2023 at 12:37 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Fri, 14 Jul 2023 11:57:05 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jul 13, 2023 at 3:02 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 13 Jul 2023 12:20:01 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Jul 10, 2023 at 11:43 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > >
> > > > I'd suggest to tweak the title like:
> > > >
> > > > "merge dma operations when refilling mergeable buffers"
> > > >
> > > > > Currently, the virtio core will perform a dma operation for each
> > > > > operation.
> > > >
> > > > "for each buffer"?
> > > >
> > > > > Although, the same page may be operated multiple times.
> > > > >
> > > > > The driver does the dma operation and manages the dma address based the
> > > > > feature premapped of virtio core.
> > > > >
> > > > > This way, we can perform only one dma operation for the same page. In
> > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > >
> > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > >
> > > > Btw, it looks to me the code to deal with XDP_TX/REDIRECT for
> > > > linearized pages was missed.
> > > >
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 486b5849033d..4de845d35bed 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > >
> > > > > +/* The bufs on the same page may share this struct. */
> > > > > +struct virtnet_rq_dma {
> > > > > +       struct virtnet_rq_dma *next;
> > > > > +
> > > > > +       dma_addr_t addr;
> > > > > +
> > > > > +       void *buf;
> > > > > +       u32 len;
> > > > > +
> > > > > +       u32 ref;
> > > > > +};
> > > > > +
> > > > > +/* Record the dma and buf. */
> > > > > +struct virtnet_rq_data {
> > > > > +       struct virtnet_rq_data *next;
> > > > > +
> > > > > +       void *buf;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma;
> > > > > +};
> > > > > +
> > > > >  /* Internal representation of a send virtqueue */
> > > > >  struct send_queue {
> > > > >         /* Virtqueue associated with this send _queue */
> > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > >         char name[16];
> > > > >
> > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > +
> > > > > +       struct virtnet_rq_data *data_array;
> > > > > +       struct virtnet_rq_data *data_free;
> > > > > +
> > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > +       struct virtnet_rq_dma *last_dma;
> > > > >  };
> > > > >
> > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > >         return skb;
> > > > >  }
> > > > >
> > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct device *dev;
> > > > > +
> > > > > +       --dma->ref;
> > > > > +
> > > > > +       if (dma->ref)
> > > > > +               return;
> > > > > +
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > +
> > > > > +       dma->next = rq->dma_free;
> > > > > +       rq->dma_free = dma;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > +                                    struct virtnet_rq_data *data)
> > > > > +{
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = data->buf;
> > > > > +
> > > > > +       data->next = rq->data_free;
> > > > > +       rq->data_free = data;
> > > > > +
> > > > > +       return buf;
> > > > > +}
> > > > > +
> > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > +                                                  void *buf,
> > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +
> > > > > +       data = rq->data_free;
> > > > > +       rq->data_free = data->next;
> > > > > +
> > > > > +       data->buf = buf;
> > > > > +       data->dma = dma;
> > > > > +
> > > > > +       return data;
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > +{
> > > > > +       struct virtnet_rq_data *data;
> > > > > +       void *buf;
> > > > > +
> > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > +       if (!buf || !rq->data_array)
> > > > > +               return buf;
> > > > > +
> > > > > +       data = buf;
> > > > > +
> > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > +
> > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > +{
> > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > +       struct device *dev;
> > > > > +       u32 off, map_len;
> > > > > +       dma_addr_t addr;
> > > > > +       void *end;
> > > > > +
> > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > +               ++dma->ref;
> > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > +               goto ok;
> > > > > +       }
> > > > > +
> > > > > +       end = buf + len - 1;
> > > > > +       off = offset_in_page(end);
> > > > > +       map_len = len + PAGE_SIZE - off;
> > > >
> > > > This assumes a PAGE_SIZE which seems sub-optimal as page frag could be
> > > > larger than this.
> > > >
> > > > > +
> > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > +
> > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > +               return -ENOMEM;
> > > > > +
> > > > > +       dma = rq->dma_free;
> > > > > +       rq->dma_free = dma->next;
> > > > > +
> > > > > +       dma->ref = 1;
> > > > > +       dma->buf = buf;
> > > > > +       dma->addr = addr;
> > > > > +       dma->len = map_len;
> > > > > +
> > > > > +       rq->last_dma = dma;
> > > > > +
> > > > > +ok:
> > > > > +       sg_init_table(rq->sg, 1);
> > > > > +       rq->sg[0].dma_address = addr;
> > > > > +       rq->sg[0].length = len;
> > > > > +
> > > > > +       return 0;
> > > > > +}
> > > > > +
> > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > +{
> > > > > +       struct receive_queue *rq;
> > > > > +       int i, err, j, num;
> > > > > +
> > > > > +       /* disable for big mode */
> > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > +               return 0;
> > > > > +
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > +               if (err)
> > > > > +                       continue;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > +
> > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > +               if (!rq->data_array)
> > > >
> > > > Can we avoid those allocations when we don't use the DMA API?
> > > >
> > > > > +                       goto err;
> > > > > +
> > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > +               if (!rq->dma_array)
> > > > > +                       goto err;
> > > > > +
> > > > > +               for (j = 0; j < num; ++j) {
> > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > +
> > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > +               }
> > > > > +       }
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +err:
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               struct receive_queue *rq;
> > > > > +
> > > > > +               rq = &vi->rq[i];
> > > > > +
> > > > > +               kfree(rq->dma_array);
> > > > > +               kfree(rq->data_array);
> > > > > +       }
> > > > > +
> > > > > +       return -ENOMEM;
> > > > > +}
> > > > > +
> > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > >  {
> > > > >         unsigned int len;
> > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > >                 void *buf;
> > > > >                 int off;
> > > > >
> > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > >                 if (unlikely(!buf))
> > > > >                         goto err_buf;
> > > > >
> > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > >                 return -EINVAL;
> > > > >
> > > > >         while (--*num_buf > 0) {
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, *num_buf,
> > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >         while (--num_buf) {
> > > > >                 int num_skb_frags;
> > > > >
> > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > >                                  dev->name, num_buf,
> > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > >  err_skb:
> > > > >         put_page(page);
> > > > >         while (num_buf-- > 1) {
> > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > >                 if (unlikely(!buf)) {
> > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > >                                  dev->name, num_buf);
> > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > +       struct virtnet_rq_data *data;
> > > > >         int err;
> > > > >
> > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > >         get_page(alloc_frag->page);
> > > > >         alloc_frag->offset += len;
> > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > >
> > > > Thanks to the compound page. I wonder if everything could be
> > > > simplified if we just reuse page->private for storing metadata like
> > > > dma address and refcnt. Then we don't need extra stuff for tracking
> > > > any other thing?
> > >
> > > Maybe we can try alloc one small buffer from the page_frag to store the dma info
> > > when page_frag.offset == 0.
> >
> > And store it in the ctx? I think it should work.
>
>
> Since the dma information is stored on the first page of the compound page, we
> can get the dma information through buf.
>
> No need to modify ctx.

Ok, I'm not sure I fully get this; maybe you can post another version so we can see.

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return err;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > > +
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > +       struct virtnet_rq_data *data;
> > > > >         char *buf;
> > > > >         void *ctx;
> > > > >         int err;
> > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >                 alloc_frag->offset += hole;
> > > > >         }
> > > > >
> > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > +       if (rq->data_array) {
> > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > +               if (err)
> > > > > +                       goto map_err;
> > > > > +
> > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > +       } else {
> > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > +               data = (void *)buf;
> > > > > +       }
> > > > > +
> > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > >         if (err < 0)
> > > > > -               put_page(virt_to_head_page(buf));
> > > > > +               goto add_err;
> > > > > +
> > > > > +       return 0;
> > > > > +
> > > > > +add_err:
> > > > > +       if (rq->data_array) {
> > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > +       }
> > > > >
> > > > > +map_err:
> > > > > +       put_page(virt_to_head_page(buf));
> > > > >         return err;
> > > > >  }
> > > > >
> > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > >                 void *ctx;
> > > > >
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > >         } else {
> > > > >                 while (stats.packets < budget &&
> > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > >                         stats.packets++;
> > > > >                 }
> > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > +
> > > > > +               kfree(vi->rq[i].data_array);
> > > > > +               kfree(vi->rq[i].dma_array);
> > > > >         }
> > > > >
> > > > >         /* We called __netif_napi_del(),
> > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > >         }
> > > > >
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > +
> > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > >                 cond_resched();
> > > > >         }
> > > > >  }
> > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > >         if (ret)
> > > > >                 goto err_free;
> > > > >
> > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > +       if (ret)
> > > > > +               goto err_free;
> > > > > +
> > > > >         cpus_read_lock();
> > > > >         virtnet_set_affinity(vi);
> > > > >         cpus_read_unlock();
> > > > > --
> > > > > 2.32.0.3.g01195cf9f
> > > > >
> > > >
> > >
> >
>



* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-12  8:38                         ` Xuan Zhuo
@ 2023-07-14 10:37                           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-14 10:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > >
> > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > >
> > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > >
> > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > >
> > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > of operation?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Do you mean this:
> > > > > > > > > >
> > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > not affect the performance a lot.
> > > > > > > >
> > > > > > > >
> > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > >
> > > > > > > Have you measured with iommu=strict?
> > > > > >
> > > > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > > > common scenario. I can test it.
> > > > >
> > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > >
> > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > >
> > > > virtio-net without merge dma 428614.00 pps
> > > >
> > > > virtio-net with merge dma    742853.00 pps
> > >
> > >
> > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > >
> > > virtio-net without merge dma 775496.00 pps
> > >
> > > virtio-net with merge dma    1010514.00 pps
> > >
> > >
> >
> > Great, let's add those numbers to the changelog.
> 
> 
> Yes, I will do it in next version.
> 
> 
> Thanks.
> 

You should also test without an IOMMU but with swiotlb=force.

But first fix the use of the DMA API to actually be correct,
otherwise you are cheating by avoiding synchronization.
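
(Illustrative only, not from the posted series: with swiotlb the device DMAs
into a bounce buffer, so a region that stays mapped across reuse needs
explicit sync calls, roughly along these lines, before the CPU reads the data
and before the buffer is reposted.)

/* Sketch of the synchronization a premapped, long-lived mapping needs.
 * dma->addr/dma->len come from the patch; the call sites are illustrative.
 */
static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
				    struct virtnet_rq_dma *dma,
				    u32 offset, u32 len)
{
	struct device *dev = virtqueue_dma_dev(rq->vq);

	/* Copy the swiotlb bounce buffer back before the CPU reads it. */
	dma_sync_single_for_cpu(dev, dma->addr + offset, len, DMA_FROM_DEVICE);
}

static void virtnet_rq_sync_for_device(struct receive_queue *rq,
				       struct virtnet_rq_dma *dma,
				       u32 offset, u32 len)
{
	struct device *dev = virtqueue_dma_dev(rq->vq);

	/* Hand the region back to the device before reposting the buffer. */
	dma_sync_single_for_device(dev, dma->addr + offset, len,
				   DMA_FROM_DEVICE);
}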



> >
> > Thanks
> >
> > > Thanks.
> > >
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > patches won't work.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > >
> > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > >
> > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > is not high. Probably that much.
> > > > > > > > >
> > > > > > > > > So maybe not worth the complexity.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > >
> > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > >
> > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > >
> > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > we can do?
> > > > > > > > > >
> > > > > > > > > > Yes, we can use llist.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > >         char name[16];
> > > > > > > > > > > >
> > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > >  };
> > > > > > > > > > > >
> > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > >         return skb;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > +               return;
> > > > > > > > > > > > +
> > > > > > > > > > >
> > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > there in the buffer.
> > > > > > > > > > >
> > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return data;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > +       }
> > > > > > > > > > >
> > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > >
> > > > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > > > >
> > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > >
> > > > > > > > > > As we discussed in another thread, the page pool will be used for xdp first.
> > > > > > > > > > Let's do the conversion step by step.
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > ok so this should wait then?
> > > > > > > > >
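A rough sketch of the page pool direction being pointed at here (again
illustrative, not something this patch implements): with PP_FLAG_DMA_MAP
the pool maps each page once and hands out pages that already carry a
dma address, which is the same "map once, reuse many times" effect the
patch open-codes.

#include <net/page_pool.h>

/* Illustrative: one pool per receive queue. */
static struct page_pool *virtnet_rq_create_pool(struct receive_queue *rq,
                                                unsigned int size)
{
        struct page_pool_params pp_params = {
                .flags          = PP_FLAG_DMA_MAP,
                .order          = 0,
                .pool_size      = size,
                .nid            = NUMA_NO_NODE,
                .dev            = virtqueue_dma_dev(rq->vq),
                .dma_dir        = DMA_FROM_DEVICE,
        };

        return page_pool_create(&pp_params);
}

On the allocation side, page_pool_dev_alloc_pages() then returns a page
whose dma address can be read back with page_pool_get_dma_addr().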
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > +
> > > > > > > > > > > > +ok:
> > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > +
> > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > +               }
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > +
> > > > > > > > > > > > +err:
> > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > >  {
> > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > >                 int off;
> > > > > > > > > > > >
> > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > >
> > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > >
> > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > >  err_skb:
> > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > >         int err;
> > > > > > > > > > > >
> > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > +       } else {
> > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +add_err:
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +map_err:
> > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > >         return err;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > >         char *buf;
> > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > >         int err;
> > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > +       } else {
> > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > +
> > > > > > > > > > > > +add_err:
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +       }
> > > > > > > > > > > >
> > > > > > > > > > > > +map_err:
> > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > >         return err;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > >
> > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > >                 }
> > > > > > > > > > > >         } else {
> > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > >                 }
> > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > +
> > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > >         }
> > > > > > > > > > > >  }
> > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > >         if (ret)
> > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > >
> > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > +
> > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-14 10:37                           ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-14 10:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > > > > > > > > > even though the same page may be operated on multiple times.
> > > > > > > > > > > >
> > > > > > > > > > > > The driver does the dma operation and manages the dma address, based on
> > > > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > > > >
> > > > > > > > > > > > This way, we can perform only one dma operation per page. In the case
> > > > > > > > > > > > of mtu 1500, this eliminates many dma operations.
> > > > > > > > > > > >
> > > > > > > > > > > > Tested on an Aliyun g7.4large machine with a cpu at 100%, pps increased
> > > > > > > > > > > > from 1893766 to 1901105, an increase of 0.4%.
> > > > > > > > > > >
> > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > of operation?
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Do you mean this:
> > > > > > > > > >
> > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > > > > > > not affect the performance much.
> > > > > > > >
> > > > > > > >
> > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > >
> > > > > > > Have you measured with iommu=strict?
> > > > > >
> > > > > > I have not tested this way; our environment is pt (passthrough). I wonder
> > > > > > if strict is a common scenario. I can test it.
> > > > >
> > > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > > >
> > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > >
> > > > virtio-net without merge dma 428614.00 pps
> > > >
> > > > virtio-net with merge dma    742853.00 pps
> > >
> > >
> > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > >
> > > virtio-net without merge dma 775496.00 pps
> > >
> > > virtio-net with merge dma    1010514.00 pps
> > >
> > >
> >
> > Great, let's add those numbers to the changelog.
> 
> 
> Yes, I will do it in the next version.
> 
> 
> Thanks.
> 

You should also test without iommu but with swiotlb=force

But first fix the use of the DMA API so that it is actually correct;
otherwise you are cheating by skipping synchronization.
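To make that concrete, a minimal sketch (not from the patch; the helper
name is made up) of the kind of sync a long-lived DMA_FROM_DEVICE
mapping needs before the CPU reads a completed buffer out of it:

static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
                                    struct virtnet_rq_dma *dma,
                                    void *buf, u32 len)
{
        struct device *dev = virtqueue_dma_dev(rq->vq);

        /* With a bounce buffer (swiotlb) this is what actually copies the
         * received data back; skipping it only appears to work on
         * cache-coherent, direct-mapped setups.
         */
        dma_sync_single_range_for_cpu(dev, dma->addr, buf - dma->buf,
                                      len, DMA_FROM_DEVICE);
}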



> >
> > Thanks
> >
> > > Thanks.
> > >
> > > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > patches won't work.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > >
> > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > >
> > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > is not high. Probably that much.
> > > > > > > > >
> > > > > > > > > So maybe not worth the complexity.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > ---
> > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > >
> > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > >
> > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > >
> > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > we can do?
> > > > > > > > > >
> > > > > > > > > > Yes, we can use llist.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > +};
> > > > > > > > > > > > +
> > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > >         char name[16];
> > > > > > > > > > > >
> > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > >  };
> > > > > > > > > > > >
> > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > >         return skb;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > +               return;
> > > > > > > > > > > > +
> > > > > > > > > > >
> > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > there in the buffer.
> > > > > > > > > > >
> > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return data;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > +       }
> > > > > > > > > > >
> > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > >
> > > > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > > > >
> > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > >
> > > > > > > > > > As we discussed in another thread, the page pool will be used for xdp first.
> > > > > > > > > > Let's do the conversion step by step.
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > > ok so this should wait then?
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > +
> > > > > > > > > > > > +ok:
> > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > +{
> > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > +
> > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > +               }
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > +
> > > > > > > > > > > > +err:
> > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > +}
> > > > > > > > > > > > +
> > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > >  {
> > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > >                 int off;
> > > > > > > > > > > >
> > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > >
> > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > >
> > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > >  err_skb:
> > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > >         int err;
> > > > > > > > > > > >
> > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > +
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > +       } else {
> > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +add_err:
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > > +map_err:
> > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > >         return err;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > >         char *buf;
> > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > >         int err;
> > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > +       } else {
> > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > +       }
> > > > > > > > > > > > +
> > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > +
> > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > +
> > > > > > > > > > > > +add_err:
> > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > +       }
> > > > > > > > > > > >
> > > > > > > > > > > > +map_err:
> > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > >         return err;
> > > > > > > > > > > >  }
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > >
> > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > >                 }
> > > > > > > > > > > >         } else {
> > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > >                 }
> > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > +
> > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > >         }
> > > > > > > > > > > >
> > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > +
> > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > >         }
> > > > > > > > > > > >  }
> > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > >         if (ret)
> > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > >
> > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > +
> > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > --
> > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-14 10:37                           ` Michael S. Tsirkin
@ 2023-07-19  3:21                             ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-19  3:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > Currently, the virtio core performs a dma operation for each buffer,
> > > > > > > > > > > > > even though the same page may be operated on multiple times.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The driver does the dma operation and manages the dma address, based on
> > > > > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This way, we can perform only one dma operation per page. In the case
> > > > > > > > > > > > > of mtu 1500, this eliminates many dma operations.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tested on an Aliyun g7.4large machine with a cpu at 100%, pps increased
> > > > > > > > > > > > > from 1893766 to 1901105, an increase of 0.4%.
> > > > > > > > > > > >
> > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > of operation?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Do you mean this:
> > > > > > > > > > >
> > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > > > > > > > not affect the performance much.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > >
> > > > > > > > Have you measured with iommu=strict?
> > > > > > >
> > > > > > > I have not tested this way; our environment is pt (passthrough). I wonder
> > > > > > > if strict is a common scenario. I can test it.
> > > > > >
> > > > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > > > >
> > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > >
> > > > > virtio-net without merge dma 428614.00 pps
> > > > >
> > > > > virtio-net with merge dma    742853.00 pps
> > > >
> > > >
> > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > >
> > > > virtio-net without merge dma 775496.00 pps
> > > >
> > > > virtio-net with merge dma    1010514.00 pps
> > > >
> > > >
> > >
> > > Great, let's add those numbers to the changelog.
> >
> >
> > Yes, I will do it in the next version.
> >
> >
> > Thanks.
> >
>
> You should also test without iommu but with swiotlb=force


For swiotlb, merging DMA mappings has no benefit, because we still need to copy
the data from the swiotlb bounce buffer to the original buffer.

The benefit of merging the DMA mappings is to reduce the number of
operations issued to the iommu.

I did some tests for this. The result is the same.

Thanks.
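As a rough illustration of the point above (assuming the ~2 KiB
small-mode buffers from this patch, i.e. about two buffers sharing a
4 KiB page):

  per page, without merging:  2 x dma_map_page + 2 x dma_unmap_page
  per page, with merging:     1 x dma_map_page + 1 x dma_unmap_page

So with an iommu roughly half of the (un)map operations go away, while
with swiotlb every received byte is still bounced and copied once at
unmap/sync time, so the dominant cost is unchanged.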


>
> But first fix the use of the DMA API so that it is actually correct;
> otherwise you are cheating by skipping synchronization.
>
>
>
> > >
> > > Thanks
> > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > patches won't work.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > >
> > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > >
> > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > >
> > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > >
> > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > >
> > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > we can do?
> > > > > > > > > > >
> > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > >
> > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > >  };
> > > > > > > > > > > > >
> > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > +
> > > > > > > > > > > >
> > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > >
> > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > +       }
> > > > > > > > > > > >
> > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > >
> > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > >
> > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > >
> > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > transform it step by step.
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > ok so this should wait then?
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > +               }
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +err:
> > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > >
> > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > >
> > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > >
> > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > >         int err;
> > > > > > > > > > > > >
> > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > >         return err;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > >         return err;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > >
> > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > >                 }
> > > > > > > > > > > > >         } else {
> > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > >                 }
> > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >  }
> > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > +
> > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-19  3:21                             ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-19  3:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > >
> > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > >
> > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > >
> > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > of operation?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Do you mean this:
> > > > > > > > > > >
> > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > not affect the performance a lot.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > >
> > > > > > > > Have you measured with iommu=strict?
> > > > > > >
> > > > > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > > > > common scenario. I can test it.
> > > > > >
> > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > >
> > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > >
> > > > > virtio-net without merge dma 428614.00 pps
> > > > >
> > > > > virtio-net with merge dma    742853.00 pps
> > > >
> > > >
> > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > >
> > > > virtio-net without merge dma 775496.00 pps
> > > >
> > > > virtio-net with merge dma    1010514.00 pps
> > > >
> > > >
> > >
> > > Great, let's add those numbers to the changelog.
> >
> >
> > Yes, I will do it in next version.
> >
> >
> > Thanks.
> >
>
> You should also test without iommu but with swiotlb=force


For swiotlb, merging DMA mappings brings no benefit, because we still have to
copy the data from the swiotlb bounce buffer back to the original buffer.

The benefit of merging DMA mappings is to reduce the number of map/unmap
operations issued to the IOMMU.

I did some tests for this case. The results are the same with and without the merge.

Thanks.


>
> But first fix the use of DMA API to actually be correct,
> otherwise you are cheating by avoiding synchronization.
>
>
>
> > >
> > > Thanks
> > >
> > > > Thanks.
> > > >
> > > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > patches won't work.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > >
> > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > >
> > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > >
> > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > ---
> > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > >
> > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > >
> > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > >
> > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > we can do?
> > > > > > > > > > >
> > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > +};
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > >
> > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > >  };
> > > > > > > > > > > > >
> > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > +
> > > > > > > > > > > >
> > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > >
> > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > +       }
> > > > > > > > > > > >
> > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > >
> > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > >
> > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > >
> > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > transform it step by step.
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > ok so this should wait then?
> > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > +{
> > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > +               }
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +err:
> > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > +}
> > > > > > > > > > > > > +
> > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > >  {
> > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > >
> > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > >
> > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > >
> > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > >         int err;
> > > > > > > > > > > > >
> > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > >         return err;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > > +
> > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > +       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > >         return err;
> > > > > > > > > > > > >  }
> > > > > > > > > > > > >
> > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > >
> > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > >                 }
> > > > > > > > > > > > >         } else {
> > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > >                 }
> > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >
> > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > +
> > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > >         }
> > > > > > > > > > > > >  }
> > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > >
> > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > +
> > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > --
> > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-19  3:21                             ` Xuan Zhuo
@ 2023-07-19  8:55                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-19  8:55 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The driver does the dma operation and manages the dma address, based
> > > > > > > > > > > > > > on the premapped feature of the virtio core.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Tested on an Aliyun g7.4large machine with the cpu at 100%: pps
> > > > > > > > > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
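A rough worked example of where the saving comes from, under the assumption of 4 KiB pages and the ~1.5-2 KiB receive buffers virtio-net typically allocates for mtu 1500: about 4096 / 2048 = 2 buffers share one page, so one dma_map/dma_unmap pair covers what previously took two, roughly halving the mapping calls on the rx path.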
> > > > > > > > > > > > >
> > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > of operation?
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > >
> > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > With passthrough, the dma API is just some indirect function calls; they do
> > > > > > > > > > > not affect performance much.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > >
> > > > > > > > > Have you measured with iommu=strict?
> > > > > > > >
> > > > > > > > I have not tested it that way; our environment is passthrough. I wonder if strict
> > > > > > > > is a common scenario. I can test it.
> > > > > > >
> > > > > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > > > > >
> > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > >
> > > > > > virtio-net without merge dma 428614.00 pps
> > > > > >
> > > > > > virtio-net with merge dma    742853.00 pps
> > > > >
> > > > >
> > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > >
> > > > > virtio-net without merge dma 775496.00 pps
> > > > >
> > > > > virtio-net with merge dma    1010514.00 pps
> > > > >
> > > > >
> > > >
> > > > Great, let's add those numbers to the changelog.
> > >
> > >
> > > Yes, I will do it in the next version.
> > >
> > >
> > > Thanks.
> > >
> >
> > You should also test without iommu but with swiotlb=force
> 
> 
> For swiotlb, merging DMA has no benefit, because we still need to copy data from
> the swiotlb buffer to the original buffer.
> The benefit of merging DMA is to reduce the number of map operations issued to the iommu.
> 
> I did some tests of this. The result is the same.
> 
> Thanks.
> 

Did you actually check that it works, though?
It looks like with swiotlb you need a sync to trigger a copy
before the unmap, and I don't see where that's done in the current
patch.
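For illustration, a minimal sketch of the kind of sync being asked about, reusing the struct names from the patch; the helper itself and its call site (e.g. from virtnet_rq_get_buf()) are assumptions, not part of the posted code:

/* Hedged sketch: before the CPU reads a buffer that is still covered by a
 * shared, ref-counted mapping, sync the used range so that swiotlb copies
 * the bounce buffer back into the original page.
 */
static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
				    struct virtnet_rq_dma *dma,
				    void *buf, u32 len)
{
	struct device *dev = virtqueue_dma_dev(rq->vq);

	dma_sync_single_range_for_cpu(dev, dma->addr, buf - dma->buf,
				      len, DMA_FROM_DEVICE);
}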


> 
> >
> > But first fix the use of the DMA API to actually be correct;
> > otherwise you are cheating by avoiding synchronization.
> >
> >
> >
> > > >
> > > > Thanks
> > > >
> > > > > Thanks.
> > > > >
> > > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > patches won't work.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > >
> > > > > > > > > > > > It's really not high, but that is because the proportion of DMA under perf top
> > > > > > > > > > > > is not high. The gain is probably about that much.
> > > > > > > > > > >
> > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > ---
> > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > >
> > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > documentation; that's not enough, I feel.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > >
> > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > we can do?
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > >  };
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > >
> > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > >
> > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > >
> > > > > > > > > > > > Since we use page_frag, the buffers we allocate are all contiguous.
> > > > > > > > > > > >
> > > > > > > > > > > > > Why the last one specifically? Whether the next one happens to
> > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this,
> > > > > > > > > > > > > the right thing to do is likely to use a page pool.
> > > > > > > > > > > > > There's actually work upstream on page pool; look it up.
> > > > > > > > > > > >
> > > > > > > > > > > > As we discussed in another thread, the page pool will be used for xdp first. Let's
> > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > ok so this should wait then?
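For context, a rough sketch of the page_pool route suggested above; the parameters are illustrative assumptions, not a worked-out conversion of virtio-net. With PP_FLAG_DMA_MAP the pool maps each page once when it is first allocated and then keeps recycling it, which is what this patch reimplements by hand with the ref-counted virtnet_rq_dma entries:

	struct page_pool_params pp_params = {
		.flags		= PP_FLAG_DMA_MAP,
		.order		= 0,
		.pool_size	= 256,			/* illustrative */
		.nid		= NUMA_NO_NODE,
		.dev		= virtqueue_dma_dev(rq->vq),
		.dma_dir	= DMA_FROM_DEVICE,
	};
	struct page_pool *pool = page_pool_create(&pp_params);
	struct page *page = page_pool_alloc_pages(pool, GFP_ATOMIC);
	/* already mapped by the pool, no extra dma_map call per buffer */
	dma_addr_t addr = page_pool_get_dma_addr(page);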
> > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > >  {
> > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 176+ messages in thread

> > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > >         }
> > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> >

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-19  8:55                               ` Michael S. Tsirkin
@ 2023-07-19  9:38                                 ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-19  9:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Wed, Jul 19, 2023 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tested on an Aliyun g7.4large machine with the cpu at 100%, pps
> > > > > > > > > > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > >
> > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > >
> > > > > > > > > I have not tested this way; our environment is pt. I wonder if strict is
> > > > > > > > > a common scenario. I can test it.
> > > > > > > >
> > > > > > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > > > > > >
> > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > >
> > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > >
> > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > >
> > > > > >
> > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > >
> > > > > > virtio-net without merge dma 775496.00 pps
> > > > > >
> > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > >
> > > > > >
> > > > >
> > > > > Great, let's add those numbers to the changelog.
> > > >
> > > >
> > > > Yes, I will do it in next version.
> > > >
> > > >
> > > > Thanks.
> > > >
> > >
> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merging DMA has no benefit, because we still need to copy data
> > from the swiotlb buffer to the original buffer.
> > The benefit of merging DMA is to reduce the number of operations on the
> > iommu device.
> >
> > I did some tests for this. The result is the same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to synch to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

And this is needed for XDP_REDIRECT as well.
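
To make that concrete: with swiotlb (and likewise before handing a buffer
to XDP_REDIRECT), the bounce-buffer copy is only triggered by an explicit
sync, so while the page stays mapped the driver would need a CPU sync for
the region it is about to read. A minimal sketch, reusing the
virtnet_rq_dma bookkeeping from the patch above; the helper name and its
call site are assumptions, not part of the posted series:

static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
				    struct virtnet_rq_dma *dma,
				    void *buf, u32 len)
{
	struct device *dev = virtqueue_dma_dev(rq->vq);

	/* Trigger the swiotlb bounce copy (and any cache maintenance)
	 * for just the region the device wrote, relative to the page
	 * mapping that is kept alive while dma->ref > 0.
	 */
	dma_sync_single_for_cpu(dev, dma->addr + (buf - dma->buf), len,
				DMA_FROM_DEVICE);
}

Something like this would be called from virtnet_rq_get_buf() right after
virtqueue_get_buf_ctx() returns, before the data is read or redirected.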

Thanks

>
>
> >
> > >
> > > But first fix the use of DMA API to actually be correct,
> > > otherwise you are cheating by avoiding synchronization.
> > >
> > >
> > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > patches won't work.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > >
> > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since we use page_frag, the buffers we allocate are all contiguous.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-19  9:38                                 ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-19  9:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, Jul 19, 2023 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > buffer, even though the same page may be operated on multiple times.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based on
> > > > > > > > > > > > > > > the premapped feature of the virtio core.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tested on an Aliyun g7.4large machine with the cpu at 100%, pps
> > > > > > > > > > > > > > > increased from 1893766 to 1901105, an increase of 0.4%.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > >
> > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > >
> > > > > > > > > I have not tested this way; our environment is pt. I wonder if strict is
> > > > > > > > > a common scenario. I can test it.
> > > > > > > >
> > > > > > > > It's not a common setup, but it's a way to stress the DMA layer to see the overhead.
> > > > > > >
> > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > >
> > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > >
> > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > >
> > > > > >
> > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > >
> > > > > > virtio-net without merge dma 775496.00 pps
> > > > > >
> > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > >
> > > > > >
> > > > >
> > > > > Great, let's add those numbers to the changelog.
> > > >
> > > >
> > > > Yes, I will do it in next version.
> > > >
> > > >
> > > > Thanks.
> > > >
> > >
> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merging DMA has no benefit, because we still need to copy data
> > from the swiotlb buffer to the original buffer.
> > The benefit of merging DMA is to reduce the number of operations on the
> > iommu device.
> >
> > I did some tests for this. The result is the same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to synch to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

And this is needed for XDP_REDIRECT as well.

Thanks

>
>
> >
> > >
> > > But first fix the use of DMA API to actually be correct,
> > > otherwise you are cheating by avoiding synchronization.
> > >
> > >
> > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > patches won't work.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > >
> > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since we use page_frag, the buffers we allocate are all contiguous.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-19  9:38                                 ` Jason Wang
@ 2023-07-19  9:51                                   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-19  9:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Wed, Jul 19, 2023 at 05:38:56PM +0800, Jason Wang wrote:
> On Wed, Jul 19, 2023 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > > >
> > > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > > >
> > > > > > > > > > I have not tested this way; our environment is pt (passthrough). I wonder if
> > > > > > > > > > strict is a common scenario. I can test it.
> > > > > > > > >
> > > > > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > > > > >
> > > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > > >
> > > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > > >
> > > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > > >
> > > > > > >
> > > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > > >
> > > > > > > virtio-net without merge dma 775496.00 pps
> > > > > > >
> > > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Great, let's add those numbers to the changelog.
> > > > >
> > > > >
> > > > > Yes, I will do it in next version.
> > > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > > > You should also test without iommu but with swiotlb=force
> > >
> > >
> > > For swiotlb, merge DMA has no benefit, because we still need to copy data from
> > > the swiotlb bounce buffer to the original buffer.
> > > The benefit of merge DMA is to reduce the number of operations on the IOMMU device.
> > >
> > > I did some tests for this. The result is the same.
> > >
> > > Thanks.
> > >
> >
> > Did you actually check that it works though?
> > Looks like with swiotlb you need to synch to trigger a copy
> > before unmap, and I don't see where it's done in the current
> > patch.
> 
> And this is needed for XDP_REDIRECT as well.
> 
> Thanks

And once you do, you'll do the copy twice so it will
actually be slower.

I suspect you need to sync manually then unmap with DMA_ATTR_SKIP_CPU_SYNC.
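
Something along these lines, perhaps. This is a rough, untested sketch reusing the
helpers introduced by this patch (virtnet_rq_unmap() / virtnet_rq_get_buf()), so the
exact hook points are only a guess and the headroom/pad offsets are glossed over:
sync just the range the CPU is about to read while the page is still mapped, then
let the final unmap skip the CPU sync so the bounce buffer is not copied back twice.

	static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
	{
		if (--dma->ref)
			return;

		/* Every consumed sub-buffer was already synced individually in
		 * virtnet_rq_get_buf(), so the final unmap can skip the
		 * whole-page CPU sync.
		 */
		dma_unmap_page_attrs(virtqueue_dma_dev(rq->vq), dma->addr, dma->len,
				     DMA_FROM_DEVICE, DMA_ATTR_SKIP_CPU_SYNC);

		dma->next = rq->dma_free;
		rq->dma_free = dma;
	}

	static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
	{
		struct virtnet_rq_data *data;
		void *buf;

		buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
		if (!buf || !rq->data_array)
			return buf;

		data = buf;

		/* With swiotlb this triggers the bounce-buffer copy for exactly
		 * the range the device wrote, while the page is still mapped.
		 */
		dma_sync_single_range_for_cpu(virtqueue_dma_dev(rq->vq),
					      data->dma->addr,
					      data->buf - data->dma->buf,
					      *len, DMA_FROM_DEVICE);

		virtnet_rq_unmap(rq, data->dma);

		return virtnet_rq_recycle_data(rq, data);
	}

The XDP_REDIRECT path Jason mentions would need the same sync before the page is
handed off.
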

> >
> >
> > >
> > > >
> > > > But first fix the use of DMA API to actually be correct,
> > > > otherwise you are cheating by avoiding synchronization.
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > > patches won't work.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Since we use page_frag, the buffers we allocated are all contiguous.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-19  9:51                                   ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-19  9:51 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, Jul 19, 2023 at 05:38:56PM +0800, Jason Wang wrote:
> On Wed, Jul 19, 2023 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > > >
> > > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > > >
> > > > > > > > > > I have not tested this way; our environment is pt (passthrough). I wonder if
> > > > > > > > > > strict is a common scenario. I can test it.
> > > > > > > > >
> > > > > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > > > > >
> > > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > > >
> > > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > > >
> > > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > > >
> > > > > > >
> > > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > > >
> > > > > > > virtio-net without merge dma 775496.00 pps
> > > > > > >
> > > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > > >
> > > > > > >
> > > > > >
> > > > > > Great, let's add those numbers to the changelog.
> > > > >
> > > > >
> > > > > Yes, I will do it in next version.
> > > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > >
> > > > You should also test without iommu but with swiotlb=force
> > >
> > >
> > > For swiotlb, merging DMA has no benefit, because we still need to copy data from
> > > the swiotlb bounce buffer to the original buffer.
> > > The benefit of merging DMA is to reduce the number of operations on the iommu.
> > >
> > > I did some tests for this. The result is the same.
> > >
> > > Thanks.
> > >
> >
> > Did you actually check that it works though?
> > Looks like with swiotlb you need to sync to trigger a copy
> > before unmap, and I don't see where it's done in the current
> > patch.
> 
> And this is needed for XDP_REDIRECT as well.
> 
> Thanks

And once you do add the sync, you'll do the copy twice, so it will
actually be slower.

I suspect you need to sync manually, then unmap with DMA_ATTR_SKIP_CPU_SYNC.
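
Roughly something like this (an untested sketch, just to show the API
pairing; dev/addr/len here are the ones used by virtnet_rq_map_sg() and
virtnet_rq_unmap() in the quoted patch):

        /* per completed buffer: make the received data visible to the
         * CPU (this is what triggers the swiotlb copy-back)
         */
        dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);

        /* later, when the shared mapping is finally torn down: the data
         * was already synced above, so skip the second copy on unmap
         */
        dma_unmap_page_attrs(dev, dma->addr, dma->len, DMA_FROM_DEVICE,
                             DMA_ATTR_SKIP_CPU_SYNC);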

> >
> >
> > >
> > > >
> > > > But first fix the use of DMA API to actually be correct,
> > > > otherwise you are cheating by avoiding synchronization.
> > > >
> > > >
> > > >
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > > patches won't work.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > > >
> > > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> >


_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-19  8:55                               ` Michael S. Tsirkin
@ 2023-07-20  2:24                                 ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  2:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, virtualization, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Christoph Hellwig

On Wed, 19 Jul 2023 04:55:04 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > >
> > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > >
> > > > > > > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > > > > > > common scenario. I can test it.
> > > > > > > >
> > > > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > > > >
> > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > >
> > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > >
> > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > >
> > > > > >
> > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > >
> > > > > > virtio-net without merge dma 775496.00 pps
> > > > > >
> > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > >
> > > > > >
> > > > >
> > > > > Great, let's add those numbers to the changelog.
> > > >
> > > >
> > > > Yes, I will do it in next version.
> > > >
> > > >
> > > > Thanks.
> > > >
> > >
> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merging DMA has no benefit, because we still need to copy data from
> > the swiotlb bounce buffer to the original buffer.
> > The benefit of merging DMA is to reduce the number of operations on the iommu.
> >
> > I did some tests for this. The result is the same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to sync to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

Yes, you are right, I missed the sync in this patch.
But when I tested with swiotlb, I fixed this.
You can see the fix in the next version.
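
Roughly, the idea is to sync the completed buffer when it is taken from
the ring, and to skip the CPU sync on the final unmap. A sketch only, not
the final code (the next version may differ):

        /* in virtnet_rq_get_buf(): sync the range of this buffer so the
         * swiotlb copy-back happens before the data is used
         */
        dma_sync_single_for_cpu(virtqueue_dma_dev(rq->vq),
                                data->dma->addr + (data->buf - data->dma->buf),
                                *len, DMA_FROM_DEVICE);

        /* in virtnet_rq_unmap(): the data was already synced per buffer,
         * so avoid doing the copy a second time
         */
        dma_unmap_page_attrs(dev, dma->addr, dma->len, DMA_FROM_DEVICE,
                             DMA_ATTR_SKIP_CPU_SYNC);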

Thanks.


>
>
> >
> > >
> > > But first fix the use of DMA API to actually be correct,
> > > otherwise you are cheating by avoiding synchronization.
> > >
> > >
> > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > patches won't work.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > >
> > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-20  2:24                                 ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  2:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Wed, 19 Jul 2023 04:55:04 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > >
> > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > >
> > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > >
> > > > > > > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > > > > > > common scenario. I can test it.
> > > > > > > >
> > > > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > > > >
> > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > >
> > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > >
> > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > >
> > > > > >
> > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > >
> > > > > > virtio-net without merge dma 775496.00 pps
> > > > > >
> > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > >
> > > > > >
> > > > >
> > > > > Great, let's add those numbers to the changelog.
> > > >
> > > >
> > > > Yes, I will do it in next version.
> > > >
> > > >
> > > > Thanks.
> > > >
> > >
> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merging DMA has no benefit, because we still need to copy data
> > from the swiotlb buffer to the original buffer.
> > The benefit of merging DMA is to reduce the number of operations on the iommu device.
> >
> > I did some test for this. The result is same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to synch to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

Yes, you are right, I missed the sync in this patch.
But when I tested with swiotlb, I fixed this.
You can see it in the next version.
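
For reference, what I am testing looks roughly like the sketch below. It is
only a sketch on top of the helpers from the patch in this thread (not the
final code): sync the returned range for the CPU in virtnet_rq_get_buf(), so
that a swiotlb bounce buffer is copied back to the original buffer even though
the page-level mapping stays alive for the other buffers sharing it.

static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
				    struct virtnet_rq_dma *dma,
				    void *buf, u32 len)
{
	struct device *dev = virtqueue_dma_dev(rq->vq);

	/* sync only this buffer's range inside the larger mapping */
	dma_sync_single_range_for_cpu(dev, dma->addr, buf - dma->buf,
				      len, DMA_FROM_DEVICE);
}

virtnet_rq_get_buf() would call this with the length returned by
virtqueue_get_buf_ctx() before recycling the data entry.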

Thanks.


>
>
> >
> > >
> > > But first fix the use of DMA API to actually be correct,
> > > otherwise you are cheating by avoiding synchronization.
> > >
> > >
> > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > patches won't work.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > >
> > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > >
> > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-19  9:51                                   ` Michael S. Tsirkin
@ 2023-07-20  2:26                                     ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  2:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller,
	Jason Wang

On Wed, 19 Jul 2023 05:51:50 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jul 19, 2023 at 05:38:56PM +0800, Jason Wang wrote:
> > On Wed, Jul 19, 2023 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > > > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > > > >
> > > > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > > > >
> > > > > > > > > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > > > > > > > > common scenario. I can test it.
> > > > > > > > > >
> > > > > > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > > > > > >
> > > > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > > > >
> > > > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > > > >
> > > > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > > > >
> > > > > > > >
> > > > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > > > >
> > > > > > > > virtio-net without merge dma 775496.00 pps
> > > > > > > >
> > > > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Great, let's add those numbers to the changelog.
> > > > > >
> > > > > >
> > > > > > Yes, I will do it in next version.
> > > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > > You should also test without iommu but with swiotlb=force
> > > >
> > > >
> > > > For swiotlb, merging DMA has no benefit, because we still need to copy data
> > > > from the swiotlb buffer to the original buffer.
> > > > The benefit of merging DMA is to reduce the number of operations on the iommu device.
> > > >
> > > > I did some test for this. The result is same.
> > > >
> > > > Thanks.
> > > >
> > >
> > > Did you actually check that it works though?
> > > Looks like with swiotlb you need to synch to trigger a copy
> > > before unmap, and I don't see where it's done in the current
> > > patch.
> >
> > And this is needed for XDP_REDIRECT as well.
> >
> > Thanks
>
> And once you do, you'll do the copy twice so it will
> actually be slower.

Yes, I also think so. But I did not see much of a decline.
It may just be fluctuation in the measurements.

>
> I suspect you need to sync manually then unmap with DMA_ATTR_SKIP_CPU_SYNC.

DMA_ATTR_SKIP_CPU_SYNC is great!!

I will include this in v13.
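
Just to confirm I understood the suggestion, the unmap side would then look
roughly like this (a sketch only, based on virtnet_rq_unmap() from the patch
above; the real change will be in v13):

static void virtnet_rq_unmap(struct receive_queue *rq,
			     struct virtnet_rq_dma *dma)
{
	struct device *dev = virtqueue_dma_dev(rq->vq);

	if (--dma->ref)
		return;

	/* each sub-buffer was already synced for the CPU, so skip the
	 * implicit sync here and avoid a second swiotlb copy
	 */
	dma_unmap_page_attrs(dev, dma->addr, dma->len, DMA_FROM_DEVICE,
			     DMA_ATTR_SKIP_CPU_SYNC);

	dma->next = rq->dma_free;
	rq->dma_free = dma;
}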

Thanks.


>
> > >
> > >
> > > >
> > > > >
> > > > > But first fix the use of DMA API to actually be correct,
> > > > > otherwise you are cheating by avoiding synchronization.
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > > > patches won't work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
>
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-20  2:26                                     ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  2:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Wed, 19 Jul 2023 05:51:50 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Wed, Jul 19, 2023 at 05:38:56PM +0800, Jason Wang wrote:
> > On Wed, Jul 19, 2023 at 4:55 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Wed, Jul 19, 2023 at 11:21:07AM +0800, Xuan Zhuo wrote:
> > > > On Fri, 14 Jul 2023 06:37:10 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Wed, Jul 12, 2023 at 04:38:24PM +0800, Xuan Zhuo wrote:
> > > > > > On Wed, 12 Jul 2023 16:37:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Wed, Jul 12, 2023 at 4:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, 12 Jul 2023 15:54:58 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Tue, 11 Jul 2023 10:58:51 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > On Tue, Jul 11, 2023 at 10:42 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 11 Jul 2023 10:36:17 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > > > > On Mon, Jul 10, 2023 at 8:41 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, 10 Jul 2023 07:59:03 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 06:18:30PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > On Mon, 10 Jul 2023 05:40:21 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 11:42:37AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > > > > Currently, the virtio core will perform a dma operation for each
> > > > > > > > > > > > > > > > > operation. Although, the same page may be operated multiple times.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > The driver does the dma operation and manages the dma address based the
> > > > > > > > > > > > > > > > > feature premapped of virtio core.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > This way, we can perform only one dma operation for the same page. In
> > > > > > > > > > > > > > > > > the case of mtu 1500, this can reduce a lot of dma operations.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Tested on Aliyun g7.4large machine, in the case of a cpu 100%, pps
> > > > > > > > > > > > > > > > > increased from 1893766 to 1901105. An increase of 0.4%.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > what kind of dma was there? an IOMMU? which vendors? in which mode
> > > > > > > > > > > > > > > > of operation?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Do you mean this:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > [    0.470816] iommu: Default domain type: Passthrough
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > With passthrough, dma API is just some indirect function calls, they do
> > > > > > > > > > > > > > not affect the performance a lot.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, this benefit is worthless. I seem to have done a meaningless thing. The
> > > > > > > > > > > > > overhead of DMA I observed is indeed not too high.
> > > > > > > > > > > >
> > > > > > > > > > > > Have you measured with iommu=strict?
> > > > > > > > > > >
> > > > > > > > > > > I have not tested this way, our environment is pt, I wonder if strict is a
> > > > > > > > > > > common scenario. I can test it.
> > > > > > > > > >
> > > > > > > > > > It's not a common setup, but it's a way to stress DMA layer to see the overhead.
> > > > > > > > >
> > > > > > > > > kernel command line: intel_iommu=on iommu.strict=1 iommu.passthrough=0
> > > > > > > > >
> > > > > > > > > virtio-net without merge dma 428614.00 pps
> > > > > > > > >
> > > > > > > > > virtio-net with merge dma    742853.00 pps
> > > > > > > >
> > > > > > > >
> > > > > > > > kernel command line: intel_iommu=on iommu.strict=0 iommu.passthrough=0
> > > > > > > >
> > > > > > > > virtio-net without merge dma 775496.00 pps
> > > > > > > >
> > > > > > > > virtio-net with merge dma    1010514.00 pps
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > > Great, let's add those numbers to the changelog.
> > > > > >
> > > > > >
> > > > > > Yes, I will do it in next version.
> > > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > >
> > > > > You should also test without iommu but with swiotlb=force
> > > >
> > > >
> > > > For swiotlb, merging DMA mappings has no benefit, because we still need to
> > > > copy the data from the swiotlb bounce buffer to the original buffer anyway.
> > > > The benefit of merging DMA mappings is to reduce the number of operations on
> > > > the IOMMU.
> > > >
> > > > I did some tests for this. The result is the same.
> > > >
> > > > Thanks.
> > > >
> > >
> > > Did you actually check that it works though?
> > > Looks like with swiotlb you need to sync to trigger a copy
> > > before unmap, and I don't see where it's done in the current
> > > patch.
> >
> > And this is needed for XDP_REDIRECT as well.
> >
> > Thanks
>
> And once you do, you'll do the copy twice so it will
> actually be slower.

Yes, I think so too. But I did not see much of a decline; it may just be within
the measurement noise.

>
> I suspect you need to sync manually then unmap with DMA_ATTR_SKIP_CPU_SYNC.

DMA_ATTR_SKIP_CPU_SYNC is great!!

I will include this in v13.
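
Roughly, what I have in mind is the sketch below (just a sketch on top of the helpers
in this patch, not the final v13 code; virtnet_rq_sync_for_cpu is a placeholder name):
sync each returned buffer for the CPU before it is used, and let the final unmap skip
the CPU sync, since by then every buffer carved out of the mapping has already been
synced.

static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
{
        struct device *dev;

        --dma->ref;

        if (dma->ref)
                return;

        dev = virtqueue_dma_dev(rq->vq);

        /* Every buffer carved out of this mapping was already synced for the
         * CPU when it was returned, so the unmap itself can skip the sync.
         */
        dma_unmap_page_attrs(dev, dma->addr, dma->len, DMA_FROM_DEVICE,
                             DMA_ATTR_SKIP_CPU_SYNC);

        dma->next = rq->dma_free;
        rq->dma_free = dma;
}

/* Placeholder helper: called on each buffer returned by the device before the
 * CPU touches it, so that swiotlb (or a non-coherent device) bounces back or
 * invalidates just that region.
 */
static void virtnet_rq_sync_for_cpu(struct receive_queue *rq,
                                    struct virtnet_rq_dma *dma,
                                    void *buf, u32 len)
{
        struct device *dev = virtqueue_dma_dev(rq->vq);

        dma_sync_single_range_for_cpu(dev, dma->addr, buf - dma->buf, len,
                                      DMA_FROM_DEVICE);
}

virtnet_rq_get_buf() and virtnet_rq_detach_unused_buf() would call the sync helper on
the buffer they return, and the XDP_REDIRECT path needs the same sync, as Jason
pointed out.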

Thanks.


>
> > >
> > >
> > > >
> > > > >
> > > > > But first fix the use of DMA API to actually be correct,
> > > > > otherwise you are cheating by avoiding synchronization.
> > > > >
> > > > >
> > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Thanks.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Try e.g. bounce buffer. Which is where you will see a problem: your
> > > > > > > > > > > > > > patches won't work.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > This kind of difference is likely in the noise.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > It's really not high, but this is because the proportion of DMA under perf top
> > > > > > > > > > > > > > > is not high. Probably that much.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So maybe not worth the complexity.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ---
> > > > > > > > > > > > > > > > >  drivers/net/virtio_net.c | 283 ++++++++++++++++++++++++++++++++++++---
> > > > > > > > > > > > > > > > >  1 file changed, 267 insertions(+), 16 deletions(-)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > > index 486b5849033d..4de845d35bed 100644
> > > > > > > > > > > > > > > > > --- a/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > > +++ b/drivers/net/virtio_net.c
> > > > > > > > > > > > > > > > > @@ -126,6 +126,27 @@ static const struct virtnet_stat_desc virtnet_rq_stats_desc[] = {
> > > > > > > > > > > > > > > > >  #define VIRTNET_SQ_STATS_LEN   ARRAY_SIZE(virtnet_sq_stats_desc)
> > > > > > > > > > > > > > > > >  #define VIRTNET_RQ_STATS_LEN   ARRAY_SIZE(virtnet_rq_stats_desc)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +/* The bufs on the same page may share this struct. */
> > > > > > > > > > > > > > > > > +struct virtnet_rq_dma {
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *next;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +       u32 len;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       u32 ref;
> > > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +/* Record the dma and buf. */
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I guess I see that. But why?
> > > > > > > > > > > > > > > > And these two comments are the extent of the available
> > > > > > > > > > > > > > > > documentation, that's not enough I feel.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +struct virtnet_rq_data {
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *next;
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Is manually reimplementing a linked list the best
> > > > > > > > > > > > > > > > we can do?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, we can use llist.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma;
> > > > > > > > > > > > > > > > > +};
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >  /* Internal representation of a send virtqueue */
> > > > > > > > > > > > > > > > >  struct send_queue {
> > > > > > > > > > > > > > > > >         /* Virtqueue associated with this send _queue */
> > > > > > > > > > > > > > > > > @@ -175,6 +196,13 @@ struct receive_queue {
> > > > > > > > > > > > > > > > >         char name[16];
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         struct xdp_rxq_info xdp_rxq;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_array;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data_free;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_array;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma_free;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *last_dma;
> > > > > > > > > > > > > > > > >  };
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >  /* This structure can contain rss message with maximum settings for indirection table and keysize
> > > > > > > > > > > > > > > > > @@ -549,6 +577,176 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > > > > > > > > > > > > > > > >         return skb;
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +static void virtnet_rq_unmap(struct receive_queue *rq, struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       --dma->ref;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       if (dma->ref)
> > > > > > > > > > > > > > > > > +               return;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > If you don't unmap there is no guarantee valid data will be
> > > > > > > > > > > > > > > > there in the buffer.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma_unmap_page(dev, dma->addr, dma->len, DMA_FROM_DEVICE);
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma->next = rq->dma_free;
> > > > > > > > > > > > > > > > > +       rq->dma_free = dma;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static void *virtnet_rq_recycle_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > > +                                    struct virtnet_rq_data *data)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       buf = data->buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data->next = rq->data_free;
> > > > > > > > > > > > > > > > > +       rq->data_free = data;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return buf;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static struct virtnet_rq_data *virtnet_rq_get_data(struct receive_queue *rq,
> > > > > > > > > > > > > > > > > +                                                  void *buf,
> > > > > > > > > > > > > > > > > +                                                  struct virtnet_rq_dma *dma)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data = rq->data_free;
> > > > > > > > > > > > > > > > > +       rq->data_free = data->next;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data->buf = buf;
> > > > > > > > > > > > > > > > > +       data->dma = dma;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return data;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static void *virtnet_rq_get_buf(struct receive_queue *rq, u32 *len, void **ctx)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static void *virtnet_rq_detach_unused_buf(struct receive_queue *rq)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > > +       void *buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       buf = virtqueue_detach_unused_buf(rq->vq);
> > > > > > > > > > > > > > > > > +       if (!buf || !rq->data_array)
> > > > > > > > > > > > > > > > > +               return buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       data = buf;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static int virtnet_rq_map_sg(struct receive_queue *rq, void *buf, u32 len)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_dma *dma = rq->last_dma;
> > > > > > > > > > > > > > > > > +       struct device *dev;
> > > > > > > > > > > > > > > > > +       u32 off, map_len;
> > > > > > > > > > > > > > > > > +       dma_addr_t addr;
> > > > > > > > > > > > > > > > > +       void *end;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       if (likely(dma) && buf >= dma->buf && (buf + len <= dma->buf + dma->len)) {
> > > > > > > > > > > > > > > > > +               ++dma->ref;
> > > > > > > > > > > > > > > > > +               addr = dma->addr + (buf - dma->buf);
> > > > > > > > > > > > > > > > > +               goto ok;
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > So this is the meat of the proposed optimization. I guess that
> > > > > > > > > > > > > > > > if the last buffer we allocated happens to be in the same page
> > > > > > > > > > > > > > > > as this one then they can both be mapped for DMA together.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Since we use page_frag, the buffers we allocated are all continuous.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Why last one specifically? Whether next one happens to
> > > > > > > > > > > > > > > > be close depends on luck. If you want to try optimizing this
> > > > > > > > > > > > > > > > the right thing to do is likely by using a page pool.
> > > > > > > > > > > > > > > > There's actually work upstream on page pool, look it up.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As we discussed in another thread, the page pool is first used for xdp. Let's
> > > > > > > > > > > > > > > transform it step by step.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ok so this should wait then?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       end = buf + len - 1;
> > > > > > > > > > > > > > > > > +       off = offset_in_page(end);
> > > > > > > > > > > > > > > > > +       map_len = len + PAGE_SIZE - off;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dev = virtqueue_dma_dev(rq->vq);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       addr = dma_map_page_attrs(dev, virt_to_page(buf), offset_in_page(buf),
> > > > > > > > > > > > > > > > > +                                 map_len, DMA_FROM_DEVICE, 0);
> > > > > > > > > > > > > > > > > +       if (addr == DMA_MAPPING_ERROR)
> > > > > > > > > > > > > > > > > +               return -ENOMEM;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma = rq->dma_free;
> > > > > > > > > > > > > > > > > +       rq->dma_free = dma->next;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       dma->ref = 1;
> > > > > > > > > > > > > > > > > +       dma->buf = buf;
> > > > > > > > > > > > > > > > > +       dma->addr = addr;
> > > > > > > > > > > > > > > > > +       dma->len = map_len;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       rq->last_dma = dma;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +ok:
> > > > > > > > > > > > > > > > > +       sg_init_table(rq->sg, 1);
> > > > > > > > > > > > > > > > > +       rq->sg[0].dma_address = addr;
> > > > > > > > > > > > > > > > > +       rq->sg[0].length = len;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +static int virtnet_rq_merge_map_init(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > > +{
> > > > > > > > > > > > > > > > > +       struct receive_queue *rq;
> > > > > > > > > > > > > > > > > +       int i, err, j, num;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       /* disable for big mode */
> > > > > > > > > > > > > > > > > +       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > > > > > > > > > > > +               return 0;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > > +               err = virtqueue_set_premapped(vi->rq[i].vq);
> > > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > > +                       continue;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               num = virtqueue_get_vring_size(rq->vq);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq->data_array = kmalloc_array(num, sizeof(*rq->data_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > > +               if (!rq->data_array)
> > > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq->dma_array = kmalloc_array(num, sizeof(*rq->dma_array), GFP_KERNEL);
> > > > > > > > > > > > > > > > > +               if (!rq->dma_array)
> > > > > > > > > > > > > > > > > +                       goto err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               for (j = 0; j < num; ++j) {
> > > > > > > > > > > > > > > > > +                       rq->data_array[j].next = rq->data_free;
> > > > > > > > > > > > > > > > > +                       rq->data_free = &rq->data_array[j];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +                       rq->dma_array[j].next = rq->dma_free;
> > > > > > > > > > > > > > > > > +                       rq->dma_free = &rq->dma_array[j];
> > > > > > > > > > > > > > > > > +               }
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +err:
> > > > > > > > > > > > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > > +               struct receive_queue *rq;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               rq = &vi->rq[i];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               kfree(rq->dma_array);
> > > > > > > > > > > > > > > > > +               kfree(rq->data_array);
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return -ENOMEM;
> > > > > > > > > > > > > > > > > +}
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >  static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
> > > > > > > > > > > > > > > > >  {
> > > > > > > > > > > > > > > > >         unsigned int len;
> > > > > > > > > > > > > > > > > @@ -835,7 +1033,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
> > > > > > > > > > > > > > > > >                 void *buf;
> > > > > > > > > > > > > > > > >                 int off;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &buflen);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &buflen, NULL);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf))
> > > > > > > > > > > > > > > > >                         goto err_buf;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -1126,7 +1324,7 @@ static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> > > > > > > > > > > > > > > > >                 return -EINVAL;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         while (--*num_buf > 0) {
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > > >                                  dev->name, *num_buf,
> > > > > > > > > > > > > > > > > @@ -1351,7 +1549,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > > >         while (--num_buf) {
> > > > > > > > > > > > > > > > >                 int num_skb_frags;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, &ctx);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers out of %d missing\n",
> > > > > > > > > > > > > > > > >                                  dev->name, num_buf,
> > > > > > > > > > > > > > > > > @@ -1414,7 +1612,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > > > > > > > > > > > > > > >  err_skb:
> > > > > > > > > > > > > > > > >         put_page(page);
> > > > > > > > > > > > > > > > >         while (num_buf-- > 1) {
> > > > > > > > > > > > > > > > > -               buf = virtqueue_get_buf(rq->vq, &len);
> > > > > > > > > > > > > > > > > +               buf = virtnet_rq_get_buf(rq, &len, NULL);
> > > > > > > > > > > > > > > > >                 if (unlikely(!buf)) {
> > > > > > > > > > > > > > > > >                         pr_debug("%s: rx error: %d buffers missing\n",
> > > > > > > > > > > > > > > > >                                  dev->name, num_buf);
> > > > > > > > > > > > > > > > > @@ -1529,6 +1727,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > > >         unsigned int xdp_headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > > >         void *ctx = (void *)(unsigned long)xdp_headroom;
> > > > > > > > > > > > > > > > >         int len = vi->hdr_len + VIRTNET_RX_PAD + GOOD_PACKET_LEN + xdp_headroom;
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         len = SKB_DATA_ALIGN(len) +
> > > > > > > > > > > > > > > > > @@ -1539,11 +1738,34 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
> > > > > > > > > > > > > > > > >         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > > > > > > > > > > > > > > > >         get_page(alloc_frag->page);
> > > > > > > > > > > > > > > > >         alloc_frag->offset += len;
> > > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > > -                   vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > > +                                       vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf + VIRTNET_RX_PAD + xdp_headroom,
> > > > > > > > > > > > > > > > > +                           vi->hdr_len + GOOD_PACKET_LEN);
> > > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -1620,6 +1842,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > > >         unsigned int headroom = virtnet_get_headroom(vi);
> > > > > > > > > > > > > > > > >         unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> > > > > > > > > > > > > > > > >         unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
> > > > > > > > > > > > > > > > > +       struct virtnet_rq_data *data;
> > > > > > > > > > > > > > > > >         char *buf;
> > > > > > > > > > > > > > > > >         void *ctx;
> > > > > > > > > > > > > > > > >         int err;
> > > > > > > > > > > > > > > > > @@ -1650,12 +1873,32 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > > > > > > > > > > > > > >                 alloc_frag->offset += hole;
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > -       sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               err = virtnet_rq_map_sg(rq, buf, len);
> > > > > > > > > > > > > > > > > +               if (err)
> > > > > > > > > > > > > > > > > +                       goto map_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               data = virtnet_rq_get_data(rq, buf, rq->last_dma);
> > > > > > > > > > > > > > > > > +       } else {
> > > > > > > > > > > > > > > > > +               sg_init_one(rq->sg, buf, len);
> > > > > > > > > > > > > > > > > +               data = (void *)buf;
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > > > > > > > > > > > > > > -       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > > > > > > > > > > > > > > +       err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, data, ctx, gfp);
> > > > > > > > > > > > > > > > >         if (err < 0)
> > > > > > > > > > > > > > > > > -               put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > > +               goto add_err;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +       return 0;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +add_err:
> > > > > > > > > > > > > > > > > +       if (rq->data_array) {
> > > > > > > > > > > > > > > > > +               virtnet_rq_unmap(rq, data->dma);
> > > > > > > > > > > > > > > > > +               virtnet_rq_recycle_data(rq, data);
> > > > > > > > > > > > > > > > > +       }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +map_err:
> > > > > > > > > > > > > > > > > +       put_page(virt_to_head_page(buf));
> > > > > > > > > > > > > > > > >         return err;
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > @@ -1775,13 +2018,13 @@ static int virtnet_receive(struct receive_queue *rq, int budget,
> > > > > > > > > > > > > > > > >                 void *ctx;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx))) {
> > > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, &ctx))) {
> > > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, ctx, xdp_xmit, &stats);
> > > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > >         } else {
> > > > > > > > > > > > > > > > >                 while (stats.packets < budget &&
> > > > > > > > > > > > > > > > > -                      (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) {
> > > > > > > > > > > > > > > > > +                      (buf = virtnet_rq_get_buf(rq, &len, NULL)) != NULL) {
> > > > > > > > > > > > > > > > >                         receive_buf(vi, rq, buf, len, NULL, xdp_xmit, &stats);
> > > > > > > > > > > > > > > > >                         stats.packets++;
> > > > > > > > > > > > > > > > >                 }
> > > > > > > > > > > > > > > > > @@ -3514,6 +3757,9 @@ static void virtnet_free_queues(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->rq[i].napi);
> > > > > > > > > > > > > > > > >                 __netif_napi_del(&vi->sq[i].napi);
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               kfree(vi->rq[i].data_array);
> > > > > > > > > > > > > > > > > +               kfree(vi->rq[i].dma_array);
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         /* We called __netif_napi_del(),
> > > > > > > > > > > > > > > > > @@ -3591,9 +3837,10 @@ static void free_unused_bufs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >         for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > > > > > > > > > > > -               struct virtqueue *vq = vi->rq[i].vq;
> > > > > > > > > > > > > > > > > -               while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> > > > > > > > > > > > > > > > > -                       virtnet_rq_free_unused_buf(vq, buf);
> > > > > > > > > > > > > > > > > +               struct receive_queue *rq = &vi->rq[i];
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > > +               while ((buf = virtnet_rq_detach_unused_buf(rq)) != NULL)
> > > > > > > > > > > > > > > > > +                       virtnet_rq_free_unused_buf(rq->vq, buf);
> > > > > > > > > > > > > > > > >                 cond_resched();
> > > > > > > > > > > > > > > > >         }
> > > > > > > > > > > > > > > > >  }
> > > > > > > > > > > > > > > > > @@ -3767,6 +4014,10 @@ static int init_vqs(struct virtnet_info *vi)
> > > > > > > > > > > > > > > > >         if (ret)
> > > > > > > > > > > > > > > > >                 goto err_free;
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +       ret = virtnet_rq_merge_map_init(vi);
> > > > > > > > > > > > > > > > > +       if (ret)
> > > > > > > > > > > > > > > > > +               goto err_free;
> > > > > > > > > > > > > > > > > +
> > > > > > > > > > > > > > > > >         cpus_read_lock();
> > > > > > > > > > > > > > > > >         virtnet_set_affinity(vi);
> > > > > > > > > > > > > > > > >         cpus_read_unlock();
> > > > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > > > 2.32.0.3.g01195cf9f
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > >
> > >
>
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
  2023-07-13 14:47       ` Michael S. Tsirkin
@ 2023-07-20  6:22         ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Jason Wang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Thu, Jul 13, 2023 at 10:47:23AM -0400, Michael S. Tsirkin wrote:
> There are a gazillion virtio drivers and most of them just use the
> virtio API, without bothering with these micro-optimizations.  virtio
> already tracks addresses so mapping/unmapping them for DMA is easier
> done in the core.  It's only networking and only with XDP where the
> difference becomes measureable.

Yes, but now you have two differing code paths (which then branch into
another two with the fake DMA mappings).  I'm really worried about
the madness that follows, like the USB dma mapping code that is a
constant source of trouble.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped()
@ 2023-07-20  6:22         ` Christoph Hellwig
  0 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 10:47:23AM -0400, Michael S. Tsirkin wrote:
> There are a gazillion virtio drivers and most of them just use the
> virtio API, without bothering with these micro-optimizations.  virtio
> already tracks addresses so mapping/unmapping them for DMA is easier
> done in the core.  It's only networking and only with XDP where the
> difference becomes measureable.

Yes, but now you have two differing code paths (which then branch into
another two with the fake DMA mappings).  I'm really worried about
the madness that follows, like the USB dma mapping code that is a
constant source of trouble.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-13 14:51       ` Michael S. Tsirkin
@ 2023-07-20  6:22         ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Jason Wang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > caller can do dma operation in advance. The purpose is to keep memory
> > > mapped across multiple add/get buf operations.
> > 
> > This is just poking holes into the abstraction..
> 
> More specifically?

Because now you expose a device that can't be used for the non-dma
mapping case and should be hidden.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-20  6:22         ` Christoph Hellwig
  0 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > caller can do dma operation in advance. The purpose is to keep memory
> > > mapped across multiple add/get buf operations.
> > 
> > This is just poking holes into the abstraction..
> 
> More specifically?

Because now you expose a device that can't be used for the non-dma
mapping case and should be hidden.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-14  3:56         ` Jason Wang
@ 2023-07-20  6:23           ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Christoph Hellwig

Hi Jason,

can you please resend your reply with proper quoting?  I had to give
up after multiple pages of scrolling without finding anything that
you added to the full quote.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-20  6:23           ` Christoph Hellwig
  0 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	virtualization, Christoph Hellwig, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

Hi Jason,

can you please resend your reply with proper quoting?  I had to give
up after multiple pages of scrolling without finding anything that
you added to the full quote.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-20  6:22         ` Christoph Hellwig
@ 2023-07-20  6:45           ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  6:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Michael S. Tsirkin

On Wed, 19 Jul 2023 23:22:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > >
> > > This is just poking holes into the abstraction..
> >
> > More specifically?
>
> Because now you expose a device that can't be used for the non-dma
> mapping case and shoud be hidden.

 Sorry, I do not get it.

 virtqueue_dma_dev() returns the device that works with the DMA APIs.
 That device can then be used like any other device. So what is the problem?

 I always thought the code path without the DMA APIs was the trouble for you.

 Thanks.
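
To make the point concrete, here is a rough sketch (not actual driver code) of
the usage in question: map a receive buffer once against the device returned by
virtqueue_dma_dev() and keep that mapping alive across add/get cycles.  It
assumes the premapped mode from earlier in this series (the driver fills
sg->dma_address itself); example_premap_and_add() is a made-up name and error
handling is trimmed.

#include <linux/virtio.h>
#include <linux/dma-mapping.h>
#include <linux/scatterlist.h>

static int example_premap_and_add(struct virtqueue *vq, void *buf, size_t len)
{
        struct device *dma_dev = virtqueue_dma_dev(vq);
        struct scatterlist sg;
        dma_addr_t addr;

        /* Map once; the mapping can stay alive across many add/get cycles. */
        addr = dma_map_single(dma_dev, buf, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(dma_dev, addr))
                return -ENOMEM;

        /* Premapped mode: hand the dma address to the core, which then
         * skips its own map/unmap for this descriptor. */
        sg_init_table(&sg, 1);
        sg.dma_address = addr;
        sg.length = len;

        return virtqueue_add_inbuf(vq, &sg, 1, buf, GFP_ATOMIC);
}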


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-20  6:45           ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  6:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, 19 Jul 2023 23:22:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > >
> > > This is just poking holes into the abstraction..
> >
> > More specifically?
>
> Because now you expose a device that can't be used for the non-dma
> mapping case and shoud be hidden.

 Sorry, I do not get it.

 virtqueue_dma_dev() returns the device that works with the DMA APIs.
 That device can then be used like any other device. So what is the problem?

 I always thought the code path without the DMA APIs was the trouble for you.

 Thanks.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-20  6:45           ` Xuan Zhuo
@ 2023-07-20  6:57             ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Michael S. Tsirkin

On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
>  virtqueue_dma_dev() return the device that working with the DMA APIs.
>  Then that can be used like other devices. So what is the problem.
> 
>  I always think the code path without the DMA APIs is the trouble for you.

Because we now have an API where the upper level drivers sometimes
see the dma device and sometimes not.  This will be abused and cause
trouble sooner than you can say "layering".

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-20  6:57             ` Christoph Hellwig
  0 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  6:57 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
>  virtqueue_dma_dev() return the device that working with the DMA APIs.
>  Then that can be used like other devices. So what is the problem.
> 
>  I always think the code path without the DMA APIs is the trouble for you.

Because we now have an API where the upper level drivers sometimes
see the dma device and sometimes not.  This will be abused and cause
trouble sooner than you can say "layering".

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-20  6:57             ` Christoph Hellwig
@ 2023-07-20  7:34               ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  7:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Michael S. Tsirkin

On Wed, 19 Jul 2023 23:57:51 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
> >  virtqueue_dma_dev() return the device that working with the DMA APIs.
> >  Then that can be used like other devices. So what is the problem.
> >
> >  I always think the code path without the DMA APIs is the trouble for you.
>
> Because we now have an API where the upper level drivers sometimes
> see the dma device and sometimes not.

Having no dma device is just the case of the old devices.

The API without a DMA dev is only there for compatibility with those older
devices. We can't give up the old devices, but we also have to embrace new
features.

> This will be abused and cause
> trouble sooner than you can say "layering".

I don't understand what the possible trouble here is.

When there is no dma device, the driver just does the same thing as before.

Thanks.



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-20  7:34               ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-20  7:34 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Wed, 19 Jul 2023 23:57:51 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
> >  virtqueue_dma_dev() return the device that working with the DMA APIs.
> >  Then that can be used like other devices. So what is the problem.
> >
> >  I always think the code path without the DMA APIs is the trouble for you.
>
> Because we now have an API where the upper level drivers sometimes
> see the dma device and sometimes not.

Having no dma device is just the case of the old devices.

The API without a DMA dev is only there for compatibility with those older
devices. We can't give up the old devices, but we also have to embrace new
features.

> This will be abused and cause
> trouble sooner than you can say "layering".

I don't understand what the possible trouble here is.

When there is no dma device, the driver just does the same thing as before.

Thanks.



^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-20  6:23           ` Christoph Hellwig
@ 2023-07-20  7:41             ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-20  7:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, virtualization, Michael S. Tsirkin, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Thu, Jul 20, 2023 at 2:23 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> Hi Jason,
>
> can you please resend your reply with proper quoting?  I had to give
> up after multiple pages of scrolling without finding anything that
> you added to the full quote.

I guess it's this part?

> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merge DMA has no benefit, because we still need to copy data from
> > swiotlb buffer to the origin buffer.
> > The benefit of the merge DMA is to reduce the operate to the iommu device.
> >
> > I did some test for this. The result is same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to synch to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

And this is needed for XDP_REDIRECT as well.

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-20  7:41             ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-20  7:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	virtualization, Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni,
	David S. Miller

On Thu, Jul 20, 2023 at 2:23 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> Hi Jason,
>
> can you please resend your reply with proper quoting?  I had to give
> up after multiple pages of scrolling without finding anything that
> you added to the full quote.

I guess it's this part?

> > > You should also test without iommu but with swiotlb=force
> >
> >
> > For swiotlb, merge DMA has no benefit, because we still need to copy data from
> > swiotlb buffer to the origin buffer.
> > The benefit of the merge DMA is to reduce the operate to the iommu device.
> >
> > I did some test for this. The result is same.
> >
> > Thanks.
> >
>
> Did you actually check that it works though?
> Looks like with swiotlb you need to synch to trigger a copy
> before unmap, and I don't see where it's done in the current
> patch.

And this is needed for XDP_REDIRECT as well.

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
  2023-07-20  7:41             ` Jason Wang
@ 2023-07-20  8:21               ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  8:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	virtualization, Christoph Hellwig, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller

On Thu, Jul 20, 2023 at 03:41:56PM +0800, Jason Wang wrote:
> > Did you actually check that it works though?
> > Looks like with swiotlb you need to synch to trigger a copy
> > before unmap, and I don't see where it's done in the current
> > patch.
> 
> And this is needed for XDP_REDIRECT as well.

DMA always needs proper syncs, be that for swiotlb or for cache
maintenance, yes.
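
Put differently, once the buffers are premapped the driver owns exactly these
two calls (a sketch only; dma_dev is whatever virtqueue_dma_dev() returned when
the buffer was mapped, and the actual RX/XDP processing is elided):

#include <linux/dma-mapping.h>

static void example_rx_complete(struct device *dma_dev, dma_addr_t addr,
                                unsigned int len)
{
        /* device -> CPU: triggers the swiotlb bounce copy (or cache
         * invalidate) before anything, including an XDP program or an
         * XDP_REDIRECT target, reads the data. */
        dma_sync_single_for_cpu(dma_dev, addr, len, DMA_FROM_DEVICE);

        /* ... build the skb / run XDP on the buffer here ... */

        /* CPU -> device: before reposting the still-mapped buffer. */
        dma_sync_single_for_device(dma_dev, addr, len, DMA_FROM_DEVICE);
}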

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page
@ 2023-07-20  8:21               ` Christoph Hellwig
  0 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-20  8:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Michael S. Tsirkin,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Thu, Jul 20, 2023 at 03:41:56PM +0800, Jason Wang wrote:
> > Did you actually check that it works though?
> > Looks like with swiotlb you need to synch to trigger a copy
> > before unmap, and I don't see where it's done in the current
> > patch.
> 
> And this is needed for XDP_REDIRECT as well.

DMA always needs proper syncs, be that for swiotlb or for cache
maintenance, yes.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-20  6:22         ` Christoph Hellwig
@ 2023-07-20 17:21           ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-20 17:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization, Eric Dumazet,
	Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Wed, Jul 19, 2023 at 11:22:42PM -0700, Christoph Hellwig wrote:
> On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > > 
> > > This is just poking holes into the abstraction..
> > 
> > More specifically?
> 
> Because now you expose a device that can't be used for the non-dma
> mapping case and shoud be hidden.


Ah, ok.
Well, I think we can add wrappers like virtio_dma_sync and so on.
They are NOPs for the non-dma case, so passing the dma device is harmless.
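
Presumably something along these lines is meant -- a sketch only; the name and
exact signature are illustrative, not an existing API:

#include <linux/virtio.h>
#include <linux/dma-mapping.h>

/* NOP when the virtqueue is not backed by a DMA device, so callers can
 * use it unconditionally instead of special-casing old devices. */
static inline void virtio_dma_sync_single_for_cpu(struct virtqueue *vq,
                                                  dma_addr_t addr, size_t size,
                                                  enum dma_data_direction dir)
{
        struct device *dev = virtqueue_dma_dev(vq);

        if (dev)
                dma_sync_single_for_cpu(dev, addr, size, dir);
}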

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-20 17:21           ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-20 17:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Xuan Zhuo, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Wed, Jul 19, 2023 at 11:22:42PM -0700, Christoph Hellwig wrote:
> On Thu, Jul 13, 2023 at 10:51:59AM -0400, Michael S. Tsirkin wrote:
> > On Thu, Jul 13, 2023 at 04:15:16AM -0700, Christoph Hellwig wrote:
> > > On Mon, Jul 10, 2023 at 11:42:32AM +0800, Xuan Zhuo wrote:
> > > > Added virtqueue_dma_dev() to get DMA device for virtio. Then the
> > > > caller can do dma operation in advance. The purpose is to keep memory
> > > > mapped across multiple add/get buf operations.
> > > 
> > > This is just poking holes into the abstraction..
> > 
> > More specifically?
> 
> Because now you expose a device that can't be used for the non-dma
> mapping case and shoud be hidden.


Ah, ok.
Well, I think we can add wrappers like virtio_dma_sync and so on.
They are NOPs for the non-dma case, so passing the dma device is harmless.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-20 17:21           ` Michael S. Tsirkin
@ 2023-07-24 16:43             ` Christoph Hellwig
  -1 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-24 16:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> Well I think we can add wrappers like virtio_dma_sync and so on.
> There are NOP for non-dma so passing the dma device is harmless.

Yes, please.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-24 16:43             ` Christoph Hellwig
  0 siblings, 0 replies; 176+ messages in thread
From: Christoph Hellwig @ 2023-07-24 16:43 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Xuan Zhuo, virtualization, Jason Wang,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> Well I think we can add wrappers like virtio_dma_sync and so on.
> There are NOP for non-dma so passing the dma device is harmless.

Yes, please.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-20  7:34               ` Xuan Zhuo
@ 2023-07-24 20:05                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-24 20:05 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Thu, Jul 20, 2023 at 03:34:01PM +0800, Xuan Zhuo wrote:
> On Wed, 19 Jul 2023 23:57:51 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
> > >  virtqueue_dma_dev() return the device that working with the DMA APIs.
> > >  Then that can be used like other devices. So what is the problem.
> > >
> > >  I always think the code path without the DMA APIs is the trouble for you.
> >
> > Because we now have an API where the upper level drivers sometimes
> > see the dma device and sometimes not.
> 
> No dma device is just for the old devices.
> 
> The API without DMA dev are only compatible with older devices. We can't give up
> these old devices, but we also have to embrace new features.
> 
> > This will be abused and cause
> > trouble sooner than you can say "layering".
> 
> I don't understand what the possible trouble here is.
> 
> When no dma device, the driver just does the same thing as before.
> 
> Thanks.

Instead of skipping operations, Christoph wants wrappers that
do nothing for the non-dma case.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-24 20:05                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-24 20:05 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Thu, Jul 20, 2023 at 03:34:01PM +0800, Xuan Zhuo wrote:
> On Wed, 19 Jul 2023 23:57:51 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > On Thu, Jul 20, 2023 at 02:45:14PM +0800, Xuan Zhuo wrote:
> > >  virtqueue_dma_dev() return the device that working with the DMA APIs.
> > >  Then that can be used like other devices. So what is the problem.
> > >
> > >  I always think the code path without the DMA APIs is the trouble for you.
> >
> > Because we now have an API where the upper level drivers sometimes
> > see the dma device and sometimes not.
> 
> No dma device is just for the old devices.
> 
> The API without DMA dev are only compatible with older devices. We can't give up
> these old devices, but we also have to embrace new features.
> 
> > This will be abused and cause
> > trouble sooner than you can say "layering".
> 
> I don't understand what the possible trouble here is.
> 
> When no dma device, the driver just does the same thing as before.
> 
> Thanks.

Instead of skipping operations, Christoph wants wrappers that
do nothing for the non-dma case.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-24 16:43             ` Christoph Hellwig
@ 2023-07-25  2:13               ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-25  2:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > Well I think we can add wrappers like virtio_dma_sync and so on.
> > There are NOP for non-dma so passing the dma device is harmless.
>
> Yes, please.


I am not sure I got this fully.

Do you mean this:
https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/

Then the driver must do the dma operations (map and sync) via these
virtio_dma_* APIs, no matter whether the device is a dma device or not.

Then AF_XDP must use these virtio_dma_* APIs for the virtio device.

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-25  2:13               ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-25  2:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Michael S. Tsirkin

On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > Well I think we can add wrappers like virtio_dma_sync and so on.
> > There are NOP for non-dma so passing the dma device is harmless.
>
> Yes, please.


I am not sure I got this fully.

Do you mean this:
https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/

Then the driver must do the dma operations (map and sync) via these
virtio_dma_* APIs, no matter whether the device is a dma device or not.

Then AF_XDP must use these virtio_dma_* APIs for the virtio device.

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-25  2:13               ` Xuan Zhuo
@ 2023-07-25  7:34                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-25  7:34 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > There are NOP for non-dma so passing the dma device is harmless.
> >
> > Yes, please.
> 
> 
> I am not sure I got this fully.
> 
> Are you mean this:
> https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> 
> Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> No care the device is non-dma device or dma device.

yes

> Then the AF_XDP must use these virtio_dma_* APIs for virtio device.

We'll worry about AF_XDP when the patch is posted.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-25  7:34                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-07-25  7:34 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > There are NOP for non-dma so passing the dma device is harmless.
> >
> > Yes, please.
> 
> 
> I am not sure I got this fully.
> 
> Are you mean this:
> https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> 
> Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> No care the device is non-dma device or dma device.

yes

> Then the AF_XDP must use these virtio_dma_* APIs for virtio device.

We'll worry about AF_XDP when the patch is posted.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-25  7:34                 ` Michael S. Tsirkin
@ 2023-07-25 11:07                   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-25 11:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > There are NOP for non-dma so passing the dma device is harmless.
> > >
> > > Yes, please.
> >
> >
> > I am not sure I got this fully.
> >
> > Are you mean this:
> > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> >
> > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > No care the device is non-dma device or dma device.
>
> yes
>
> > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
>
> We'll worry about AF_XDP when the patch is posted.

YES.

We discussed it. They voted 'no'.

http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org

Thanks.


>
> --
> MST
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-25 11:07                   ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-25 11:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > There are NOP for non-dma so passing the dma device is harmless.
> > >
> > > Yes, please.
> >
> >
> > I am not sure I got this fully.
> >
> > Are you mean this:
> > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> >
> > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > No care the device is non-dma device or dma device.
>
> yes
>
> > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
>
> We'll worry about AF_XDP when the patch is posted.

YES.

We discussed it. They voted 'no'.

http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org

Thanks.


>
> --
> MST
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-25 11:07                   ` Xuan Zhuo
@ 2023-07-28  6:02                     ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-28  6:02 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > >
> > > > Yes, please.
> > >
> > >
> > > I am not sure I got this fully.
> > >
> > > Are you mean this:
> > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > >
> > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > No care the device is non-dma device or dma device.
> >
> > yes
> >
> > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> >
> > We'll worry about AF_XDP when the patch is posted.
>
> YES.
>
> We discussed it. They voted 'no'.
>
> http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org


Hi guys, this topic is stuck again. How should I proceed with this work?

Let me briefly summarize:
1. The problem with adding a virtio_dma_{map, sync} API is that AF_XDP and the
driver layer would have to support these APIs. The current conclusion from
AF_XDP is no.

2. Call dma_set_mask_and_coherent, so that we can use the DMA API uniformly
inside the driver. This idea seems to be inconsistent with the design of the
DMA framework. The conclusion is no.

3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
uses the DMA API, and this type of device is the future direction, so we only
support premapped DMA for this type of virtio device. The problem with this
solution is that virtqueue_dma_dev() only returns a dev when
VIRTIO_F_ACCESS_PLATFORM is supported and returns NULL otherwise (a sketch of
what that means for callers follows below). This option is currently also a NO.
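
To illustrate point 3, here is a rough sketch (made-up names, not actual driver
code) of what an AF_XDP enable path looks like when virtqueue_dma_dev() is the
only way to get a DMA device:

#include <linux/virtio.h>
#include <net/xsk_buff_pool.h>

static int virtnet_xsk_pool_enable(struct virtqueue *vq,
                                   struct xsk_buff_pool *pool)
{
        struct device *dma_dev = virtqueue_dma_dev(vq);

        /* Old device without VIRTIO_F_ACCESS_PLATFORM: no DMA device, so
         * the pool cannot be premapped and AF_XDP zerocopy is refused. */
        if (!dma_dev)
                return -EOPNOTSUPP;

        return xsk_pool_dma_map(pool, dma_dev, 0);
}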

So I'm wondering what I should do: from a DMA point of view, is there any
workable solution that still uses the DMA API?

Thank you



>
> Thanks.
>
>
> >
> > --
> > MST
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-28  6:02                     ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-28  6:02 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf, Michael S. Tsirkin

On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > >
> > > > Yes, please.
> > >
> > >
> > > I am not sure I got this fully.
> > >
> > > Are you mean this:
> > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > >
> > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > No care the device is non-dma device or dma device.
> >
> > yes
> >
> > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> >
> > We'll worry about AF_XDP when the patch is posted.
>
> YES.
>
> We discussed it. They voted 'no'.
>
> http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org


Hi guys, this topic is stuck again. How should I proceed with this work?

Let me briefly summarize:
1. The problem with adding a virtio_dma_{map, sync} API is that AF_XDP and the
driver layer would have to support these APIs. The current conclusion from
AF_XDP is no.

2. Call dma_set_mask_and_coherent, so that we can use the DMA API uniformly
inside the driver. This idea seems to be inconsistent with the design of the
DMA framework. The conclusion is no.

3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
uses the DMA API, and this type of device is the future direction, so we only
support premapped DMA for this type of virtio device. The problem with this
solution is that virtqueue_dma_dev() only returns a dev when
VIRTIO_F_ACCESS_PLATFORM is supported and returns NULL otherwise. This option
is currently also a NO.

So I'm wondering what I should do: from a DMA point of view, is there any
workable solution that still uses the DMA API?

Thank you



>
> Thanks.
>
>
> >
> > --
> > MST
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-28  6:02                     ` Xuan Zhuo
  (?)
@ 2023-07-28 15:03                     ` Jakub Kicinski
  2023-07-31  1:23                         ` Jason Wang
  2023-07-31  2:34                         ` Xuan Zhuo
  -1 siblings, 2 replies; 176+ messages in thread
From: Jakub Kicinski @ 2023-07-28 15:03 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin

On Fri, 28 Jul 2023 14:02:33 +0800 Xuan Zhuo wrote:
> Hi guys, this topic is stuck again. How should I proceed with this work?
> 
> Let me briefly summarize:
> 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> the driver layer, we need to support these APIs. The current conclusion of
> AF_XDP is no.
> 
> 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> driver. This idea seems to be inconsistent with the framework design of DMA. The
> conclusion is no.
> 
> 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> uses DMA API. And this type of device is the future direction, so we only
> support DMA premapped for this type of virtio device. The problem with this
> solution is that virtqueue_dma_dev() only returns dev in some cases, because
> VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> This option is currently NO.
> 
> So I'm wondering what should I do, from a DMA point of view, is there any
> solution in case of using DMA API?

I'd step back and ask why you want to use AF_XDP with virtio.
Instead of bifurcating one virtio instance into different queues, why
not create a separate virtio instance?

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-28 15:03                     ` Jakub Kicinski
@ 2023-07-31  1:23                         ` Jason Wang
  2023-07-31  2:34                         ` Xuan Zhuo
  1 sibling, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-31  1:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	virtualization, Christoph Hellwig, Eric Dumazet, bpf,
	Paolo Abeni, David S. Miller

On Fri, Jul 28, 2023 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 28 Jul 2023 14:02:33 +0800 Xuan Zhuo wrote:
> > Hi guys, this topic is stuck again. How should I proceed with this work?
> >
> > Let me briefly summarize:
> > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > the driver layer, we need to support these APIs. The current conclusion of
> > AF_XDP is no.
> >
> > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > conclusion is no.
> >
> > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > uses DMA API. And this type of device is the future direction, so we only
> > support DMA premapped for this type of virtio device. The problem with this
> > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> > This option is currently NO.
> >
> > So I'm wondering what should I do, from a DMA point of view, is there any
> > solution in case of using DMA API?
>
> I'd step back and ask you why do you want to use AF_XDP with virtio.
> Instead of bifurcating one virtio instance into different queues why
> not create a separate virtio instance?
>

I'm not sure I get this, but do you mean a separate virtio device that
owns AF_XDP queues only? If I understand it correctly, bifurcating is
one of the key advantages of AF_XDP. What's more, current virtio
doesn't support being split at queue (pair) level. And it may still
suffer from the yes/no DMA API issue.

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-07-31  1:23                         ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-07-31  1:23 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Xuan Zhuo, Christoph Hellwig, virtualization, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin

On Fri, Jul 28, 2023 at 11:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 28 Jul 2023 14:02:33 +0800 Xuan Zhuo wrote:
> > Hi guys, this topic is stuck again. How should I proceed with this work?
> >
> > Let me briefly summarize:
> > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > the driver layer, we need to support these APIs. The current conclusion of
> > AF_XDP is no.
> >
> > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > conclusion is no.
> >
> > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > uses DMA API. And this type of device is the future direction, so we only
> > support DMA premapped for this type of virtio device. The problem with this
> > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> > This option is currently NO.
> >
> > So I'm wondering what should I do, from a DMA point of view, is there any
> > solution in case of using DMA API?
>
> I'd step back and ask you why do you want to use AF_XDP with virtio.
> Instead of bifurcating one virtio instance into different queues why
> not create a separate virtio instance?
>

I'm not sure I get this, but do you mean a separate virtio device that
owns AF_XDP queues only? If I understand it correctly, bifurcating is
one of the key advantages of AF_XDP. What's more, current virtio
doesn't support being split at queue (pair) level. And it may still
suffer from the yes/no DMA API issue.

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-28 15:03                     ` Jakub Kicinski
@ 2023-07-31  2:34                         ` Xuan Zhuo
  2023-07-31  2:34                         ` Xuan Zhuo
  1 sibling, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-07-31  2:34 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, bpf, Paolo Abeni,
	David S. Miller

On Fri, 28 Jul 2023 08:03:05 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Fri, 28 Jul 2023 14:02:33 +0800 Xuan Zhuo wrote:
> > Hi guys, this topic is stuck again. How should I proceed with this work?
> >
> > Let me briefly summarize:
> > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > the driver layer, we need to support these APIs. The current conclusion of
> > AF_XDP is no.
> >
> > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > conclusion is no.
> >
> > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > uses DMA API. And this type of device is the future direction, so we only
> > support DMA premapped for this type of virtio device. The problem with this
> > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> > This option is currently NO.
> >
> > So I'm wondering what should I do, from a DMA point of view, is there any
> > solution in case of using DMA API?
>
> I'd step back and ask you why do you want to use AF_XDP with virtio.

Or do you mean virtio vs. virtio-net?
All I did in virtio core was to let virtio-net support AF_XDP.

> Instead of bifurcating one virtio instance into different queues

That is not the key problem for us.

Even if we had a device that only works with AF_XDP,
it would still have these DMA issues.

I think the current way (v11, v12) is a good solution; the only problem is that
if the device is old, we cannot do dma with the DMA APIs, and then we will not
support AF_XDP. I don't think that matters, but Christoph was a little worried.

Thanks.


> why
> not create a separate virtio instance?





^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-31  1:23                         ` Jason Wang
  (?)
@ 2023-07-31 15:46                         ` Jakub Kicinski
  2023-08-01  2:03                             ` Xuan Zhuo
  -1 siblings, 1 reply; 176+ messages in thread
From: Jakub Kicinski @ 2023-07-31 15:46 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Christoph Hellwig, virtualization, David S. Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin

On Mon, 31 Jul 2023 09:23:29 +0800 Jason Wang wrote:
> > I'd step back and ask you why do you want to use AF_XDP with virtio.
> > Instead of bifurcating one virtio instance into different queues why
> > not create a separate virtio instance?
> 
> I'm not sure I get this, but do you mean a separate virtio device that
> owns AF_XDP queues only? If I understand it correctly, bifurcating is
> one of the key advantages of AF_XDP. What's more, current virtio
> doesn't support being split at queue (pair) level. And it may still
> suffer from the yes/no DMA API issue.

I guess we should step even further back and ask Xuan what the use case
is, because I'm not very sure. All we hear is "enable AF_XDP on virtio"
but AF_XDP is barely used on real HW, so why?

Bifurcating makes (used to make?) some sense in case of real HW when you
had only one PCI function and had to subdivide it. Virtio is either a SW
construct or offloaded to very capable HW, so either way cost of
creating an extra instance for DPDK or whatever else is very low.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-31 15:46                         ` Jakub Kicinski
@ 2023-08-01  2:03                             ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-01  2:03 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Christoph Hellwig, virtualization, David S.  Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin, Jason Wang

On Mon, 31 Jul 2023 08:46:51 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Mon, 31 Jul 2023 09:23:29 +0800 Jason Wang wrote:
> > > I'd step back and ask you why do you want to use AF_XDP with virtio.
> > > Instead of bifurcating one virtio instance into different queues why
> > > not create a separate virtio instance?
> >
> > I'm not sure I get this, but do you mean a separate virtio device that
> > owns AF_XDP queues only? If I understand it correctly, bifurcating is
> > one of the key advantages of AF_XDP. What's more, current virtio
> > doesn't support being split at queue (pair) level. And it may still
> > suffer from the yes/no DMA API issue.
>
> I guess we should step even further back and ask Xuan what the use case
> is, because I'm not very sure. All we hear is "enable AF_XDP on virtio"
> but AF_XDP is barely used on real HW, so why?


Why only for real HW?

I want to enable AF_XDP on virtio-net, so that users can send/receive packets
via AF_XDP, bypassing the kernel network stack. That is already used at large
scale.

I do not know what the problem with virtio-net is.
Why do you think virtio-net cannot work with AF_XDP?


>
> Bifurcating makes (used to make?) some sense in case of real HW when you
> had only one PCI function and had to subdivide it.

Sorry I do not get this.


> Virtio is either a SW
> construct or offloaded to very capable HW, so either way cost of
> creating an extra instance for DPDK or whatever else is very low.


By the extra instance, do you mean another virtio-net device?

I think there is a gap, so let me give you a brief introduction to our use case.

First, we do not use DPDK. We use AF_XDP, because AF_XDP is simpler and easier
to deploy for nginx.

We use AF_XDP to speed up the UDP traffic of QUIC. With our library, the
application only needs some simple changes; a rough sketch of such an AF_XDP
setup follows.
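
For reference, the kind of setup the library wraps looks roughly like the
following (a bare-bones libxdp sketch, not code from libxudp; the interface
name "eth0" and queue 0 are placeholders):

#include <stdlib.h>
#include <unistd.h>
#include <linux/if_xdp.h>
#include <xdp/xsk.h>

#define NUM_FRAMES 4096

static struct xsk_ring_prod fq, tx;
static struct xsk_ring_cons cq, rx;

/* Bind one AF_XDP socket, in zero-copy mode, to queue 0 of a virtio-net
 * interface.  Zero-copy only works once the driver supports it, which is
 * what this patch set is working toward.
 */
static int setup_xsk(void)
{
        struct xsk_umem *umem;
        struct xsk_socket *xsk;
        size_t size = NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
        void *bufs;

        /* UMEM: the shared packet buffer area the NIC sends/receives into. */
        if (posix_memalign(&bufs, getpagesize(), size))
                return -1;
        if (xsk_umem__create(&umem, bufs, size, &fq, &cq, NULL))
                return -1;

        struct xsk_socket_config cfg = {
                .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                .bind_flags = XDP_ZEROCOPY,
        };
        return xsk_socket__create(&xsk, "eth0", 0, umem, &rx, &tx, &cfg);
}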

On AliYun, the network driver is virtio-net, so we want virtio-net to support
AF_XDP.

I guess what you mean is that we can speed things up through cooperation between
devices and drivers, but our machines are public cloud instances, and we cannot
change the virtio back-end devices under normal circumstances.

So I do not see what the difference is, here, between real HW and virtio-net.


Thanks.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-01  2:03                             ` Xuan Zhuo
  (?)
@ 2023-08-01  2:36                             ` Jakub Kicinski
  2023-08-01  2:57                                 ` Xuan Zhuo
  -1 siblings, 1 reply; 176+ messages in thread
From: Jakub Kicinski @ 2023-08-01  2:36 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, David S.  Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin, Jason Wang

On Tue, 1 Aug 2023 10:03:44 +0800 Xuan Zhuo wrote:
> > Virtio is either a SW
> > construct or offloaded to very capable HW, so either way cost of
> > creating an extra instance for DPDK or whatever else is very low.  
> 
> The extra instance is virtio-net?
> 
> I think there is a gap. So let me give you a brief introduction of our case.
> 
> Firstly, we donot use dpdk. We use the AF_XDP, because of that the AF_XDP is
> more simpler and easy to deploy for the nginx.
> 
> We use the AF_XDP to speedup the UDP of the quic. By the library, the APP just
> needs some simple change.
> 
> On the AliYun, the net driver is virtio-net. So we want the virtio-net support
> the AF_XDP.
> 
> I guess what you mean is that we can speed up through the cooperation of devices
> and drivers, but our machines are public clouds, and we cannot change the
> back-end devices of virtio under normal circumstances.
> 
> Here I do not know the different of the real hw and the virtio-net.

Do you have this working and benchmarked, or is this just an idea?

What about io_uring zero copy w/ pre-registered buffers.
You'll get csum offload, GSO, all the normal perf features.
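
The alternative being suggested looks roughly like this with liburing (a
sketch only; it assumes the UDP socket is already connected and the buffer
was registered beforehand with io_uring_register_buffers()):

#include <liburing.h>

/* Queue one zero-copy send of a pre-registered buffer (index 0) on an
 * already-connected UDP socket.  The kernel stack still runs, so csum
 * offload and GSO keep working.
 */
static int send_one_zc(struct io_uring *ring, int sockfd,
                       const void *buf, size_t len)
{
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        if (!sqe)
                return -1;

        io_uring_prep_send_zc_fixed(sqe, sockfd, buf, len, 0, 0, 0);
        return io_uring_submit(ring);
}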

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-01  2:36                             ` Jakub Kicinski
@ 2023-08-01  2:57                                 ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-01  2:57 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, bpf, Paolo Abeni,
	David S.  Miller

On Mon, 31 Jul 2023 19:36:06 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 1 Aug 2023 10:03:44 +0800 Xuan Zhuo wrote:
> > > Virtio is either a SW
> > > construct or offloaded to very capable HW, so either way cost of
> > > creating an extra instance for DPDK or whatever else is very low.
> >
> > The extra instance is virtio-net?
> >
> > I think there is a gap. So let me give you a brief introduction of our case.
> >
> > Firstly, we donot use dpdk. We use the AF_XDP, because of that the AF_XDP is
> > more simpler and easy to deploy for the nginx.
> >
> > We use the AF_XDP to speedup the UDP of the quic. By the library, the APP just
> > needs some simple change.
> >
> > On the AliYun, the net driver is virtio-net. So we want the virtio-net support
> > the AF_XDP.
> >
> > I guess what you mean is that we can speed up through the cooperation of devices
> > and drivers, but our machines are public clouds, and we cannot change the
> > back-end devices of virtio under normal circumstances.
> >
> > Here I do not know the different of the real hw and the virtio-net.
>
> You have this working and benchmarked or this is just and idea?

This is not just an idea. As I said, it has been used at large scale.

This is the library that applications use for AF_XDP. We have open-sourced it:
https://gitee.com/anolis/libxudp

This is the Alibaba version of nginx. It is also open source and supports
working with that library to use AF_XDP:
http://tengine.taobao.org/

I added support for this to our kernel releases Anolis/Alinux.

The work was done about two years ago. As you know, I pushed the first version
to enable AF_XDP on virtio-net about two years ago. I never thought the job
would be so difficult.

The NIC (virtio-net) on AliYun can reach 24,000,000 PPS,
so I think there is no difference from real HW in terms of performance.

With AF_XDP, the UDP PPS is seven times that of the kernel UDP stack.

>
> What about io_uring zero copy w/ pre-registered buffers.
> You'll get csum offload, GSO, all the normal perf features.

We tried io_uring, but it was not suitable for our scenario.

Yes, AF_XDP currently does not support checksum offload or GSO.
That is indeed a small drawback.

Thanks.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-01  2:57                                 ` Xuan Zhuo
  (?)
@ 2023-08-01 15:45                                 ` Jakub Kicinski
  2023-08-02  1:36                                     ` Xuan Zhuo
  -1 siblings, 1 reply; 176+ messages in thread
From: Jakub Kicinski @ 2023-08-01 15:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, David S.  Miller,
	Eric Dumazet, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin, Jason Wang, Pavel Begunkov

On Tue, 1 Aug 2023 10:57:30 +0800 Xuan Zhuo wrote:
> > You have this working and benchmarked or this is just and idea?  
> 
> This is not just an idea. I said that has been used on large scale.
> 
> This is the library for the APP to use the AF_XDP. We has open it.
> https://gitee.com/anolis/libxudp
> 
> This is the Alibaba version of the nginx. That has been opened, that supported
> to work with the libray to use AF_XDP.
> http://tengine.taobao.org/
> 
> I supported this on our kernel release Anolis/Alinux.

Interesting!

> The work was done about 2 years ago. You know, I pushed the first version to
> enable AF_XDP on virtio-net about two years ago. I never thought the job would
> be so difficult.

Me neither, but it is what it is.

> The nic (virtio-net) of AliYun can reach 24,000,000PPS.
> So I think there is no different with the real HW on the performance.
> 
> With the AF_XDP, the UDP pps is seven times that of the kernel udp stack.

UDP pps or QUIC pps? UDP with or without GSO?

Do you have measurements of how much it saves in real world workloads?
I'm asking mostly out of curiosity, not to question the use case.

> > What about io_uring zero copy w/ pre-registered buffers.
> > You'll get csum offload, GSO, all the normal perf features.  
> 
> We tried io-uring, but it was not suitable for our scenario.
> 
> Yes, now the AF_XDP does not support the csum offload and GSO.
> This is indeed a small problem.

Can you say more about io-uring suitability? It can do zero copy
and recently-ish Pavel optimized it quite a bit.

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-07-28  6:02                     ` Xuan Zhuo
@ 2023-08-01 16:17                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-08-01 16:17 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Christoph Hellwig, virtualization, Jason Wang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > >
> > > > > Yes, please.
> > > >
> > > >
> > > > I am not sure I got this fully.
> > > >
> > > > Are you mean this:
> > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > >
> > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > No care the device is non-dma device or dma device.
> > >
> > > yes
> > >
> > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > >
> > > We'll worry about AF_XDP when the patch is posted.
> >
> > YES.
> >
> > We discussed it. They voted 'no'.
> >
> > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> 
> 
> Hi guys, this topic is stuck again. How should I proceed with this work?
> 
> Let me briefly summarize:
> 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> the driver layer, we need to support these APIs. The current conclusion of
> AF_XDP is no.
> 
> 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> driver. This idea seems to be inconsistent with the framework design of DMA. The
> conclusion is no.
> 
> 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> uses DMA API. And this type of device is the future direction, so we only
> support DMA premapped for this type of virtio device. The problem with this
> solution is that virtqueue_dma_dev() only returns dev in some cases, because
> VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> This option is currently NO.
> 
> So I'm wondering what should I do, from a DMA point of view, is there any
> solution in case of using DMA API?
> 
> Thank you


I think it's OK at this point; Christoph just asked you
to add wrappers for map/unmap for use in the virtio code.
That seems like a cosmetic change and shouldn't be hard.
Otherwise I haven't seen significant comments.
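
For what it's worth, such a wrapper could look roughly like this (names are
illustrative, not the final API; the point is only that it degrades to a
plain physical address when the DMA API is not in use):

#include <linux/dma-mapping.h>
#include <linux/io.h>
#include <linux/virtio.h>

/* Map a buffer for the device behind this virtqueue.  When the virtqueue is
 * not backed by the DMA API (no VIRTIO_F_ACCESS_PLATFORM), the "mapping" is
 * a NOP translation to a physical address, so callers never have to care
 * which case they are in.
 */
static dma_addr_t virtqueue_dma_map_single(struct virtqueue *vq, void *ptr,
                                           size_t size,
                                           enum dma_data_direction dir)
{
        struct device *dma_dev = virtqueue_dma_dev(vq);

        if (!dma_dev)
                return (dma_addr_t)virt_to_phys(ptr);

        return dma_map_single(dma_dev, ptr, size, dir);
}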


Christoph, do I summarize what you are saying correctly?
-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-01 15:45                                 ` Jakub Kicinski
@ 2023-08-02  1:36                                     ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-02  1:36 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Pavel Begunkov, Jesper Dangaard Brouer, Daniel Borkmann,
	Michael S. Tsirkin, netdev, John Fastabend, Alexei Starovoitov,
	virtualization, Christoph Hellwig, Eric Dumazet, bpf,
	Paolo Abeni, David S.  Miller

On Tue, 1 Aug 2023 08:45:10 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 1 Aug 2023 10:57:30 +0800 Xuan Zhuo wrote:
> > > You have this working and benchmarked or this is just and idea?
> >
> > This is not just an idea. I said that has been used on large scale.
> >
> > This is the library for the APP to use the AF_XDP. We has open it.
> > https://gitee.com/anolis/libxudp
> >
> > This is the Alibaba version of the nginx. That has been opened, that supported
> > to work with the libray to use AF_XDP.
> > http://tengine.taobao.org/
> >
> > I supported this on our kernel release Anolis/Alinux.
>
> Interesting!
>
> > The work was done about 2 years ago. You know, I pushed the first version to
> > enable AF_XDP on virtio-net about two years ago. I never thought the job would
> > be so difficult.
>
> Me neither, but it is what it is.
>
> > The nic (virtio-net) of AliYun can reach 24,000,000PPS.
> > So I think there is no different with the real HW on the performance.
> >
> > With the AF_XDP, the UDP pps is seven times that of the kernel udp stack.
>
> UDP pps or QUIC pps? UDP with or without GSO?

UDP PPS without GSO.

>
> Do you have measurements of how much it saves in real world workloads?
> I'm asking mostly out of curiosity, not to question the use case.

Yes. The result is affected by the request size; we can reach a 10-40% saving.
The smaller the request size, the smaller the gain.

>
> > > What about io_uring zero copy w/ pre-registered buffers.
> > > You'll get csum offload, GSO, all the normal perf features.
> >
> > We tried io-uring, but it was not suitable for our scenario.
> >
> > Yes, now the AF_XDP does not support the csum offload and GSO.
> > This is indeed a small problem.
>
> Can you say more about io-uring suitability? It can do zero copy
> and recently-ish Pavel optimized it quite a bit.

First, AF_XDP is also zero-copy. We also use XDP for a few other things.

And this was all about two years ago, so my comments reflect the state of
io_uring two years ago.

As far as I know, io_uring still uses the kernel UDP stack, while AF_XDP can
skip the kernel stack entirely and go directly to the driver.

So io_uring does not have much of an advantage here.

Thanks.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-01 16:17                       ` Michael S. Tsirkin
@ 2023-08-02  1:49                         ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-02  1:49 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > >
> > > > > > Yes, please.
> > > > >
> > > > >
> > > > > I am not sure I got this fully.
> > > > >
> > > > > Are you mean this:
> > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > >
> > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > No care the device is non-dma device or dma device.
> > > >
> > > > yes
> > > >
> > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > >
> > > > We'll worry about AF_XDP when the patch is posted.
> > >
> > > YES.
> > >
> > > We discussed it. They voted 'no'.
> > >
> > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> >
> >
> > Hi guys, this topic is stuck again. How should I proceed with this work?
> >
> > Let me briefly summarize:
> > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > the driver layer, we need to support these APIs. The current conclusion of
> > AF_XDP is no.
> >
> > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > conclusion is no.
> >
> > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > uses DMA API. And this type of device is the future direction, so we only
> > support DMA premapped for this type of virtio device. The problem with this
> > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> > This option is currently NO.
> >
> > So I'm wondering what should I do, from a DMA point of view, is there any
> > solution in case of using DMA API?
> >
> > Thank you
>
>
> I think it's ok at this point, Christoph just asked you
> to add wrappers for map/unmap for use in virtio code.
> Seems like a cosmetic change, shouldn't be hard.

Yes, that is not hard; I already have this code.

But do you mean that the wrappers are only used in the virtio driver code,
and that we also offer the virtqueue_dma_dev() API at the same time?
Then the driver would have two choices for doing DMA.

Is that so?
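
Illustrating the two paths (a sketch only; virtqueue_dma_map_single() here
stands in for whatever the wrapper ends up being called):

#include <linux/dma-mapping.h>
#include <linux/virtio.h>

/* Choice 1: the driver goes through the virtio wrapper and never sees a
 * struct device; this works whether or not the DMA API is in use.
 */
static dma_addr_t map_via_wrapper(struct virtqueue *vq, void *buf, size_t len)
{
        return virtqueue_dma_map_single(vq, buf, len, DMA_TO_DEVICE);
}

/* Choice 2: the driver (or AF_XDP) asks for the DMA device and uses the
 * regular DMA API directly; this only works when virtqueue_dma_dev()
 * returns a non-NULL device (VIRTIO_F_ACCESS_PLATFORM negotiated).
 */
static dma_addr_t map_via_dma_dev(struct virtqueue *vq, void *buf, size_t len)
{
        struct device *dev = virtqueue_dma_dev(vq);

        if (!dev)
                return DMA_MAPPING_ERROR;

        return dma_map_single(dev, buf, len, DMA_TO_DEVICE);
}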


> Otherwise I haven't seen significant comments.
>
>
> Christoph do I summarize what you are saying correctly?
> --
> MST
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-02  1:36                                     ` Xuan Zhuo
  (?)
@ 2023-08-02 11:12                                     ` Pavel Begunkov
  -1 siblings, 0 replies; 176+ messages in thread
From: Pavel Begunkov @ 2023-08-02 11:12 UTC (permalink / raw)
  To: Xuan Zhuo, Jakub Kicinski
  Cc: Christoph Hellwig, virtualization, David S. Miller, Eric Dumazet,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf,
	Michael S. Tsirkin, Jason Wang

On 8/2/23 02:36, Xuan Zhuo wrote:
> On Tue, 1 Aug 2023 08:45:10 -0700, Jakub Kicinski <kuba@kernel.org> wrote:
>> On Tue, 1 Aug 2023 10:57:30 +0800 Xuan Zhuo wrote:
>>>> You have this working and benchmarked or this is just and idea?
>>>
>>> This is not just an idea. I said that has been used on large scale.
>>>
>>> This is the library for the APP to use the AF_XDP. We has open it.
>>> https://gitee.com/anolis/libxudp
>>>
>>> This is the Alibaba version of the nginx. That has been opened, that supported
>>> to work with the libray to use AF_XDP.
>>> http://tengine.taobao.org/
>>>
>>> I supported this on our kernel release Anolis/Alinux.
>>
>> Interesting!
>>
>>> The work was done about 2 years ago. You know, I pushed the first version to
>>> enable AF_XDP on virtio-net about two years ago. I never thought the job would
>>> be so difficult.
>>
>> Me neither, but it is what it is.
>>
>>> The nic (virtio-net) of AliYun can reach 24,000,000PPS.
>>> So I think there is no different with the real HW on the performance.
>>>
>>> With the AF_XDP, the UDP pps is seven times that of the kernel udp stack.
>>
>> UDP pps or QUIC pps? UDP with or without GSO?
> 
> UDP PPS without GSO.
> 
>>
>> Do you have measurements of how much it saves in real world workloads?
>> I'm asking mostly out of curiosity, not to question the use case.
> 
> YES,the result is affected by the request size, we can reach 10-40%.
> The smaller the request size, the lower the result.
> 
>>
>>>> What about io_uring zero copy w/ pre-registered buffers.
>>>> You'll get csum offload, GSO, all the normal perf features.
>>>
>>> We tried io-uring, but it was not suitable for our scenario.
>>>
>>> Yes, now the AF_XDP does not support the csum offload and GSO.
>>> This is indeed a small problem.
>>
>> Can you say more about io-uring suitability? It can do zero copy
>> and recently-ish Pavel optimized it quite a bit.
> 
> First, AF_XDP is also zero-copy. We also use XDP for a few things.
> 
> And this was all about two years ago, so we have to say something about io-uring
> two years ago.
> 
> As far as I know, io-uring still use kernel udp stack, AF_XDP can
> skip all kernel stack directly to driver.
> 
> So here, io-ring does not have too much advantage.

Unfortunately I'd agree. Most of the cost is in the net stack. It can be
optimised to a certain extent (IMHO by a far more modest factor than 7x), but
that would need extensive reworking, and I don't think I saw any appetite for
that.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-02  1:49                         ` Xuan Zhuo
@ 2023-08-07  6:14                           ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-07  6:14 UTC (permalink / raw)
  To: Michael S. Tsirkin, Christoph Hellwig
  Cc: virtualization, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf

On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > >
> > > > > > > Yes, please.
> > > > > >
> > > > > >
> > > > > > I am not sure I got this fully.
> > > > > >
> > > > > > Are you mean this:
> > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > >
> > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > No care the device is non-dma device or dma device.
> > > > >
> > > > > yes
> > > > >
> > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > >
> > > > > We'll worry about AF_XDP when the patch is posted.
> > > >
> > > > YES.
> > > >
> > > > We discussed it. They voted 'no'.
> > > >
> > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > >
> > >
> > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > >
> > > Let me briefly summarize:
> > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > the driver layer, we need to support these APIs. The current conclusion of
> > > AF_XDP is no.
> > >
> > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > conclusion is no.
> > >
> > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > uses DMA API. And this type of device is the future direction, so we only
> > > support DMA premapped for this type of virtio device. The problem with this
> > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases. Otherwise NULL is returned.
> > > This option is currently NO.
> > >
> > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > solution in case of using DMA API?
> > >
> > > Thank you
> >
> >
> > I think it's ok at this point, Christoph just asked you
> > to add wrappers for map/unmap for use in virtio code.
> > Seems like a cosmetic change, shouldn't be hard.
>
> Yes, that is not hard, I has this code.
>
> But, you mean that the wrappers is just used for the virtio driver code?
> And we also offer the  API virtqueue_dma_dev() at the same time?
> Then the driver will has two chooses to do DMA.
>
> Is that so?

Ping.

Thanks

>
>
> > Otherwise I haven't seen significant comments.
> >
> >
> > Christoph do I summarize what you are saying correctly?
> > --
> > MST
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-07  6:14                           ` Xuan Zhuo
@ 2023-08-08  2:26                             ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  2:26 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > >
> > > > > > > > Yes, please.
> > > > > > >
> > > > > > >
> > > > > > > I am not sure I got this fully.
> > > > > > >
> > > > > > > Are you mean this:
> > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > >
> > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > No care the device is non-dma device or dma device.
> > > > > >
> > > > > > yes
> > > > > >
> > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > >
> > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > >
> > > > > YES.
> > > > >
> > > > > We discussed it. They voted 'no'.
> > > > >
> > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > >
> > > >
> > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > >
> > > > Let me briefly summarize:
> > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > AF_XDP is no.
> > > >
> > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > conclusion is no.
> > > >
> > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > uses DMA API. And this type of device is the future direction, so we only
> > > > support DMA premapped for this type of virtio device. The problem with this
> > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.

Could you explain the issue a little bit more?

E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
virtqueue_dma_dev() only return a dev in some cases?
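
(For context, the behaviour being asked about boils down to roughly the
following; a sketch reconstructed from the summary quoted above rather than
the exact code in the series, assuming the existing virtio_ring internals
to_vvq(), use_dma_api and vring_dma_dev():)

struct device *virtqueue_dma_dev(struct virtqueue *_vq)
{
        struct vring_virtqueue *vq = to_vvq(_vq);

        /* Only hand out a DMA device when the ring is actually managed
         * through the DMA API, i.e. VIRTIO_F_ACCESS_PLATFORM was
         * negotiated; otherwise the core uses physical addresses and
         * there is no device to return.
         */
        if (vq->use_dma_api)
                return vring_dma_dev(vq);

        return NULL;
}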

Thanks

>Otherwise NULL is returned.
> > > > This option is currently NO.
> > > >
> > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > solution in case of using DMA API?
> > > >
> > > > Thank you
> > >
> > >
> > > I think it's ok at this point, Christoph just asked you
> > > to add wrappers for map/unmap for use in virtio code.
> > > Seems like a cosmetic change, shouldn't be hard.
> >
> > Yes, that is not hard, I has this code.
> >
> > But, you mean that the wrappers is just used for the virtio driver code?
> > And we also offer the  API virtqueue_dma_dev() at the same time?
> > Then the driver will has two chooses to do DMA.
> >
> > Is that so?
>
> Ping.
>
> Thanks
>
> >
> >
> > > Otherwise I haven't seen significant comments.
> > >
> > >
> > > Christoph do I summarize what you are saying correctly?
> > > --
> > > MST
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-08  2:26                             ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  2:26 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > >
> > > > > > > > Yes, please.
> > > > > > >
> > > > > > >
> > > > > > > I am not sure I got this fully.
> > > > > > >
> > > > > > > Are you mean this:
> > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > >
> > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > No care the device is non-dma device or dma device.
> > > > > >
> > > > > > yes
> > > > > >
> > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > >
> > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > >
> > > > > YES.
> > > > >
> > > > > We discussed it. They voted 'no'.
> > > > >
> > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > >
> > > >
> > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > >
> > > > Let me briefly summarize:
> > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > AF_XDP is no.
> > > >
> > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > conclusion is no.
> > > >
> > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > uses DMA API. And this type of device is the future direction, so we only
> > > > support DMA premapped for this type of virtio device. The problem with this
> > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.

Could you explain the issue a little bit more?

E.g. if we limit AF_XDP to ACCESS_PLATFORM only, why does
virtqueue_dma_dev() only return dev in some cases?

Thanks

>Otherwise NULL is returned.
> > > > This option is currently NO.
> > > >
> > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > solution in case of using DMA API?
> > > >
> > > > Thank you
> > >
> > >
> > > I think it's ok at this point, Christoph just asked you
> > > to add wrappers for map/unmap for use in virtio code.
> > > Seems like a cosmetic change, shouldn't be hard.
> >
> > Yes, that is not hard, I has this code.
> >
> > But, you mean that the wrappers is just used for the virtio driver code?
> > And we also offer the  API virtqueue_dma_dev() at the same time?
> > Then the driver will has two chooses to do DMA.
> >
> > Is that so?
>
> Ping.
>
> Thanks
>
> >
> >
> > > Otherwise I haven't seen significant comments.
> > >
> > >
> > > Christoph do I summarize what you are saying correctly?
> > > --
> > > MST
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  2:26                             ` Jason Wang
@ 2023-08-08  2:47                               ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-08  2:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > >
> > > > > > > > > Yes, please.
> > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure I got this fully.
> > > > > > > >
> > > > > > > > Are you mean this:
> > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > >
> > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > No care the device is non-dma device or dma device.
> > > > > > >
> > > > > > > yes
> > > > > > >
> > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > >
> > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > >
> > > > > > YES.
> > > > > >
> > > > > > We discussed it. They voted 'no'.
> > > > > >
> > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > >
> > > > >
> > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > >
> > > > > Let me briefly summarize:
> > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > AF_XDP is no.
> > > > >
> > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > conclusion is no.
> > > > >
> > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
>
> Could you explain the issue a little bit more?
>
> E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> virtqueue_dma_dev() only return dev in some cases?

The behavior of virtqueue_dma_dev() is not related to AF_XDP.

The return value of virtqueue_dma_dev() is meant to be used with the DMA APIs.
So it can return the dma dev when virtio is with ACCESS_PLATFORM. If virtio is
without ACCESS_PLATFORM, then it MUST return NULL.

In the virtio-net driver, if virtqueue_dma_dev() returns a dma dev,
we can enable AF_XDP. If not, we return an error to AF_XDP.
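
Roughly like this (just a sketch on top of the existing virtio_ring
internals, not the final patch):

struct device *virtqueue_dma_dev(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	/* Only hand out a DMA device when the core really goes through the
	 * DMA API, i.e. when VIRTIO_F_ACCESS_PLATFORM was negotiated. */
	if (vq->use_dma_api)
		return vring_dma_dev(vq);

	return NULL;
}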

Thanks




>
> Thanks
>
> >Otherwise NULL is returned.
> > > > > This option is currently NO.
> > > > >
> > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > solution in case of using DMA API?
> > > > >
> > > > > Thank you
> > > >
> > > >
> > > > I think it's ok at this point, Christoph just asked you
> > > > to add wrappers for map/unmap for use in virtio code.
> > > > Seems like a cosmetic change, shouldn't be hard.
> > >
> > > Yes, that is not hard, I has this code.
> > >
> > > But, you mean that the wrappers is just used for the virtio driver code?
> > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > Then the driver will has two chooses to do DMA.
> > >
> > > Is that so?
> >
> > Ping.
> >
> > Thanks
> >
> > >
> > >
> > > > Otherwise I haven't seen significant comments.
> > > >
> > > >
> > > > Christoph do I summarize what you are saying correctly?
> > > > --
> > > > MST
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-08  2:47                               ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-08  2:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > >
> > > > > > > > > Yes, please.
> > > > > > > >
> > > > > > > >
> > > > > > > > I am not sure I got this fully.
> > > > > > > >
> > > > > > > > Are you mean this:
> > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > >
> > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > No care the device is non-dma device or dma device.
> > > > > > >
> > > > > > > yes
> > > > > > >
> > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > >
> > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > >
> > > > > > YES.
> > > > > >
> > > > > > We discussed it. They voted 'no'.
> > > > > >
> > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > >
> > > > >
> > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > >
> > > > > Let me briefly summarize:
> > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > AF_XDP is no.
> > > > >
> > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > conclusion is no.
> > > > >
> > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
>
> Could you explain the issue a little bit more?
>
> E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> virtqueue_dma_dev() only return dev in some cases?

The behavior of virtqueue_dma_dev() is not related to AF_XDP.

The return value of virtqueue_dma_dev() is meant to be used with the DMA APIs.
So it can return the dma dev when virtio is with ACCESS_PLATFORM. If virtio is
without ACCESS_PLATFORM, then it MUST return NULL.

In the virtio-net driver, if virtqueue_dma_dev() returns a dma dev,
we can enable AF_XDP. If not, we return an error to AF_XDP.

Thanks




>
> Thanks
>
> >Otherwise NULL is returned.
> > > > > This option is currently NO.
> > > > >
> > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > solution in case of using DMA API?
> > > > >
> > > > > Thank you
> > > >
> > > >
> > > > I think it's ok at this point, Christoph just asked you
> > > > to add wrappers for map/unmap for use in virtio code.
> > > > Seems like a cosmetic change, shouldn't be hard.
> > >
> > > Yes, that is not hard, I has this code.
> > >
> > > But, you mean that the wrappers is just used for the virtio driver code?
> > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > Then the driver will has two chooses to do DMA.
> > >
> > > Is that so?
> >
> > Ping.
> >
> > Thanks
> >
> > >
> > >
> > > > Otherwise I haven't seen significant comments.
> > > >
> > > >
> > > > Christoph do I summarize what you are saying correctly?
> > > > --
> > > > MST
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  2:47                               ` Xuan Zhuo
@ 2023-08-08  3:08                                 ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  3:08 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > >
> > > > > > > > > > Yes, please.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am not sure I got this fully.
> > > > > > > > >
> > > > > > > > > Are you mean this:
> > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > >
> > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > >
> > > > > > > > yes
> > > > > > > >
> > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > >
> > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > >
> > > > > > > YES.
> > > > > > >
> > > > > > > We discussed it. They voted 'no'.
> > > > > > >
> > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > >
> > > > > >
> > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > >
> > > > > > Let me briefly summarize:
> > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > AF_XDP is no.
> > > > > >
> > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > conclusion is no.
> > > > > >
> > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> >
> > Could you explain the issue a little bit more?
> >
> > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > virtqueue_dma_dev() only return dev in some cases?
>
> The behavior of virtqueue_dma_dev() is not related to AF_XDP.
>
> The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> ACCESS_PLATFORM then it MUST return NULL.
>
> In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> we can enable AF_XDP. If not, we return error to AF_XDP.

Yes, as discussed, just having wrappers in the virtio_ring and doing
the switch there. Then can virtio-net use them without worrying about
DMA details?
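
Something like this is what I have in mind, roughly (just a sketch of the
wrapper idea on top of the existing virtio_ring internals, not an actual
patch):

/* A NOP when the ring does not go through the DMA API, so callers do not
 * need to care which case they are in. */
void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *_vq,
					     dma_addr_t addr,
					     unsigned long offset, size_t size,
					     enum dma_data_direction dir)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	if (!vq->use_dma_api)
		return;

	dma_sync_single_range_for_cpu(vring_dma_dev(vq), addr, offset,
				      size, dir);
}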

Thanks

>
> Thanks
>
>
>
>
> >
> > Thanks
> >
> > >Otherwise NULL is returned.
> > > > > > This option is currently NO.
> > > > > >
> > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > solution in case of using DMA API?
> > > > > >
> > > > > > Thank you
> > > > >
> > > > >
> > > > > I think it's ok at this point, Christoph just asked you
> > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > Seems like a cosmetic change, shouldn't be hard.
> > > >
> > > > Yes, that is not hard, I has this code.
> > > >
> > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > Then the driver will has two chooses to do DMA.
> > > >
> > > > Is that so?
> > >
> > > Ping.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > > Otherwise I haven't seen significant comments.
> > > > >
> > > > >
> > > > > Christoph do I summarize what you are saying correctly?
> > > > > --
> > > > > MST
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-08  3:08                                 ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  3:08 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > >
> > > > > > > > > > Yes, please.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am not sure I got this fully.
> > > > > > > > >
> > > > > > > > > Are you mean this:
> > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > >
> > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > >
> > > > > > > > yes
> > > > > > > >
> > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > >
> > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > >
> > > > > > > YES.
> > > > > > >
> > > > > > > We discussed it. They voted 'no'.
> > > > > > >
> > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > >
> > > > > >
> > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > >
> > > > > > Let me briefly summarize:
> > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > AF_XDP is no.
> > > > > >
> > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > conclusion is no.
> > > > > >
> > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> >
> > Could you explain the issue a little bit more?
> >
> > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > virtqueue_dma_dev() only return dev in some cases?
>
> The behavior of virtqueue_dma_dev() is not related to AF_XDP.
>
> The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> ACCESS_PLATFORM then it MUST return NULL.
>
> In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> we can enable AF_XDP. If not, we return error to AF_XDP.

Yes, as discussed, just having wrappers in the virtio_ring and doing
the switch there. Then can virtio-net use them without worrying about
DMA details?

Thanks

>
> Thanks
>
>
>
>
> >
> > Thanks
> >
> > >Otherwise NULL is returned.
> > > > > > This option is currently NO.
> > > > > >
> > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > solution in case of using DMA API?
> > > > > >
> > > > > > Thank you
> > > > >
> > > > >
> > > > > I think it's ok at this point, Christoph just asked you
> > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > Seems like a cosmetic change, shouldn't be hard.
> > > >
> > > > Yes, that is not hard, I has this code.
> > > >
> > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > Then the driver will has two chooses to do DMA.
> > > >
> > > > Is that so?
> > >
> > > Ping.
> > >
> > > Thanks
> > >
> > > >
> > > >
> > > > > Otherwise I haven't seen significant comments.
> > > > >
> > > > >
> > > > > Christoph do I summarize what you are saying correctly?
> > > > > --
> > > > > MST
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  3:08                                 ` Jason Wang
@ 2023-08-08  3:09                                   ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-08  3:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > >
> > > > > > > > > > > Yes, please.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > >
> > > > > > > > > > Are you mean this:
> > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > >
> > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > >
> > > > > > > > > yes
> > > > > > > > >
> > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > >
> > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > >
> > > > > > > > YES.
> > > > > > > >
> > > > > > > > We discussed it. They voted 'no'.
> > > > > > > >
> > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > >
> > > > > > >
> > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > >
> > > > > > > Let me briefly summarize:
> > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > AF_XDP is no.
> > > > > > >
> > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > conclusion is no.
> > > > > > >
> > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > >
> > > Could you explain the issue a little bit more?
> > >
> > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > virtqueue_dma_dev() only return dev in some cases?
> >
> > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> >
> > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > ACCESS_PLATFORM then it MUST return NULL.
> >
> > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > we can enable AF_XDP. If not, we return error to AF_XDP.
>
> Yes, as discussed, just having wrappers in the virtio_ring and doing
> the switch there. Then can virtio-net use them without worrying about
> DMA details?


Yes. In the virtio drivers, we can use the wrappers. That is ok.

But we also need to support virtqueue_dma_dev() for AF_XDP, because AF_XDP
will not use the wrappers.

Is that ok for you?

Thanks.

>
> Thanks
>
> >
> > Thanks
> >
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >Otherwise NULL is returned.
> > > > > > > This option is currently NO.
> > > > > > >
> > > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > > solution in case of using DMA API?
> > > > > > >
> > > > > > > Thank you
> > > > > >
> > > > > >
> > > > > > I think it's ok at this point, Christoph just asked you
> > > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > > Seems like a cosmetic change, shouldn't be hard.
> > > > >
> > > > > Yes, that is not hard, I has this code.
> > > > >
> > > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > > Then the driver will has two chooses to do DMA.
> > > > >
> > > > > Is that so?
> > > >
> > > > Ping.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > Otherwise I haven't seen significant comments.
> > > > > >
> > > > > >
> > > > > > Christoph do I summarize what you are saying correctly?
> > > > > > --
> > > > > > MST
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-08  3:09                                   ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-08  3:09 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > >
> > > > > > > > > > > Yes, please.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > >
> > > > > > > > > > Are you mean this:
> > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > >
> > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > >
> > > > > > > > > yes
> > > > > > > > >
> > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > >
> > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > >
> > > > > > > > YES.
> > > > > > > >
> > > > > > > > We discussed it. They voted 'no'.
> > > > > > > >
> > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > >
> > > > > > >
> > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > >
> > > > > > > Let me briefly summarize:
> > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > AF_XDP is no.
> > > > > > >
> > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > conclusion is no.
> > > > > > >
> > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > >
> > > Could you explain the issue a little bit more?
> > >
> > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > virtqueue_dma_dev() only return dev in some cases?
> >
> > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> >
> > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > ACCESS_PLATFORM then it MUST return NULL.
> >
> > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > we can enable AF_XDP. If not, we return error to AF_XDP.
>
> Yes, as discussed, just having wrappers in the virtio_ring and doing
> the switch there. Then can virtio-net use them without worrying about
> DMA details?


Yes. In the virtio drivers, we can use the wrappers. That is ok.

But we also need to support virtqueue_dma_dev() for AF_XDP, because AF_XDP
will not use the wrappers.

Is that ok for you?

Thanks.

>
> Thanks
>
> >
> > Thanks
> >
> >
> >
> >
> > >
> > > Thanks
> > >
> > > >Otherwise NULL is returned.
> > > > > > > This option is currently NO.
> > > > > > >
> > > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > > solution in case of using DMA API?
> > > > > > >
> > > > > > > Thank you
> > > > > >
> > > > > >
> > > > > > I think it's ok at this point, Christoph just asked you
> > > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > > Seems like a cosmetic change, shouldn't be hard.
> > > > >
> > > > > Yes, that is not hard, I has this code.
> > > > >
> > > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > > Then the driver will has two chooses to do DMA.
> > > > >
> > > > > Is that so?
> > > >
> > > > Ping.
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > >
> > > > > > Otherwise I haven't seen significant comments.
> > > > > >
> > > > > >
> > > > > > Christoph do I summarize what you are saying correctly?
> > > > > > --
> > > > > > MST
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  3:09                                   ` Xuan Zhuo
@ 2023-08-08  3:49                                     ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  3:49 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, please.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > >
> > > > > > > > > > > Are you mean this:
> > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > >
> > > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > >
> > > > > > > > > > yes
> > > > > > > > > >
> > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > > >
> > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > >
> > > > > > > > > YES.
> > > > > > > > >
> > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > >
> > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > > >
> > > > > > > > Let me briefly summarize:
> > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > > AF_XDP is no.
> > > > > > > >
> > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > > conclusion is no.
> > > > > > > >
> > > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > >
> > > > Could you explain the issue a little bit more?
> > > >
> > > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > > virtqueue_dma_dev() only return dev in some cases?
> > >
> > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > >
> > > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > > ACCESS_PLATFORM then it MUST return NULL.
> > >
> > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > we can enable AF_XDP. If not, we return error to AF_XDP.
> >
> > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > the switch there. Then can virtio-net use them without worrying about
> > DMA details?
>
>
> Yes. In the virtio drivers, we can use the wrappers. That is ok.
>
> But we also need to support virtqueue_dma_dev() for AF_XDP, because that the
> AF_XDP will not use the wrappers.

Do you mean the AF_XDP core or something else? Could you give me an example?

Thanks

>
> So that is ok for you?
>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >Otherwise NULL is returned.
> > > > > > > > This option is currently NO.
> > > > > > > >
> > > > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > > > solution in case of using DMA API?
> > > > > > > >
> > > > > > > > Thank you
> > > > > > >
> > > > > > >
> > > > > > > I think it's ok at this point, Christoph just asked you
> > > > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > > > Seems like a cosmetic change, shouldn't be hard.
> > > > > >
> > > > > > Yes, that is not hard, I has this code.
> > > > > >
> > > > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > > > Then the driver will has two chooses to do DMA.
> > > > > >
> > > > > > Is that so?
> > > > >
> > > > > Ping.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > Otherwise I haven't seen significant comments.
> > > > > > >
> > > > > > >
> > > > > > > Christoph do I summarize what you are saying correctly?
> > > > > > > --
> > > > > > > MST
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-08  3:49                                     ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  3:49 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > > >
> > > > > > > > > > > > Yes, please.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > >
> > > > > > > > > > > Are you mean this:
> > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > >
> > > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > >
> > > > > > > > > > yes
> > > > > > > > > >
> > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > > >
> > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > >
> > > > > > > > > YES.
> > > > > > > > >
> > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > >
> > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > > >
> > > > > > > > Let me briefly summarize:
> > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > > AF_XDP is no.
> > > > > > > >
> > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > > conclusion is no.
> > > > > > > >
> > > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > >
> > > > Could you explain the issue a little bit more?
> > > >
> > > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > > virtqueue_dma_dev() only return dev in some cases?
> > >
> > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > >
> > > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > > ACCESS_PLATFORM then it MUST return NULL.
> > >
> > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > we can enable AF_XDP. If not, we return error to AF_XDP.
> >
> > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > the switch there. Then can virtio-net use them without worrying about
> > DMA details?
>
>
> Yes. In the virtio drivers, we can use the wrappers. That is ok.
>
> But we also need to support virtqueue_dma_dev() for AF_XDP, because that the
> AF_XDP will not use the wrappers.

Do you mean the AF_XDP core or something else? Could you give me an example?

Thanks

>
> So that is ok for you?
>
> Thanks.
>
> >
> > Thanks
> >
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >Otherwise NULL is returned.
> > > > > > > > This option is currently NO.
> > > > > > > >
> > > > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > > > solution in case of using DMA API?
> > > > > > > >
> > > > > > > > Thank you
> > > > > > >
> > > > > > >
> > > > > > > I think it's ok at this point, Christoph just asked you
> > > > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > > > Seems like a cosmetic change, shouldn't be hard.
> > > > > >
> > > > > > Yes, that is not hard, I has this code.
> > > > > >
> > > > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > > > Then the driver will has two chooses to do DMA.
> > > > > >
> > > > > > Is that so?
> > > > >
> > > > > Ping.
> > > > >
> > > > > Thanks
> > > > >
> > > > > >
> > > > > >
> > > > > > > Otherwise I haven't seen significant comments.
> > > > > > >
> > > > > > >
> > > > > > > Christoph do I summarize what you are saying correctly?
> > > > > > > --
> > > > > > > MST
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  3:49                                     ` Jason Wang
  (?)
@ 2023-08-08  3:54                                     ` Xuan Zhuo
  2023-08-08  3:59                                         ` Jason Wang
  -1 siblings, 1 reply; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-08  3:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > >
> > > > > > > > > > > > Are you mean this:
> > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > >
> > > > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > > >
> > > > > > > > > > > yes
> > > > > > > > > > >
> > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > > > >
> > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > >
> > > > > > > > > > YES.
> > > > > > > > > >
> > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > >
> > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > > > >
> > > > > > > > > Let me briefly summarize:
> > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > > > AF_XDP is no.
> > > > > > > > >
> > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > > > conclusion is no.
> > > > > > > > >
> > > > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > >
> > > > > Could you explain the issue a little bit more?
> > > > >
> > > > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > > > virtqueue_dma_dev() only return dev in some cases?
> > > >
> > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > >
> > > > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > > > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > > > ACCESS_PLATFORM then it MUST return NULL.
> > > >
> > > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > > we can enable AF_XDP. If not, we return error to AF_XDP.
> > >
> > > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > > the switch there. Then can virtio-net use them without worrying about
> > > DMA details?
> >
> >
> > Yes. In the virtio drivers, we can use the wrappers. That is ok.
> >
> > But we also need to support virtqueue_dma_dev() for AF_XDP, because that the
> > AF_XDP will not use the wrappers.
>
> You mean AF_XDP core or other? Could you give me an example?


Yes. The AF_XDP core.

Now the AF_XDP core does the dma operations itself, because the memory is
allocated by the user from user space. So before handing the memory over to
the driver, AF_XDP does the dma mapping.


int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
	       unsigned long attrs, struct page **pages, u32 nr_pages)
{
	struct xsk_dma_map *dma_map;
	dma_addr_t dma;
	int err;
	u32 i;

	dma_map = xp_find_dma_map(pool);
	if (dma_map) {
		err = xp_init_dma_info(pool, dma_map);
		if (err)
			return err;

		refcount_inc(&dma_map->users);
		return 0;
	}

	dma_map = xp_create_dma_map(dev, pool->netdev, nr_pages, pool->umem);
	if (!dma_map)
		return -ENOMEM;

	for (i = 0; i < dma_map->dma_pages_cnt; i++) {
		dma = dma_map_page_attrs(dev, pages[i], 0, PAGE_SIZE,
					 DMA_BIDIRECTIONAL, attrs);
		if (dma_mapping_error(dev, dma)) {
			__xp_dma_unmap(dma_map, attrs);
			return -ENOMEM;
		}
		if (dma_need_sync(dev, dma))
			dma_map->dma_need_sync = true;
		dma_map->dma_pages[i] = dma;
	}

	if (pool->unaligned)
		xp_check_dma_contiguity(dma_map);

	err = xp_init_dma_info(pool, dma_map);
	if (err) {
		__xp_dma_unmap(dma_map, attrs);
		return err;
	}

	return 0;
}
EXPORT_SYMBOL(xp_dma_map);
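
So on the virtio-net side, enabling zero copy would only be gated on that
device, roughly like this (a sketch; virtnet_xsk_map_pool() is a made-up
name just for illustration):

/* AF_XDP maps the umem against the device returned by virtqueue_dma_dev(),
 * so zero copy is only possible when that device is not NULL. */
static int virtnet_xsk_map_pool(struct receive_queue *rq,
				struct xsk_buff_pool *pool)
{
	struct device *dma_dev = virtqueue_dma_dev(rq->vq);

	if (!dma_dev)
		return -EOPNOTSUPP;

	return xsk_pool_dma_map(pool, dma_dev, 0);
}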

Thanks.


>
> Thanks
>
> >
> > So that is ok for you?
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > >
> > > > >
> > > > > Thanks
> > > > >
> > > > > >Otherwise NULL is returned.
> > > > > > > > > This option is currently NO.
> > > > > > > > >
> > > > > > > > > So I'm wondering what should I do, from a DMA point of view, is there any
> > > > > > > > > solution in case of using DMA API?
> > > > > > > > >
> > > > > > > > > Thank you
> > > > > > > >
> > > > > > > >
> > > > > > > > I think it's ok at this point, Christoph just asked you
> > > > > > > > to add wrappers for map/unmap for use in virtio code.
> > > > > > > > Seems like a cosmetic change, shouldn't be hard.
> > > > > > >
> > > > > > > Yes, that is not hard, I has this code.
> > > > > > >
> > > > > > > But, you mean that the wrappers is just used for the virtio driver code?
> > > > > > > And we also offer the  API virtqueue_dma_dev() at the same time?
> > > > > > > Then the driver will has two chooses to do DMA.
> > > > > > >
> > > > > > > Is that so?
> > > > > >
> > > > > > Ping.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > Otherwise I haven't seen significant comments.
> > > > > > > >
> > > > > > > >
> > > > > > > > Christoph do I summarize what you are saying correctly?
> > > > > > > > --
> > > > > > > > MST
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  3:54                                     ` Xuan Zhuo
@ 2023-08-08  3:59                                         ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  3:59 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, Aug 8, 2023 at 11:57 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Are you mean this:
> > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > >
> > > > > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > > > >
> > > > > > > > > > > > yes
> > > > > > > > > > > >
> > > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > > > > >
> > > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > > >
> > > > > > > > > > > YES.
> > > > > > > > > > >
> > > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > > >
> > > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > > > > >
> > > > > > > > > > Let me briefly summarize:
> > > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > > > > AF_XDP is no.
> > > > > > > > > >
> > > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > > > > conclusion is no.
> > > > > > > > > >
> > > > > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > > >
> > > > > > Could you explain the issue a little bit more?
> > > > > >
> > > > > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > > > > virtqueue_dma_dev() only return dev in some cases?
> > > > >
> > > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > > >
> > > > > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > > > > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > > > > ACCESS_PLATFORM then it MUST return NULL.
> > > > >
> > > > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > > > we can enable AF_XDP. If not, we return error to AF_XDP.
> > > >
> > > > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > > > the switch there. Then can virtio-net use them without worrying about
> > > > DMA details?
> > >
> > >
> > > Yes. In the virtio drivers, we can use the wrappers. That is ok.
> > >
> > > But we also need to support virtqueue_dma_dev() for AF_XDP, because that the
> > > AF_XDP will not use the wrappers.
> >
> > You mean AF_XDP core or other? Could you give me an example?
>
>
> Yes. The AF_XDP core.
>
> Now the AF_XDP core will do the dma operation.  Because that the memory is
> allocated by the user from the user space.  So before putting the memory to the
> driver, the AF_XDP will do the dma mapping.
>
>
> int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>                unsigned long attrs, struct page **pages, u32 nr_pages)
> {

I think it's the driver that passes the device pointer here. Did I miss anything?

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-08  3:59                                         ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-08  3:59 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Tue, Aug 8, 2023 at 11:57 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Are you mean this:
> > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > >
> > > > > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > > > >
> > > > > > > > > > > > yes
> > > > > > > > > > > >
> > > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > > > > >
> > > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > > >
> > > > > > > > > > > YES.
> > > > > > > > > > >
> > > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > > >
> > > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > > > > >
> > > > > > > > > > Let me briefly summarize:
> > > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > > > > AF_XDP is no.
> > > > > > > > > >
> > > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > > > > conclusion is no.
> > > > > > > > > >
> > > > > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > > >
> > > > > > Could you explain the issue a little bit more?
> > > > > >
> > > > > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > > > > virtqueue_dma_dev() only return dev in some cases?
> > > > >
> > > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > > >
> > > > > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > > > > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > > > > ACCESS_PLATFORM then it MUST return NULL.
> > > > >
> > > > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > > > we can enable AF_XDP. If not, we return error to AF_XDP.
> > > >
> > > > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > > > the switch there. Then can virtio-net use them without worrying about
> > > > DMA details?
> > >
> > >
> > > Yes. In the virtio drivers, we can use the wrappers. That is ok.
> > >
> > > But we also need to support virtqueue_dma_dev() for AF_XDP, because that the
> > > AF_XDP will not use the wrappers.
> >
> > You mean AF_XDP core or other? Could you give me an example?
>
>
> Yes. The AF_XDP core.
>
> Now the AF_XDP core will do the dma operation.  Because that the memory is
> allocated by the user from the user space.  So before putting the memory to the
> driver, the AF_XDP will do the dma mapping.
>
>
> int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
>                unsigned long attrs, struct page **pages, u32 nr_pages)
> {

I think it's the driver that passes the device pointer here. Did I miss anything?

Thanks


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-08  3:59                                         ` Jason Wang
  (?)
@ 2023-08-08  4:07                                         ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-08  4:07 UTC (permalink / raw)
  To: Jason Wang
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Tue, 8 Aug 2023 11:59:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Aug 8, 2023 at 11:57 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Tue, 8 Aug 2023 11:49:08 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Aug 8, 2023 at 11:12 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Tue, 8 Aug 2023 11:08:09 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Aug 8, 2023 at 10:52 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Tue, 8 Aug 2023 10:26:04 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Mon, Aug 7, 2023 at 2:15 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, 2 Aug 2023 09:49:31 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > On Tue, 1 Aug 2023 12:17:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > On Fri, Jul 28, 2023 at 02:02:33PM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > On Tue, 25 Jul 2023 19:07:23 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > > > > On Tue, 25 Jul 2023 03:34:34 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > > > > > > > > On Tue, Jul 25, 2023 at 10:13:48AM +0800, Xuan Zhuo wrote:
> > > > > > > > > > > > > > On Mon, 24 Jul 2023 09:43:42 -0700, Christoph Hellwig <hch@infradead.org> wrote:
> > > > > > > > > > > > > > > On Thu, Jul 20, 2023 at 01:21:07PM -0400, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > > > > Well I think we can add wrappers like virtio_dma_sync and so on.
> > > > > > > > > > > > > > > > There are NOP for non-dma so passing the dma device is harmless.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Yes, please.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am not sure I got this fully.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Are you mean this:
> > > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-8-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > > > https://lore.kernel.org/all/20230214072704.126660-9-xuanzhuo@linux.alibaba.com/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Then the driver must do dma operation(map and sync) by these virtio_dma_* APIs.
> > > > > > > > > > > > > > No care the device is non-dma device or dma device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > yes
> > > > > > > > > > > > >
> > > > > > > > > > > > > > Then the AF_XDP must use these virtio_dma_* APIs for virtio device.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We'll worry about AF_XDP when the patch is posted.
> > > > > > > > > > > >
> > > > > > > > > > > > YES.
> > > > > > > > > > > >
> > > > > > > > > > > > We discussed it. They voted 'no'.
> > > > > > > > > > > >
> > > > > > > > > > > > http://lore.kernel.org/all/20230424082856.15c1e593@kernel.org
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi guys, this topic is stuck again. How should I proceed with this work?
> > > > > > > > > > >
> > > > > > > > > > > Let me briefly summarize:
> > > > > > > > > > > 1. The problem with adding virtio_dma_{map, sync} api is that, for AF_XDP and
> > > > > > > > > > > the driver layer, we need to support these APIs. The current conclusion of
> > > > > > > > > > > AF_XDP is no.
> > > > > > > > > > >
> > > > > > > > > > > 2. Set dma_set_mask_and_coherent, then we can use DMA API uniformly inside
> > > > > > > > > > > driver. This idea seems to be inconsistent with the framework design of DMA. The
> > > > > > > > > > > conclusion is no.
> > > > > > > > > > >
> > > > > > > > > > > 3. We noticed that if the virtio device supports VIRTIO_F_ACCESS_PLATFORM, it
> > > > > > > > > > > uses DMA API. And this type of device is the future direction, so we only
> > > > > > > > > > > support DMA premapped for this type of virtio device. The problem with this
> > > > > > > > > > > solution is that virtqueue_dma_dev() only returns dev in some cases, because
> > > > > > > > > > > VIRTIO_F_ACCESS_PLATFORM is supported in such cases.
> > > > > > >
> > > > > > > Could you explain the issue a little bit more?
> > > > > > >
> > > > > > > E.g if we limit AF_XDP to ACESS_PLATFROM only, why does
> > > > > > > virtqueue_dma_dev() only return dev in some cases?
> > > > > >
> > > > > > The behavior of virtqueue_dma_dev() is not related to AF_XDP.
> > > > > >
> > > > > > The return value of virtqueue_dma_dev() is used for the DMA APIs. So it can
> > > > > > return dma dev when the virtio is with ACCESS_PLATFORM. If virtio is without
> > > > > > ACCESS_PLATFORM then it MUST return NULL.
> > > > > >
> > > > > > In the virtio-net driver, if the virtqueue_dma_dev() returns dma dev,
> > > > > > we can enable AF_XDP. If not, we return error to AF_XDP.
> > > > >
> > > > > Yes, as discussed, just having wrappers in the virtio_ring and doing
> > > > > the switch there. Then can virtio-net use them without worrying about
> > > > > DMA details?
> > > >
> > > >
> > > > Yes. In the virtio drivers, we can use the wrappers. That is ok.
> > > >
> > > > But we also need to support virtqueue_dma_dev() for AF_XDP, because that the
> > > > AF_XDP will not use the wrappers.
> > >
> > > You mean AF_XDP core or other? Could you give me an example?
> >
> >
> > Yes. The AF_XDP core.
> >
> > Now the AF_XDP core will do the dma operation.  Because that the memory is
> > allocated by the user from the user space.  So before putting the memory to the
> > driver, the AF_XDP will do the dma mapping.
> >
> >
> > int xp_dma_map(struct xsk_buff_pool *pool, struct device *dev,
> >                unsigned long attrs, struct page **pages, u32 nr_pages)
> > {
>
> I think it's the driver who passes the device pointer here. Anything I missed?

YES.

When AF_XDP is bound to a device queue, the driver should call this so that the
DMA mapping is done inside the AF_XDP core, and the dma dev is recorded by the
AF_XDP core.

The AF_XDP core then calls the dma_sync APIs when the driver reads buffers from
the rx queue and when it transmits buffers to the device.

So the AF_XDP core performs these DMA operations automatically: the DMA APIs
are used by the AF_XDP core itself, not by the driver.
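
For example, with the existing xsk_buff_* driver helpers (the surrounding
rx/tx context is only assumed here, it is not part of this series):

/* rx: after the device has filled the buffer, before the XDP program runs. */
static void example_rx_sync(struct xdp_buff *xdp, struct xsk_buff_pool *pool)
{
	xsk_buff_dma_sync_for_cpu(xdp, pool);
}

/* tx: after the CPU has written the frame, before posting the address to the
 * device.
 */
static dma_addr_t example_tx_sync(struct xsk_buff_pool *pool,
				  struct xdp_desc *desc)
{
	dma_addr_t dma = xsk_buff_raw_get_dma(pool, desc->addr);

	xsk_buff_raw_dma_sync_for_device(pool, dma, desc->len);
	return dma;
}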

Thanks.





>
> Thanks
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-07  6:14                           ` Xuan Zhuo
@ 2023-08-10  1:56                             ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-10  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Christoph Hellwig
  Cc: virtualization, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, netdev, bpf


Ping!!

Could we push this to the next linux version?

Thanks.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-10  1:56                             ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-10  1:56 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang, Christoph Hellwig
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Eric Dumazet, Jakub Kicinski,
	bpf, Paolo Abeni, David S. Miller


Ping!!

Could we push this to the next linux version?

Thanks.


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-10  1:56                             ` Xuan Zhuo
@ 2023-08-10  6:37                               ` Jason Wang
  -1 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-10  6:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Michael S. Tsirkin, Christoph Hellwig, virtualization,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, netdev, bpf

On Thu, Aug 10, 2023 at 9:59 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
>
> Ping!!
>
> Could we push this to the next linux version?

How about implementing the wrappers along with virtqueue_dma_dev() to
see if Christoph is happy?

Thanks

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-10  6:37                               ` Jason Wang
  0 siblings, 0 replies; 176+ messages in thread
From: Jason Wang @ 2023-08-10  6:37 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, Michael S. Tsirkin,
	netdev, John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Aug 10, 2023 at 9:59 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
>
> Ping!!
>
> Could we push this to the next linux version?

How about implementing the wrappers along with virtqueue_dma_dev() to
see if Christoph is happy?

Thanks

>
> Thanks.
>


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-10  1:56                             ` Xuan Zhuo
@ 2023-08-10  6:39                               ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-08-10  6:39 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Thu, Aug 10, 2023 at 09:56:54AM +0800, Xuan Zhuo wrote:
> 
> Ping!!
> 
> Could we push this to the next linux version?
> 
> Thanks.

You sent v12, so not this one for sure.
v12 triggered kbuild warnings; you need to fix them and repost.
Note that I'm on vacation from Monday, so if you want this
merged it needs to be addressed ASAP.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-10  6:39                               ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-08-10  6:39 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: Jason Wang, Christoph Hellwig, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Thu, Aug 10, 2023 at 09:56:54AM +0800, Xuan Zhuo wrote:
> 
> Ping!!
> 
> Could we push this to the next linux version?
> 
> Thanks.

You sent v12, so not this one for sure.
v12 triggered kbuild warnings; you need to fix them and repost.
Note that I'm on vacation from Monday, so if you want this
merged it needs to be addressed ASAP.

-- 
MST


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-10  6:37                               ` Jason Wang
@ 2023-08-10  6:40                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-08-10  6:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Jesper Dangaard Brouer, Daniel Borkmann, netdev,
	John Fastabend, Alexei Starovoitov, virtualization,
	Christoph Hellwig, Eric Dumazet, Jakub Kicinski, bpf,
	Paolo Abeni, David S. Miller

On Thu, Aug 10, 2023 at 02:37:20PM +0800, Jason Wang wrote:
> On Thu, Aug 10, 2023 at 9:59 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> >
> > Ping!!
> >
> > Could we push this to the next linux version?
> 
> How about implementing the wrappers along with virtqueue_dma_dev() to
> see if Christoph is happy?
> 
> Thanks

That, too.

> >
> > Thanks.
> >


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-10  6:40                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 176+ messages in thread
From: Michael S. Tsirkin @ 2023-08-10  6:40 UTC (permalink / raw)
  To: Jason Wang
  Cc: Xuan Zhuo, Christoph Hellwig, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Thu, Aug 10, 2023 at 02:37:20PM +0800, Jason Wang wrote:
> On Thu, Aug 10, 2023 at 9:59 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> >
> > Ping!!
> >
> > Could we push this to the next linux version?
> 
> How about implementing the wrappers along with virtqueue_dma_dev() to
> see if Christoph is happy?
> 
> Thanks

That, too.

> >
> > Thanks.
> >


^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
  2023-08-10  6:39                               ` Michael S. Tsirkin
@ 2023-08-10  6:47                                 ` Xuan Zhuo
  -1 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-10  6:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jason Wang, Christoph Hellwig, virtualization, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, netdev,
	bpf

On Thu, 10 Aug 2023 02:39:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Aug 10, 2023 at 09:56:54AM +0800, Xuan Zhuo wrote:
> >
> > Ping!!
> >
> > Could we push this to the next linux version?
> >
> > Thanks.
>
> You sent v12, so not this one for sure.
> v12 triggered kbuild warnings, you need to fix them and repost.
> Note that I'm on vacation from monday, so if you want this
> merged this needs to be addressed ASAP.

I will post a new version today. The driver will use the wrappers.
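
Roughly, the wrappers will have this shape (prototypes only; the final
signatures in the posting may differ):

dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *vq, void *ptr,
					  size_t size,
					  enum dma_data_direction dir,
					  unsigned long attrs);
void virtqueue_dma_unmap_single_attrs(struct virtqueue *vq, dma_addr_t addr,
				      size_t size,
				      enum dma_data_direction dir,
				      unsigned long attrs);
int virtqueue_dma_mapping_error(struct virtqueue *vq, dma_addr_t addr);
bool virtqueue_dma_need_sync(struct virtqueue *vq, dma_addr_t addr);
void virtqueue_dma_sync_single_range_for_cpu(struct virtqueue *vq,
					     dma_addr_t addr,
					     unsigned long offset, size_t size,
					     enum dma_data_direction dir);
void virtqueue_dma_sync_single_range_for_device(struct virtqueue *vq,
						dma_addr_t addr,
						unsigned long offset,
						size_t size,
						enum dma_data_direction dir);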

Thanks.


>
> --
> MST
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

* Re: [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev()
@ 2023-08-10  6:47                                 ` Xuan Zhuo
  0 siblings, 0 replies; 176+ messages in thread
From: Xuan Zhuo @ 2023-08-10  6:47 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jesper Dangaard Brouer, Daniel Borkmann, netdev, John Fastabend,
	Alexei Starovoitov, virtualization, Christoph Hellwig,
	Eric Dumazet, Jakub Kicinski, bpf, Paolo Abeni, David S. Miller

On Thu, 10 Aug 2023 02:39:47 -0400, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Aug 10, 2023 at 09:56:54AM +0800, Xuan Zhuo wrote:
> >
> > Ping!!
> >
> > Could we push this to the next linux version?
> >
> > Thanks.
>
> You sent v12, so not this one for sure.
> v12 triggered kbuild warnings, you need to fix them and repost.
> Note that I'm on vacation from monday, so if you want this
> merged this needs to be addressed ASAP.

I will post a new version today. The driver will use the wrappers.

Thanks.


>
> --
> MST
>

^ permalink raw reply	[flat|nested] 176+ messages in thread

end of thread, other threads:[~2023-08-10  6:48 UTC | newest]

Thread overview: 176+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-10  3:42 [PATCH vhost v11 00/10] virtio core prepares for AF_XDP Xuan Zhuo
2023-07-10  3:42 ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 01/10] virtio_ring: check use_dma_api before unmap desc for indirect Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 02/10] virtio_ring: put mapping error check in vring_map_one_sg Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 03/10] virtio_ring: introduce virtqueue_set_premapped() Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-12  8:24   ` Jason Wang
2023-07-12  8:35     ` Xuan Zhuo
2023-07-12  8:35       ` Xuan Zhuo
2023-07-13 11:14   ` Christoph Hellwig
2023-07-13 11:14     ` Christoph Hellwig
2023-07-13 14:47     ` Michael S. Tsirkin
2023-07-13 14:47       ` Michael S. Tsirkin
2023-07-20  6:22       ` Christoph Hellwig
2023-07-20  6:22         ` Christoph Hellwig
2023-07-13 14:52     ` Michael S. Tsirkin
2023-07-13 14:52       ` Michael S. Tsirkin
2023-07-10  3:42 ` [PATCH vhost v11 04/10] virtio_ring: support add premapped buf Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-12  8:31   ` Jason Wang
2023-07-12  8:31     ` Jason Wang
2023-07-12  8:33     ` Xuan Zhuo
2023-07-12  8:33       ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 05/10] virtio_ring: introduce virtqueue_dma_dev() Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-12  8:33   ` Jason Wang
2023-07-12  8:33     ` Jason Wang
2023-07-13 11:15   ` Christoph Hellwig
2023-07-13 11:15     ` Christoph Hellwig
2023-07-13 14:51     ` Michael S. Tsirkin
2023-07-13 14:51       ` Michael S. Tsirkin
2023-07-20  6:22       ` Christoph Hellwig
2023-07-20  6:22         ` Christoph Hellwig
2023-07-20  6:45         ` Xuan Zhuo
2023-07-20  6:45           ` Xuan Zhuo
2023-07-20  6:57           ` Christoph Hellwig
2023-07-20  6:57             ` Christoph Hellwig
2023-07-20  7:34             ` Xuan Zhuo
2023-07-20  7:34               ` Xuan Zhuo
2023-07-24 20:05               ` Michael S. Tsirkin
2023-07-24 20:05                 ` Michael S. Tsirkin
2023-07-20 17:21         ` Michael S. Tsirkin
2023-07-20 17:21           ` Michael S. Tsirkin
2023-07-24 16:43           ` Christoph Hellwig
2023-07-24 16:43             ` Christoph Hellwig
2023-07-25  2:13             ` Xuan Zhuo
2023-07-25  2:13               ` Xuan Zhuo
2023-07-25  7:34               ` Michael S. Tsirkin
2023-07-25  7:34                 ` Michael S. Tsirkin
2023-07-25 11:07                 ` Xuan Zhuo
2023-07-25 11:07                   ` Xuan Zhuo
2023-07-28  6:02                   ` Xuan Zhuo
2023-07-28  6:02                     ` Xuan Zhuo
2023-07-28 15:03                     ` Jakub Kicinski
2023-07-31  1:23                       ` Jason Wang
2023-07-31  1:23                         ` Jason Wang
2023-07-31 15:46                         ` Jakub Kicinski
2023-08-01  2:03                           ` Xuan Zhuo
2023-08-01  2:03                             ` Xuan Zhuo
2023-08-01  2:36                             ` Jakub Kicinski
2023-08-01  2:57                               ` Xuan Zhuo
2023-08-01  2:57                                 ` Xuan Zhuo
2023-08-01 15:45                                 ` Jakub Kicinski
2023-08-02  1:36                                   ` Xuan Zhuo
2023-08-02  1:36                                     ` Xuan Zhuo
2023-08-02 11:12                                     ` Pavel Begunkov
2023-07-31  2:34                       ` Xuan Zhuo
2023-07-31  2:34                         ` Xuan Zhuo
2023-08-01 16:17                     ` Michael S. Tsirkin
2023-08-01 16:17                       ` Michael S. Tsirkin
2023-08-02  1:49                       ` Xuan Zhuo
2023-08-02  1:49                         ` Xuan Zhuo
2023-08-07  6:14                         ` Xuan Zhuo
2023-08-07  6:14                           ` Xuan Zhuo
2023-08-08  2:26                           ` Jason Wang
2023-08-08  2:26                             ` Jason Wang
2023-08-08  2:47                             ` Xuan Zhuo
2023-08-08  2:47                               ` Xuan Zhuo
2023-08-08  3:08                               ` Jason Wang
2023-08-08  3:08                                 ` Jason Wang
2023-08-08  3:09                                 ` Xuan Zhuo
2023-08-08  3:09                                   ` Xuan Zhuo
2023-08-08  3:49                                   ` Jason Wang
2023-08-08  3:49                                     ` Jason Wang
2023-08-08  3:54                                     ` Xuan Zhuo
2023-08-08  3:59                                       ` Jason Wang
2023-08-08  3:59                                         ` Jason Wang
2023-08-08  4:07                                         ` Xuan Zhuo
2023-08-10  1:56                           ` Xuan Zhuo
2023-08-10  1:56                             ` Xuan Zhuo
2023-08-10  6:37                             ` Jason Wang
2023-08-10  6:37                               ` Jason Wang
2023-08-10  6:40                               ` Michael S. Tsirkin
2023-08-10  6:40                                 ` Michael S. Tsirkin
2023-08-10  6:39                             ` Michael S. Tsirkin
2023-08-10  6:39                               ` Michael S. Tsirkin
2023-08-10  6:47                               ` Xuan Zhuo
2023-08-10  6:47                                 ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 06/10] virtio_ring: skip unmap for premapped Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-13  3:50   ` Jason Wang
2023-07-13  3:50     ` Jason Wang
2023-07-13  4:02     ` Xuan Zhuo
2023-07-13  4:02       ` Xuan Zhuo
2023-07-13  4:21       ` Jason Wang
2023-07-13  4:21         ` Jason Wang
2023-07-13  5:45         ` Xuan Zhuo
2023-07-13  5:45           ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 07/10] virtio_ring: correct the expression of the description of virtqueue_resize() Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 08/10] virtio_ring: separate the logic of reset/enable from virtqueue_resize Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 09/10] virtio_ring: introduce virtqueue_reset() Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-10  3:42 ` [PATCH vhost v11 10/10] virtio_net: merge dma operation for one page Xuan Zhuo
2023-07-10  3:42   ` Xuan Zhuo
2023-07-10  9:40   ` Michael S. Tsirkin
2023-07-10  9:40     ` Michael S. Tsirkin
2023-07-10 10:18     ` Xuan Zhuo
2023-07-10 10:18       ` Xuan Zhuo
2023-07-10 11:59       ` Michael S. Tsirkin
2023-07-10 11:59         ` Michael S. Tsirkin
2023-07-10 12:38         ` Xuan Zhuo
2023-07-10 12:38           ` Xuan Zhuo
2023-07-11  2:36           ` Jason Wang
2023-07-11  2:36             ` Jason Wang
2023-07-11  2:40             ` Xuan Zhuo
2023-07-11  2:40               ` Xuan Zhuo
2023-07-11  2:58               ` Jason Wang
2023-07-11  2:58                 ` Jason Wang
2023-07-12  7:54                 ` Xuan Zhuo
2023-07-12  7:54                   ` Xuan Zhuo
2023-07-12  8:32                   ` Xuan Zhuo
2023-07-12  8:32                     ` Xuan Zhuo
2023-07-12  8:37                     ` Jason Wang
2023-07-12  8:37                       ` Jason Wang
2023-07-12  8:38                       ` Xuan Zhuo
2023-07-12  8:38                         ` Xuan Zhuo
2023-07-14 10:37                         ` Michael S. Tsirkin
2023-07-14 10:37                           ` Michael S. Tsirkin
2023-07-19  3:21                           ` Xuan Zhuo
2023-07-19  3:21                             ` Xuan Zhuo
2023-07-19  8:55                             ` Michael S. Tsirkin
2023-07-19  8:55                               ` Michael S. Tsirkin
2023-07-19  9:38                               ` Jason Wang
2023-07-19  9:38                                 ` Jason Wang
2023-07-19  9:51                                 ` Michael S. Tsirkin
2023-07-19  9:51                                   ` Michael S. Tsirkin
2023-07-20  2:26                                   ` Xuan Zhuo
2023-07-20  2:26                                     ` Xuan Zhuo
2023-07-20  2:24                               ` Xuan Zhuo
2023-07-20  2:24                                 ` Xuan Zhuo
2023-07-13  4:20   ` Jason Wang
2023-07-13  4:20     ` Jason Wang
2023-07-13  5:53     ` Xuan Zhuo
2023-07-13  5:53       ` Xuan Zhuo
2023-07-13  6:51     ` Xuan Zhuo
2023-07-13  6:51       ` Xuan Zhuo
2023-07-14  3:56       ` Jason Wang
2023-07-14  3:56         ` Jason Wang
2023-07-20  6:23         ` Christoph Hellwig
2023-07-20  6:23           ` Christoph Hellwig
2023-07-20  7:41           ` Jason Wang
2023-07-20  7:41             ` Jason Wang
2023-07-20  8:21             ` Christoph Hellwig
2023-07-20  8:21               ` Christoph Hellwig
2023-07-13  7:00     ` Xuan Zhuo
2023-07-13  7:00       ` Xuan Zhuo
2023-07-14  3:57       ` Jason Wang
2023-07-14  3:57         ` Jason Wang
2023-07-14  3:58         ` Xuan Zhuo
2023-07-14  3:58           ` Xuan Zhuo
2023-07-14  5:45           ` Jason Wang
2023-07-14  5:45             ` Jason Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.