* [PATCH net-next 0/5] virtio-net: sq support premapped mode
@ 2024-01-16  7:59 Xuan Zhuo
  2024-01-16  7:59 ` [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() Xuan Zhuo
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-16  7:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf

This is the second part of virtio-net's support for AF_XDP zero copy.

The whole patch set
http://lore.kernel.org/all/20231229073108.57778-1-xuanzhuo@linux.alibaba.com

## About the branch

This patch set is targeted at the net-next branch, but some of the patches
touch the virtio core. Because the entire patch set for virtio-net to support
AF_XDP should go through net-next, I hope these patches can be merged into
net-next with the virtio core maintainers' Acked-by.

============================================================================

## AF_XDP

XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The
zero-copy feature of xsk (XDP socket) needs to be supported by the driver,
and its performance is very good. mlx5 and Intel ixgbe already support this
feature. This patch set allows virtio-net to support xsk's zero-copy xmit
feature.

At present, we have completed some preparatory work:

1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor

So it is time for virtio-net to complete its support for XDP socket
zero copy.

Virtio-net cannot increase the queue number at will, so xsk shares queues
with the kernel.

On the other hand, virtio-net does not support manually generating an
interrupt from the driver, so we use some tricks to wake up TX xmit. If TX
NAPI last ran on a different CPU, we use an IPI to wake up NAPI on that
remote CPU; if it last ran on the local CPU, we wake up NAPI directly.
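
As a rough illustration, the wakeup path could look like the sketch below.
This is only a hedged sketch: virtnet_xsk_wakeup(), sq->last_cpu and sq->csd
are illustrative names (the real code lands with the AF_XDP patches), and
sq->csd is assumed to be a call_single_data_t whose function schedules the
TX NAPI on the remote CPU.

static void virtnet_xsk_wakeup(struct virtnet_sq *sq)
{
	local_bh_disable();
	if (sq->last_cpu == smp_processor_id()) {
		/* TX NAPI last ran on this CPU: schedule it directly. */
		napi_schedule(&sq->napi);
	} else {
		/* TX NAPI last ran on another CPU: kick it with an IPI. */
		smp_call_function_single_async(sq->last_cpu, &sq->csd);
	}
	local_bh_enable();
}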

This patch set includes some refactoring of virtio-net to prepare it for
AF_XDP support.

## performance

ENV: QEMU with vhost-user (polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

### virtio PMD in guest with testpmd

testpmd> show port stats all

 ######################## NIC statistics for port 0 ########################
 RX-packets: 19531092064 RX-missed: 0     RX-bytes: 1093741155584
 RX-errors: 0
 RX-nombuf: 0
 TX-packets: 5959955552 TX-errors: 0     TX-bytes: 371030645664


 Throughput (since last show)
 Rx-pps:   8861574     Rx-bps:  3969985208
 Tx-pps:   8861493     Tx-bps:  3969962736
 ############################################################################

### AF_XDP PMD in guest with testpmd

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 68152727   RX-missed: 0          RX-bytes:  3816552712
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 68114967   TX-errors: 33216      TX-bytes:  3814438152

  Throughput (since last show)
  Rx-pps:      6333196          Rx-bps:   2837272088
  Tx-pps:      6333227          Tx-bps:   2837285936
  ############################################################################

But AF_XDP consumes more CPU in the TX and RX NAPI (100% and 86%,
respectively).

## maintain

I am currently a reviewer for virtio-net, and I commit to maintaining the
AF_XDP support in virtio-net.

Please review.

Thanks.
Xuan Zhuo (5):
  virtio_ring: introduce virtqueue_get_buf_ctx_dma()
  virtio_ring: virtqueue_disable_and_recycle let the callback detach
    bufs
  virtio_ring: introduce virtqueue_detach_unused_buf_dma()
  virtio_ring: introduce virtqueue_get_dma_premapped()
  virtio_net: sq support premapped mode

 drivers/net/virtio/main.c       | 177 ++++++++++++++++++------
 drivers/net/virtio/virtio_net.h |  13 +-
 drivers/virtio/virtio_ring.c    | 235 ++++++++++++++++++++++++--------
 include/linux/virtio.h          |  22 ++-
 4 files changed, 340 insertions(+), 107 deletions(-)

--
2.32.0.3.g01195cf9f



* [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma()
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
@ 2024-01-16  7:59 ` Xuan Zhuo
  2024-01-24  6:54   ` Jason Wang
  2024-02-22 19:43   ` Michael S. Tsirkin
  2024-01-16  7:59 ` [PATCH net-next 2/5] virtio_ring: virtqueue_disable_and_recycle let the callback detach bufs Xuan Zhuo
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-16  7:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf

Introduce virtqueue_get_buf_ctx_dma() to collect the DMA info when
getting a buffer from the virtio core in premapped mode.

If the virtqueue is in premapped mode, a virtio-net send buffer may
consist of many descriptors, and every descriptor's DMA address needs
to be unmapped. So here we introduce a new helper to collect the DMA
addresses of the buffer from the virtio core.

Because BAD_RING() may be called (and that may set vq->broken), the
relevant "const" qualifiers on vq are removed.
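
A hedged usage sketch from the driver side (this mirrors how patch 5 of
this series uses the helper; the my_sq_dma wrapper and my_free_old_xmit()
are illustrative names only):

struct my_sq_dma {
	struct virtio_dma_head head;
	struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
};

static void my_free_old_xmit(struct virtqueue *vq, struct my_sq_dma *dma)
{
	unsigned int len;
	void *buf;
	u16 i;

	dma->head.num = ARRAY_SIZE(dma->items);
	dma->head.next = 0;

	while ((buf = virtqueue_get_buf_ctx_dma(vq, &len, &dma->head, NULL))) {
		/* Unmap every descriptor the virtio core collected for us. */
		for (i = 0; i < dma->head.next; i++)
			virtqueue_dma_unmap_single_attrs(vq,
							 dma->head.items[i].addr,
							 dma->head.items[i].length,
							 DMA_TO_DEVICE, 0);
		dma->head.next = 0;

		/* ... consume or free buf here ... */
	}
}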

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 174 +++++++++++++++++++++++++----------
 include/linux/virtio.h       |  16 ++++
 2 files changed, 142 insertions(+), 48 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 49299b1f9ec7..82f72428605b 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -362,6 +362,45 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
 	return vq->dma_dev;
 }
 
+/*
+ *     use_dma_api premapped -> do_unmap
+ *  1. false       false        false
+ *  2. true        false        true
+ *  3. true        true         false
+ *
+ * Only #3, we should return the DMA info to the driver.
+ *
+ * Return:
+ * true: the virtio core must unmap the desc
+ * false: the virtio core skip the desc unmap
+ */
+static bool vring_need_unmap(struct vring_virtqueue *vq,
+			     struct virtio_dma_head *dma,
+			     dma_addr_t addr, unsigned int length)
+{
+	if (vq->do_unmap)
+		return true;
+
+	if (!vq->premapped)
+		return false;
+
+	if (!dma)
+		return false;
+
+	if (unlikely(dma->next >= dma->num)) {
+		BAD_RING(vq, "premapped vq: collect dma overflow: %pad %u\n",
+			 &addr, length);
+		return false;
+	}
+
+	dma->items[dma->next].addr = addr;
+	dma->items[dma->next].length = length;
+
+	++dma->next;
+
+	return false;
+}
+
 /* Map one sg entry. */
 static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
 			    enum dma_data_direction direction, dma_addr_t *addr)
@@ -440,12 +479,14 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
  * Split ring specific functions - *_split().
  */
 
-static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
-					   const struct vring_desc *desc)
+static void vring_unmap_one_split_indirect(struct vring_virtqueue *vq,
+					   const struct vring_desc *desc,
+					   struct virtio_dma_head *dma)
 {
 	u16 flags;
 
-	if (!vq->do_unmap)
+	if (!vring_need_unmap(vq, dma, virtio64_to_cpu(vq->vq.vdev, desc->addr),
+			  virtio32_to_cpu(vq->vq.vdev, desc->len)))
 		return;
 
 	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
@@ -457,8 +498,8 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
 		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 }
 
-static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
-					  unsigned int i)
+static unsigned int vring_unmap_one_split(struct vring_virtqueue *vq,
+					  unsigned int i, struct virtio_dma_head *dma)
 {
 	struct vring_desc_extra *extra = vq->split.desc_extra;
 	u16 flags;
@@ -474,17 +515,16 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
 				 extra[i].len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	} else {
-		if (!vq->do_unmap)
-			goto out;
-
-		dma_unmap_page(vring_dma_dev(vq),
-			       extra[i].addr,
-			       extra[i].len,
-			       (flags & VRING_DESC_F_WRITE) ?
-			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+		goto out;
 	}
 
+	if (!vring_need_unmap(vq, dma, extra[i].addr, extra[i].len))
+		goto out;
+
+	dma_unmap_page(vring_dma_dev(vq), extra[i].addr, extra[i].len,
+		       (flags & VRING_DESC_F_WRITE) ?
+		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+
 out:
 	return extra[i].next;
 }
@@ -717,10 +757,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
 		if (i == err_idx)
 			break;
 		if (indirect) {
-			vring_unmap_one_split_indirect(vq, &desc[i]);
+			vring_unmap_one_split_indirect(vq, &desc[i], NULL);
 			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
 		} else
-			i = vring_unmap_one_split(vq, i);
+			i = vring_unmap_one_split(vq, i, NULL);
 	}
 
 free_indirect:
@@ -763,7 +803,7 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
 }
 
 static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
-			     void **ctx)
+			     struct virtio_dma_head *dma, void **ctx)
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -775,12 +815,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 	i = head;
 
 	while (vq->split.vring.desc[i].flags & nextflag) {
-		vring_unmap_one_split(vq, i);
+		vring_unmap_one_split(vq, i, dma);
 		i = vq->split.desc_extra[i].next;
 		vq->vq.num_free++;
 	}
 
-	vring_unmap_one_split(vq, i);
+	vring_unmap_one_split(vq, i, dma);
 	vq->split.desc_extra[i].next = vq->free_head;
 	vq->free_head = head;
 
@@ -802,9 +842,9 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
 				VRING_DESC_F_INDIRECT));
 		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
 
-		if (vq->do_unmap) {
+		if (vq->do_unmap || dma) {
 			for (j = 0; j < len / sizeof(struct vring_desc); j++)
-				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
+				vring_unmap_one_split_indirect(vq, &indir_desc[j], dma);
 		}
 
 		kfree(indir_desc);
@@ -822,6 +862,7 @@ static bool more_used_split(const struct vring_virtqueue *vq)
 
 static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 					 unsigned int *len,
+					 struct virtio_dma_head *dma,
 					 void **ctx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -862,7 +903,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
 
 	/* detach_buf_split clears data, so grab it now. */
 	ret = vq->split.desc_state[i].data;
-	detach_buf_split(vq, i, ctx);
+	detach_buf_split(vq, i, dma, ctx);
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
@@ -984,7 +1025,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
-		detach_buf_split(vq, i, NULL);
+		detach_buf_split(vq, i, NULL, NULL);
 		vq->split.avail_idx_shadow--;
 		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
 				vq->split.avail_idx_shadow);
@@ -1220,8 +1261,9 @@ static u16 packed_last_used(u16 last_used_idx)
 	return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
 }
 
-static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
-				     const struct vring_desc_extra *extra)
+static void vring_unmap_extra_packed(struct vring_virtqueue *vq,
+				     const struct vring_desc_extra *extra,
+				     struct virtio_dma_head *dma)
 {
 	u16 flags;
 
@@ -1235,23 +1277,24 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
 				 extra->addr, extra->len,
 				 (flags & VRING_DESC_F_WRITE) ?
 				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
-	} else {
-		if (!vq->do_unmap)
-			return;
-
-		dma_unmap_page(vring_dma_dev(vq),
-			       extra->addr, extra->len,
-			       (flags & VRING_DESC_F_WRITE) ?
-			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
+		return;
 	}
+
+	if (!vring_need_unmap(vq, dma, extra->addr, extra->len))
+		return;
+
+	dma_unmap_page(vring_dma_dev(vq), extra->addr, extra->len,
+		       (flags & VRING_DESC_F_WRITE) ?
+		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
 }
 
-static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
-				    const struct vring_packed_desc *desc)
+static void vring_unmap_desc_packed(struct vring_virtqueue *vq,
+				    const struct vring_packed_desc *desc,
+				    struct virtio_dma_head *dma)
 {
 	u16 flags;
 
-	if (!vq->do_unmap)
+	if (!vring_need_unmap(vq, dma, le64_to_cpu(desc->addr), le32_to_cpu(desc->len)))
 		return;
 
 	flags = le16_to_cpu(desc->flags);
@@ -1389,7 +1432,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
 	err_idx = i;
 
 	for (i = 0; i < err_idx; i++)
-		vring_unmap_desc_packed(vq, &desc[i]);
+		vring_unmap_desc_packed(vq, &desc[i], NULL);
 
 free_desc:
 	kfree(desc);
@@ -1539,7 +1582,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
 	for (n = 0; n < total_sg; n++) {
 		if (i == err_idx)
 			break;
-		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
+		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr], NULL);
 		curr = vq->packed.desc_extra[curr].next;
 		i++;
 		if (i >= vq->packed.vring.num)
@@ -1600,7 +1643,9 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
 }
 
 static void detach_buf_packed(struct vring_virtqueue *vq,
-			      unsigned int id, void **ctx)
+			      unsigned int id,
+			      struct virtio_dma_head *dma,
+			      void **ctx)
 {
 	struct vring_desc_state_packed *state = NULL;
 	struct vring_packed_desc *desc;
@@ -1615,11 +1660,10 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 	vq->free_head = id;
 	vq->vq.num_free += state->num;
 
-	if (unlikely(vq->do_unmap)) {
+	if (vq->do_unmap || dma) {
 		curr = id;
 		for (i = 0; i < state->num; i++) {
-			vring_unmap_extra_packed(vq,
-						 &vq->packed.desc_extra[curr]);
+			vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr], dma);
 			curr = vq->packed.desc_extra[curr].next;
 		}
 	}
@@ -1632,11 +1676,11 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
 		if (!desc)
 			return;
 
-		if (vq->do_unmap) {
+		if (vq->do_unmap || dma) {
 			len = vq->packed.desc_extra[id].len;
 			for (i = 0; i < len / sizeof(struct vring_packed_desc);
 					i++)
-				vring_unmap_desc_packed(vq, &desc[i]);
+				vring_unmap_desc_packed(vq, &desc[i], dma);
 		}
 		kfree(desc);
 		state->indir_desc = NULL;
@@ -1672,6 +1716,7 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
 
 static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 					  unsigned int *len,
+					  struct virtio_dma_head *dma,
 					  void **ctx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -1712,7 +1757,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
 
 	/* detach_buf_packed clears data, so grab it now. */
 	ret = vq->packed.desc_state[id].data;
-	detach_buf_packed(vq, id, ctx);
+	detach_buf_packed(vq, id, dma, ctx);
 
 	last_used += vq->packed.desc_state[id].num;
 	if (unlikely(last_used >= vq->packed.vring.num)) {
@@ -1877,7 +1922,7 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->packed.desc_state[i].data;
-		detach_buf_packed(vq, i, NULL);
+		detach_buf_packed(vq, i, NULL, NULL);
 		END_USE(vq);
 		return buf;
 	}
@@ -2417,11 +2462,44 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
-				 virtqueue_get_buf_ctx_split(_vq, len, ctx);
+	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, NULL, ctx) :
+				 virtqueue_get_buf_ctx_split(_vq, len, NULL, ctx);
 }
 EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
+/**
+ * virtqueue_get_buf_ctx_dma - get the next used buffer with the dma info
+ * @_vq: the struct virtqueue we're talking about.
+ * @len: the length written into the buffer
+ * @dma: the head of the array to store the dma info
+ * @ctx: extra context for the token
+ *
+ * If the device wrote data into the buffer, @len will be set to the
+ * amount written.  This means you don't need to clear the buffer
+ * beforehand to ensure there's no data leakage in the case of short
+ * writes.
+ *
+ * Caller must ensure we don't call this with other virtqueue
+ * operations at the same time (except where noted).
+ *
+ * We store the dma info of every descriptor of this buf to the dma->items
+ * array. If the array size is too small, some dma info may be missed, so
+ * the caller must ensure the array is large enough. The dma->next is the out
+ * value to the caller, indicates the num of the used items.
+ *
+ * Returns NULL if there are no used buffers, or the "data" token
+ * handed to virtqueue_add_*().
+ */
+void *virtqueue_get_buf_ctx_dma(struct virtqueue *_vq, unsigned int *len,
+				struct virtio_dma_head *dma, void **ctx)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+
+	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, dma, ctx) :
+				 virtqueue_get_buf_ctx_split(_vq, len, dma, ctx);
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx_dma);
+
 void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 {
 	return virtqueue_get_buf_ctx(_vq, len, NULL);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 4cc614a38376..572aecec205b 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -75,6 +75,22 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
 void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
 			    void **ctx);
 
+struct virtio_dma_item {
+	dma_addr_t addr;
+	unsigned int length;
+};
+
+struct virtio_dma_head {
+	/* total num of items. */
+	u16 num;
+	/* point to the next item to store dma info. */
+	u16 next;
+	struct virtio_dma_item items[];
+};
+
+void *virtqueue_get_buf_ctx_dma(struct virtqueue *_vq, unsigned int *len,
+				struct virtio_dma_head *dma, void **ctx);
+
 void virtqueue_disable_cb(struct virtqueue *vq);
 
 bool virtqueue_enable_cb(struct virtqueue *vq);
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next 2/5] virtio_ring: virtqueue_disable_and_recycle let the callback detach bufs
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
  2024-01-16  7:59 ` [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() Xuan Zhuo
@ 2024-01-16  7:59 ` Xuan Zhuo
  2024-01-16  7:59 ` [PATCH net-next 3/5] virtio_ring: introduce virtqueue_detach_unused_buf_dma() Xuan Zhuo
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-16  7:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf

Currently, inside virtqueue_disable_and_recycle(), the recycle() callback
just takes two parameters (vq, buf) and is invoked after the detach
operation.

But if we are in premapped mode, we may need to get some DMA info when
detaching a buffer, as virtqueue_get_buf_ctx_dma() does.

So we now call recycle() directly, and the callback detaches the buffers
itself. It must complete the work of detaching all the unused buffers.
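
For example, a recycle callback now looks roughly like this (my_recycle()
and my_free_buf() are illustrative names; cf. virtnet_sq_free_unused_bufs()
in the diff below):

static void my_recycle(struct virtqueue *vq)
{
	void *buf;

	/* The callback drains the unused buffers itself now. */
	while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
		my_free_buf(buf);
}

	err = virtqueue_resize(vq, ring_num, my_recycle);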

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio/main.c    | 60 +++++++++++++++++++-----------------
 drivers/virtio/virtio_ring.c | 10 +++---
 include/linux/virtio.h       |  4 +--
 3 files changed, 38 insertions(+), 36 deletions(-)

diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
index ac3a529c7729..186b2cf5d8fc 100644
--- a/drivers/net/virtio/main.c
+++ b/drivers/net/virtio/main.c
@@ -150,7 +150,8 @@ struct virtio_net_common_hdr {
 	};
 };
 
-static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf);
+static void virtnet_rq_free_unused_bufs(struct virtqueue *vq);
+static void virtnet_sq_free_unused_bufs(struct virtqueue *vq);
 
 static bool is_xdp_frame(void *ptr)
 {
@@ -587,20 +588,6 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 	}
 }
 
-static void virtnet_rq_unmap_free_buf(struct virtqueue *vq, void *buf)
-{
-	struct virtnet_info *vi = vq->vdev->priv;
-	struct virtnet_rq *rq;
-	int i = vq2rxq(vq);
-
-	rq = &vi->rq[i];
-
-	if (rq->do_dma)
-		virtnet_rq_unmap(rq, buf, 0);
-
-	virtnet_rq_free_buf(vi, rq, buf);
-}
-
 static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
 {
 	u64 bytes = 0, packets = 0;
@@ -2244,7 +2231,7 @@ static int virtnet_rx_resize(struct virtnet_info *vi,
 		cancel_work_sync(&rq->dim.work);
 	}
 
-	err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_unmap_free_buf);
+	err = virtqueue_resize(rq->vq, ring_num, virtnet_rq_free_unused_bufs);
 	if (err)
 		netdev_err(vi->dev, "resize rx fail: rx queue index: %d err: %d\n", qindex, err);
 
@@ -2283,7 +2270,7 @@ static int virtnet_tx_resize(struct virtnet_info *vi,
 
 	__netif_tx_unlock_bh(txq);
 
-	err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_buf);
+	err = virtqueue_resize(sq->vq, ring_num, virtnet_sq_free_unused_bufs);
 	if (err)
 		netdev_err(vi->dev, "resize tx fail: tx queue index: %d err: %d\n", qindex, err);
 
@@ -4026,31 +4013,48 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 		}
 }
 
-static void virtnet_sq_free_unused_buf(struct virtqueue *vq, void *buf)
+static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
 {
-	if (!is_xdp_frame(buf))
-		dev_kfree_skb(buf);
-	else
-		xdp_return_frame(ptr_to_xdp(buf));
+	void *buf;
+
+	while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+		if (!is_xdp_frame(buf))
+			dev_kfree_skb(buf);
+		else
+			xdp_return_frame(ptr_to_xdp(buf));
+	}
 }
 
-static void free_unused_bufs(struct virtnet_info *vi)
+static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
 {
+	struct virtnet_info *vi = vq->vdev->priv;
+	struct virtnet_rq *rq;
+	int i = vq2rxq(vq);
 	void *buf;
+
+	rq = &vi->rq[i];
+
+	while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+		if (rq->do_dma)
+			virtnet_rq_unmap(rq, buf, 0);
+
+		virtnet_rq_free_buf(vi, rq, buf);
+	}
+}
+
+static void free_unused_bufs(struct virtnet_info *vi)
+{
 	int i;
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
-		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-			virtnet_sq_free_unused_buf(vq, buf);
+		virtnet_sq_free_unused_bufs(vq);
 		cond_resched();
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->rq[i].vq;
-
-		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-			virtnet_rq_unmap_free_buf(vq, buf);
+		virtnet_rq_free_unused_bufs(vq);
 		cond_resched();
 	}
 }
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 82f72428605b..ecbaf7568251 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2198,11 +2198,10 @@ static int virtqueue_resize_packed(struct virtqueue *_vq, u32 num)
 }
 
 static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
-					 void (*recycle)(struct virtqueue *vq, void *buf))
+					 void (*recycle)(struct virtqueue *vq))
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	struct virtio_device *vdev = vq->vq.vdev;
-	void *buf;
 	int err;
 
 	if (!vq->we_own_ring)
@@ -2218,8 +2217,7 @@ static int virtqueue_disable_and_recycle(struct virtqueue *_vq,
 	if (err)
 		return err;
 
-	while ((buf = virtqueue_detach_unused_buf(_vq)) != NULL)
-		recycle(_vq, buf);
+	recycle(_vq);
 
 	return 0;
 }
@@ -2814,7 +2812,7 @@ EXPORT_SYMBOL_GPL(vring_create_virtqueue_dma);
  *
  */
 int virtqueue_resize(struct virtqueue *_vq, u32 num,
-		     void (*recycle)(struct virtqueue *vq, void *buf))
+		     void (*recycle)(struct virtqueue *vq))
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	int err;
@@ -2905,7 +2903,7 @@ EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
  * -EPERM: Operation not permitted
  */
 int virtqueue_reset(struct virtqueue *_vq,
-		    void (*recycle)(struct virtqueue *vq, void *buf))
+		    void (*recycle)(struct virtqueue *vq))
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	int err;
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 572aecec205b..7a5e9ea7d420 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -115,9 +115,9 @@ dma_addr_t virtqueue_get_avail_addr(const struct virtqueue *vq);
 dma_addr_t virtqueue_get_used_addr(const struct virtqueue *vq);
 
 int virtqueue_resize(struct virtqueue *vq, u32 num,
-		     void (*recycle)(struct virtqueue *vq, void *buf));
+		     void (*recycle)(struct virtqueue *vq));
 int virtqueue_reset(struct virtqueue *vq,
-		    void (*recycle)(struct virtqueue *vq, void *buf));
+		    void (*recycle)(struct virtqueue *vq));
 
 /**
  * struct virtio_device - representation of a device using virtio
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next 3/5] virtio_ring: introduce virtqueue_detach_unused_buf_dma()
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
  2024-01-16  7:59 ` [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() Xuan Zhuo
  2024-01-16  7:59 ` [PATCH net-next 2/5] virtio_ring: virtqueue_disable_and_recycle let the callback detach bufs Xuan Zhuo
@ 2024-01-16  7:59 ` Xuan Zhuo
  2024-01-16  7:59 ` [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped() Xuan Zhuo
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-16  7:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf

Introduce virtqueue_detach_unused_buf_dma() to collect the DMA info
when getting a buffer from the virtio core in premapped mode.

If the virtqueue is in premapped mode, a virtio-net send buffer may
consist of many descriptors, and every descriptor's DMA address needs
to be unmapped. So here we introduce a new helper to collect the DMA
addresses of the buffer from the virtio core.
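
A hedged sketch of the intended shutdown/reset path (this mirrors what
patch 5 does; virtnet_sq_unmap_buf() is the unmap helper added there):

	while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
		/* Unmap every descriptor collected into dma->items[]. */
		virtnet_sq_unmap_buf(sq, dma);

		if (!is_xdp_frame(buf))
			dev_kfree_skb(buf);
		else
			xdp_return_frame(ptr_to_xdp(buf));
	}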

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/virtio/virtio_ring.c | 33 +++++++++++++++++++++++++--------
 include/linux/virtio.h       |  1 +
 2 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index ecbaf7568251..2c5089d3b510 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -1012,7 +1012,7 @@ static bool virtqueue_enable_cb_delayed_split(struct virtqueue *_vq)
 	return true;
 }
 
-static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq, struct virtio_dma_head *dma)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
@@ -1025,7 +1025,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
 			continue;
 		/* detach_buf_split clears data, so grab it now. */
 		buf = vq->split.desc_state[i].data;
-		detach_buf_split(vq, i, NULL, NULL);
+		detach_buf_split(vq, i, dma, NULL);
 		vq->split.avail_idx_shadow--;
 		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
 				vq->split.avail_idx_shadow);
@@ -1909,7 +1909,7 @@ static bool virtqueue_enable_cb_delayed_packed(struct virtqueue *_vq)
 	return true;
 }
 
-static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
+static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq, struct virtio_dma_head *dma)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
@@ -1922,7 +1922,7 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->packed.desc_state[i].data;
-		detach_buf_packed(vq, i, NULL, NULL);
+		detach_buf_packed(vq, i, dma, NULL);
 		END_USE(vq);
 		return buf;
 	}
@@ -2614,19 +2614,36 @@ bool virtqueue_enable_cb_delayed(struct virtqueue *_vq)
 EXPORT_SYMBOL_GPL(virtqueue_enable_cb_delayed);
 
 /**
- * virtqueue_detach_unused_buf - detach first unused buffer
+ * virtqueue_detach_unused_buf_dma - detach first unused buffer
  * @_vq: the struct virtqueue we're talking about.
+ * @dma: the head of the array to store the dma info
+ *
+ * more see virtqueue_get_buf_ctx_dma()
  *
  * Returns NULL or the "data" token handed to virtqueue_add_*().
  * This is not valid on an active queue; it is useful for device
  * shutdown or the reset queue.
  */
-void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
+void *virtqueue_detach_unused_buf_dma(struct virtqueue *_vq, struct virtio_dma_head *dma)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 
-	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq) :
-				 virtqueue_detach_unused_buf_split(_vq);
+	return vq->packed_ring ? virtqueue_detach_unused_buf_packed(_vq, dma) :
+				 virtqueue_detach_unused_buf_split(_vq, dma);
+}
+EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf_dma);
+
+/**
+ * virtqueue_detach_unused_buf - detach first unused buffer
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Returns NULL or the "data" token handed to virtqueue_add_*().
+ * This is not valid on an active queue; it is useful for device
+ * shutdown or the reset queue.
+ */
+void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
+{
+	return virtqueue_detach_unused_buf_dma(_vq, NULL);
 }
 EXPORT_SYMBOL_GPL(virtqueue_detach_unused_buf);
 
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 7a5e9ea7d420..2596f0e7e395 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -104,6 +104,7 @@ bool virtqueue_poll(struct virtqueue *vq, unsigned);
 bool virtqueue_enable_cb_delayed(struct virtqueue *vq);
 
 void *virtqueue_detach_unused_buf(struct virtqueue *vq);
+void *virtqueue_detach_unused_buf_dma(struct virtqueue *_vq, struct virtio_dma_head *dma);
 
 unsigned int virtqueue_get_vring_size(const struct virtqueue *vq);
 
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
                   ` (2 preceding siblings ...)
  2024-01-16  7:59 ` [PATCH net-next 3/5] virtio_ring: introduce virtqueue_detach_unused_buf_dma() Xuan Zhuo
@ 2024-01-16  7:59 ` Xuan Zhuo
  2024-01-25  3:39   ` Jason Wang
  2024-01-16  7:59 ` [PATCH net-next 5/5] virtio_net: sq support premapped mode Xuan Zhuo
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-16  7:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf

Introduce the helper virtqueue_get_dma_premapped() so the driver can
know whether DMA unmap is needed.
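
For example (as in the diff below), the RX path can ask the virtio core at
runtime whether the driver owns the mapping:

	buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
	if (buf && virtqueue_get_dma_premapped(rq->vq))
		virtnet_rq_unmap(rq, buf, len);	/* driver did the mapping, so it unmaps */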

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio/main.c       | 22 +++++++++-------------
 drivers/net/virtio/virtio_net.h |  3 ---
 drivers/virtio/virtio_ring.c    | 22 ++++++++++++++++++++++
 include/linux/virtio.h          |  1 +
 4 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
index 186b2cf5d8fc..4fbf612da235 100644
--- a/drivers/net/virtio/main.c
+++ b/drivers/net/virtio/main.c
@@ -483,7 +483,7 @@ static void *virtnet_rq_get_buf(struct virtnet_rq *rq, u32 *len, void **ctx)
 	void *buf;
 
 	buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
-	if (buf && rq->do_dma)
+	if (buf && virtqueue_get_dma_premapped(rq->vq))
 		virtnet_rq_unmap(rq, buf, *len);
 
 	return buf;
@@ -496,7 +496,7 @@ static void virtnet_rq_init_one_sg(struct virtnet_rq *rq, void *buf, u32 len)
 	u32 offset;
 	void *head;
 
-	if (!rq->do_dma) {
+	if (!virtqueue_get_dma_premapped(rq->vq)) {
 		sg_init_one(rq->sg, buf, len);
 		return;
 	}
@@ -526,7 +526,7 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
 
 	head = page_address(alloc_frag->page);
 
-	if (rq->do_dma) {
+	if (virtqueue_get_dma_premapped(rq->vq)) {
 		dma = head;
 
 		/* new pages */
@@ -580,12 +580,8 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
 	if (!vi->mergeable_rx_bufs && vi->big_packets)
 		return;
 
-	for (i = 0; i < vi->max_queue_pairs; i++) {
-		if (virtqueue_set_dma_premapped(vi->rq[i].vq))
-			continue;
-
-		vi->rq[i].do_dma = true;
-	}
+	for (i = 0; i < vi->max_queue_pairs; i++)
+		virtqueue_set_dma_premapped(vi->rq[i].vq);
 }
 
 static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
@@ -1643,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct virtnet_rq *rq,
 
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
+		if (virtqueue_get_dma_premapped(rq->vq))
 			virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
@@ -1758,7 +1754,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	ctx = mergeable_len_to_ctx(len + room, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0) {
-		if (rq->do_dma)
+		if (virtqueue_get_dma_premapped(rq->vq))
 			virtnet_rq_unmap(rq, buf, 0);
 		put_page(virt_to_head_page(buf));
 	}
@@ -4007,7 +4003,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 	int i;
 	for (i = 0; i < vi->max_queue_pairs; i++)
 		if (vi->rq[i].alloc_frag.page) {
-			if (vi->rq[i].do_dma && vi->rq[i].last_dma)
+			if (virtqueue_get_dma_premapped(vi->rq[i].vq) && vi->rq[i].last_dma)
 				virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
 			put_page(vi->rq[i].alloc_frag.page);
 		}
@@ -4035,7 +4031,7 @@ static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
 	rq = &vi->rq[i];
 
 	while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-		if (rq->do_dma)
+		if (virtqueue_get_dma_premapped(rq->vq))
 			virtnet_rq_unmap(rq, buf, 0);
 
 		virtnet_rq_free_buf(vi, rq, buf);
diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
index b28a4d0a3150..066a2b9d2b3c 100644
--- a/drivers/net/virtio/virtio_net.h
+++ b/drivers/net/virtio/virtio_net.h
@@ -115,9 +115,6 @@ struct virtnet_rq {
 
 	/* Record the last dma info to free after new pages is allocated. */
 	struct virtnet_rq_dma *last_dma;
-
-	/* Do dma by self */
-	bool do_dma;
 };
 
 struct virtnet_info {
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 2c5089d3b510..9092bcdebb53 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -2905,6 +2905,28 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
 
+/**
+ * virtqueue_get_dma_premapped - get the vring premapped mode
+ * @_vq: the struct virtqueue we're talking about.
+ *
+ * Get the premapped mode of the vq.
+ *
+ * Returns bool for the vq premapped mode.
+ */
+bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
+{
+	struct vring_virtqueue *vq = to_vvq(_vq);
+	bool premapped;
+
+	START_USE(vq);
+	premapped = vq->premapped;
+	END_USE(vq);
+
+	return premapped;
+
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_dma_premapped);
+
 /**
  * virtqueue_reset - detach and recycle all unused buffers
  * @_vq: the struct virtqueue we're talking about.
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index 2596f0e7e395..3e9a2bb75af6 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -98,6 +98,7 @@ bool virtqueue_enable_cb(struct virtqueue *vq);
 unsigned virtqueue_enable_cb_prepare(struct virtqueue *vq);
 
 int virtqueue_set_dma_premapped(struct virtqueue *_vq);
+bool virtqueue_get_dma_premapped(struct virtqueue *_vq);
 
 bool virtqueue_poll(struct virtqueue *vq, unsigned);
 
-- 
2.32.0.3.g01195cf9f



* [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
                   ` (3 preceding siblings ...)
  2024-01-16  7:59 ` [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped() Xuan Zhuo
@ 2024-01-16  7:59 ` Xuan Zhuo
  2024-01-25  3:39   ` Jason Wang
  2024-01-25  3:39 ` [PATCH net-next 0/5] virtio-net: " Jason Wang
  2024-02-22 19:45 ` Michael S. Tsirkin
  6 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-16  7:59 UTC (permalink / raw)
  To: netdev
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	virtualization, bpf

When xsk is enabled, xsk TX will share the send queue. But xsk requires
that the send queue use premapped mode, so the send queue must support
premapped mode.

command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
machine:  ecs.ebmg6e.26xlarge of Aliyun
cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt

                      |        iommu off           |        iommu on
----------------------|-----------------------------------------------------
                      | 16         |  1400         | 16         | 1400
----------------------|-----------------------------------------------------
Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
 drivers/net/virtio/virtio_net.h |  10 ++-
 2 files changed, 116 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
index 4fbf612da235..53143f95a3a0 100644
--- a/drivers/net/virtio/main.c
+++ b/drivers/net/virtio/main.c
@@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
 	return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
 }
 
+static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
+{
+	int i;
+
+	if (!dma)
+		return;
+
+	for (i = 0; i < dma->next; ++i)
+		virtqueue_dma_unmap_single_attrs(sq->vq,
+						 dma->items[i].addr,
+						 dma->items[i].length,
+						 DMA_TO_DEVICE, 0);
+	dma->next = 0;
+}
+
 static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
 			    u64 *bytes, u64 *packets)
 {
+	struct virtio_dma_head *dma;
 	unsigned int len;
 	void *ptr;
 
-	while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+	if (virtqueue_get_dma_premapped(sq->vq)) {
+		dma = &sq->dma.head;
+		dma->num = ARRAY_SIZE(sq->dma.items);
+		dma->next = 0;
+	} else {
+		dma = NULL;
+	}
+
+	while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
+		virtnet_sq_unmap_buf(sq, dma);
+
 		if (!is_xdp_frame(ptr)) {
 			struct sk_buff *skb = ptr;
 
@@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
 	return buf;
 }
 
-static void virtnet_rq_set_premapped(struct virtnet_info *vi)
+static void virtnet_set_premapped(struct virtnet_info *vi)
 {
 	int i;
 
-	/* disable for big mode */
-	if (!vi->mergeable_rx_bufs && vi->big_packets)
-		return;
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		virtqueue_set_dma_premapped(vi->sq[i].vq);
 
-	for (i = 0; i < vi->max_queue_pairs; i++)
-		virtqueue_set_dma_premapped(vi->rq[i].vq);
+		/* TODO for big mode */
+		if (vi->mergeable_rx_bufs || !vi->big_packets)
+			virtqueue_set_dma_premapped(vi->rq[i].vq);
+	}
+}
+
+static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
+{
+	struct scatterlist *sg;
+	u32 i;
+
+	for (i = 0; i < num; ++i) {
+		sg = &sq->sg[i];
+
+		virtqueue_dma_unmap_single_attrs(sq->vq,
+						 sg->dma_address,
+						 sg->length,
+						 DMA_TO_DEVICE, 0);
+	}
+}
+
+static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
+{
+	struct scatterlist *sg;
+	u32 i;
+
+	for (i = 0; i < num; ++i) {
+		sg = &sq->sg[i];
+		sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
+								 sg->length,
+								 DMA_TO_DEVICE, 0);
+		if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
+			goto err;
+	}
+
+	return 0;
+
+err:
+	virtnet_sq_unmap_sg(sq, i);
+	return -ENOMEM;
+}
+
+static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
+{
+	int ret;
+
+	if (virtqueue_get_dma_premapped(sq->vq)) {
+		ret = virtnet_sq_map_sg(sq, num);
+		if (ret)
+			return -ENOMEM;
+	}
+
+	ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
+	if (ret && virtqueue_get_dma_premapped(sq->vq))
+		virtnet_sq_unmap_sg(sq, num);
+
+	return ret;
 }
 
 static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
@@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 			    skb_frag_size(frag), skb_frag_off(frag));
 	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
-				   xdp_to_ptr(xdpf), GFP_ATOMIC);
+	err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
 
@@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
 			return num_sg;
 		num_sg++;
 	}
-	return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
+	return virtnet_add_outbuf(sq, num_sg, skb);
 }
 
 static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
@@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 
 static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
 {
+	struct virtnet_info *vi = vq->vdev->priv;
+	struct virtio_dma_head *dma;
+	struct virtnet_sq *sq;
+	int i = vq2txq(vq);
 	void *buf;
 
-	while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+	sq = &vi->sq[i];
+
+	if (virtqueue_get_dma_premapped(sq->vq)) {
+		dma = &sq->dma.head;
+		dma->num = ARRAY_SIZE(sq->dma.items);
+		dma->next = 0;
+	} else {
+		dma = NULL;
+	}
+
+	while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
+		virtnet_sq_unmap_buf(sq, dma);
+
 		if (!is_xdp_frame(buf))
 			dev_kfree_skb(buf);
 		else
@@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
 	if (ret)
 		goto err_free;
 
-	virtnet_rq_set_premapped(vi);
+	virtnet_set_premapped(vi);
 
 	cpus_read_lock();
 	virtnet_set_affinity(vi);
diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
index 066a2b9d2b3c..dda144cc91c7 100644
--- a/drivers/net/virtio/virtio_net.h
+++ b/drivers/net/virtio/virtio_net.h
@@ -48,13 +48,21 @@ struct virtnet_rq_dma {
 	u16 need_sync;
 };
 
+struct virtnet_sq_dma {
+	struct virtio_dma_head head;
+	struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
+};
+
 /* Internal representation of a send virtqueue */
 struct virtnet_sq {
 	/* Virtqueue associated with this virtnet_sq */
 	struct virtqueue *vq;
 
 	/* TX: fragments + linear part + virtio header */
-	struct scatterlist sg[MAX_SKB_FRAGS + 2];
+	union {
+		struct scatterlist sg[MAX_SKB_FRAGS + 2];
+		struct virtnet_sq_dma dma;
+	};
 
 	/* Name of the send queue: output.$index */
 	char name[16];
-- 
2.32.0.3.g01195cf9f



* Re: [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma()
  2024-01-16  7:59 ` [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() Xuan Zhuo
@ 2024-01-24  6:54   ` Jason Wang
  2024-02-22 19:43   ` Michael S. Tsirkin
  1 sibling, 0 replies; 33+ messages in thread
From: Jason Wang @ 2024-01-24  6:54 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Introduce virtqueue_get_buf_ctx_dma() to collect the DMA info when
> getting a buffer from the virtio core in premapped mode.
>
> If the virtqueue is in premapped mode, a virtio-net send buffer may
> consist of many descriptors,

This feature is not specific to virtio-net, so we could say "for example ..."

> and every descriptor's DMA address needs to be unmapped. So here we
> introduce a new helper to collect the DMA addresses of the buffer from
> the virtio core.

Let's explain why we can't (or why it's suboptimal to) depend on the driver to do this.

>
> Because BAD_RING() may be called (and that may set vq->broken), the
> relevant "const" qualifiers on vq are removed.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 174 +++++++++++++++++++++++++----------
>  include/linux/virtio.h       |  16 ++++
>  2 files changed, 142 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 49299b1f9ec7..82f72428605b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -362,6 +362,45 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>         return vq->dma_dev;
>  }
>
> +/*
> + *     use_dma_api premapped -> do_unmap
> + *  1. false       false        false
> + *  2. true        false        true
> + *  3. true        true         false
> + *
> + * Only #3, we should return the DMA info to the driver.

So the code has a check for dma:

        if (!dma)
                return false;

Whatever the case, we need a better comment here.

So we now have: use_dma_api, premapped, do_unmap and dma. I must say
I'm totally lost in this maze. It's a strong hint that the API is too
complicated and needs to be tweaked.

For example, is it legal to have do_unmap be false but dma be true?

Here are some suggestions:

1) rename premapped to buffer_is_premapped
2) rename do_unmap to buffer_need_unmap or introduce a helper

bool vring_need_unmap_buffer()
{
        return use_dma_api && !premapped;
}

3) split the dma-info-collecting logic into a helper like virtqueue_get_dma_info()

so we can do

if (!vring_need_unmap_buffer()) {
        virtqueue_get_dma_info()
        return;
}

4) explain why we still need to check dma assuming we had
vring_need_unmap_buffer():

If vring_need_unmap_buffer() is true, we don't need to care about dma at all.

If vring_need_unmap_buffer() is false, we must return the dma info,
otherwise there's a leak?

> + *
> + * Return:
> + * true: the virtio core must unmap the desc
> + * false: the virtio core skip the desc unmap

Might it be better to say "It's up to the driver to unmap"?

> + */
> +static bool vring_need_unmap(struct vring_virtqueue *vq,
> +                            struct virtio_dma_head *dma,
> +                            dma_addr_t addr, unsigned int length)
> +{
> +       if (vq->do_unmap)
> +               return true;
> +
> +       if (!vq->premapped)
> +               return false;
> +
> +       if (!dma)
> +               return false;

So the logic here is odd.

if (!dma)
        return false;
...
        return false;

A strong hint to split the dma-info-collecting logic below into another
helper. The root cause is that the function does more than just answer
"yes or no".

> +
> +       if (unlikely(dma->next >= dma->num)) {
> +               BAD_RING(vq, "premapped vq: collect dma overflow: %pad %u\n",
> +                        &addr, length);
> +               return false;
> +       }
> +
> +       dma->items[dma->next].addr = addr;
> +       dma->items[dma->next].length = length;
> +
> +       ++dma->next;
> +
> +       return false;
> +}
> +

Thanks

....



* Re: [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-16  7:59 ` [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped() Xuan Zhuo
@ 2024-01-25  3:39   ` Jason Wang
  2024-01-25  5:57     ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-25  3:39 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> Introduce the helper virtqueue_get_dma_premapped() so the driver can
> know whether DMA unmap is needed.
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio/main.c       | 22 +++++++++-------------
>  drivers/net/virtio/virtio_net.h |  3 ---
>  drivers/virtio/virtio_ring.c    | 22 ++++++++++++++++++++++
>  include/linux/virtio.h          |  1 +
>  4 files changed, 32 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> index 186b2cf5d8fc..4fbf612da235 100644
> --- a/drivers/net/virtio/main.c
> +++ b/drivers/net/virtio/main.c
> @@ -483,7 +483,7 @@ static void *virtnet_rq_get_buf(struct virtnet_rq *rq, u32 *len, void **ctx)
>         void *buf;
>
>         buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> -       if (buf && rq->do_dma)
> +       if (buf && virtqueue_get_dma_premapped(rq->vq))
>                 virtnet_rq_unmap(rq, buf, *len);
>
>         return buf;
> @@ -496,7 +496,7 @@ static void virtnet_rq_init_one_sg(struct virtnet_rq *rq, void *buf, u32 len)
>         u32 offset;
>         void *head;
>
> -       if (!rq->do_dma) {
> +       if (!virtqueue_get_dma_premapped(rq->vq)) {
>                 sg_init_one(rq->sg, buf, len);
>                 return;
>         }
> @@ -526,7 +526,7 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
>
>         head = page_address(alloc_frag->page);
>
> -       if (rq->do_dma) {
> +       if (virtqueue_get_dma_premapped(rq->vq)) {
>                 dma = head;
>
>                 /* new pages */
> @@ -580,12 +580,8 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
>         if (!vi->mergeable_rx_bufs && vi->big_packets)
>                 return;
>
> -       for (i = 0; i < vi->max_queue_pairs; i++) {
> -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> -                       continue;
> -
> -               vi->rq[i].do_dma = true;
> -       }
> +       for (i = 0; i < vi->max_queue_pairs; i++)
> +               virtqueue_set_dma_premapped(vi->rq[i].vq);
>  }
>
>  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> @@ -1643,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct virtnet_rq *rq,
>
>         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>         if (err < 0) {
> -               if (rq->do_dma)
> +               if (virtqueue_get_dma_premapped(rq->vq))
>                         virtnet_rq_unmap(rq, buf, 0);
>                 put_page(virt_to_head_page(buf));
>         }
> @@ -1758,7 +1754,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>         ctx = mergeable_len_to_ctx(len + room, headroom);
>         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
>         if (err < 0) {
> -               if (rq->do_dma)
> +               if (virtqueue_get_dma_premapped(rq->vq))
>                         virtnet_rq_unmap(rq, buf, 0);
>                 put_page(virt_to_head_page(buf));
>         }
> @@ -4007,7 +4003,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>         int i;
>         for (i = 0; i < vi->max_queue_pairs; i++)
>                 if (vi->rq[i].alloc_frag.page) {
> -                       if (vi->rq[i].do_dma && vi->rq[i].last_dma)
> +                       if (virtqueue_get_dma_premapped(vi->rq[i].vq) && vi->rq[i].last_dma)
>                                 virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
>                         put_page(vi->rq[i].alloc_frag.page);
>                 }
> @@ -4035,7 +4031,7 @@ static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
>         rq = &vi->rq[i];
>
>         while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> -               if (rq->do_dma)
> +               if (virtqueue_get_dma_premapped(rq->vq))
>                         virtnet_rq_unmap(rq, buf, 0);
>
>                 virtnet_rq_free_buf(vi, rq, buf);
> diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> index b28a4d0a3150..066a2b9d2b3c 100644
> --- a/drivers/net/virtio/virtio_net.h
> +++ b/drivers/net/virtio/virtio_net.h
> @@ -115,9 +115,6 @@ struct virtnet_rq {
>
>         /* Record the last dma info to free after new pages is allocated. */
>         struct virtnet_rq_dma *last_dma;
> -
> -       /* Do dma by self */
> -       bool do_dma;
>  };
>
>  struct virtnet_info {
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 2c5089d3b510..9092bcdebb53 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -2905,6 +2905,28 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
>
> +/**
> + * virtqueue_get_dma_premapped - get the vring premapped mode
> + * @_vq: the struct virtqueue we're talking about.
> + *
> + * Get the premapped mode of the vq.
> + *
> + * Returns bool for the vq premapped mode.
> + */
> +bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
> +{
> +       struct vring_virtqueue *vq = to_vvq(_vq);
> +       bool premapped;
> +
> +       START_USE(vq);
> +       premapped = vq->premapped;
> +       END_USE(vq);

Why do we need to protect premapped like this? Is the user allowed to
change it on the fly?

Thanks



* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-16  7:59 ` [PATCH net-next 5/5] virtio_net: sq support premapped mode Xuan Zhuo
@ 2024-01-25  3:39   ` Jason Wang
  2024-01-25  5:58     ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-25  3:39 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> When xsk is enabled, xsk TX will share the send queue.

Any reason for this? Technically, virtio-net can work like other NICs
with e.g. 256 queues. There could be some follow-up work, like optimizing
the interrupt allocations, etc.

> But xsk requires that the send queue use premapped mode, so the send
> queue must support premapped mode.
>
> command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> machine:  ecs.ebmg6e.26xlarge of Aliyun
> cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
>
>                       |        iommu off           |        iommu on
> ----------------------|-----------------------------------------------------
>                       | 16         |  1400         | 16         | 1400
> ----------------------|-----------------------------------------------------
> Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
>
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
>  drivers/net/virtio/virtio_net.h |  10 ++-
>  2 files changed, 116 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> index 4fbf612da235..53143f95a3a0 100644
> --- a/drivers/net/virtio/main.c
> +++ b/drivers/net/virtio/main.c
> @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
>         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
>  }
>
> +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> +{
> +       int i;
> +
> +       if (!dma)
> +               return;
> +
> +       for (i = 0; i < dma->next; ++i)
> +               virtqueue_dma_unmap_single_attrs(sq->vq,
> +                                                dma->items[i].addr,
> +                                                dma->items[i].length,
> +                                                DMA_TO_DEVICE, 0);
> +       dma->next = 0;
> +}
> +
>  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
>                             u64 *bytes, u64 *packets)
>  {
> +       struct virtio_dma_head *dma;
>         unsigned int len;
>         void *ptr;
>
> -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +       if (virtqueue_get_dma_premapped(sq->vq)) {

Any chance this can be false?

> +               dma = &sq->dma.head;
> +               dma->num = ARRAY_SIZE(sq->dma.items);
> +               dma->next = 0;

Btw, I found in the case of RX we have:

virtnet_rq_alloc():

                        alloc_frag->offset = sizeof(*dma);

This seems to defeat frag coalescing when the memory is highly
fragmented or high order allocation is disallowed.

Any idea to solve this?

> +       } else {
> +               dma = NULL;
> +       }
> +
> +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> +               virtnet_sq_unmap_buf(sq, dma);
> +
>                 if (!is_xdp_frame(ptr)) {
>                         struct sk_buff *skb = ptr;
>
> @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
>         return buf;
>  }
>
> -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> +static void virtnet_set_premapped(struct virtnet_info *vi)
>  {
>         int i;
>
> -       /* disable for big mode */
> -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> -               return;
> +       for (i = 0; i < vi->max_queue_pairs; i++) {
> +               virtqueue_set_dma_premapped(vi->sq[i].vq);
>
> -       for (i = 0; i < vi->max_queue_pairs; i++)
> -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> +               /* TODO for big mode */

Btw, how hard is it to support big mode? If we can do premapping for
that, the code could be simplified.

(There are vendors that don't support mergeable rx buffers.)

> +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> +       }
> +}
> +
> +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> +{
> +       struct scatterlist *sg;
> +       u32 i;
> +
> +       for (i = 0; i < num; ++i) {
> +               sg = &sq->sg[i];
> +
> +               virtqueue_dma_unmap_single_attrs(sq->vq,
> +                                                sg->dma_address,
> +                                                sg->length,
> +                                                DMA_TO_DEVICE, 0);
> +       }
> +}
> +
> +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> +{
> +       struct scatterlist *sg;
> +       u32 i;
> +
> +       for (i = 0; i < num; ++i) {
> +               sg = &sq->sg[i];
> +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> +                                                                sg->length,
> +                                                                DMA_TO_DEVICE, 0);
> +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> +                       goto err;
> +       }
> +

This seems to be nothing virtio-net specific; let's move it to the core?

Thanks


> +       return 0;
> +
> +err:
> +       virtnet_sq_unmap_sg(sq, i);
> +       return -ENOMEM;
> +}
> +
> +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> +{
> +       int ret;
> +
> +       if (virtqueue_get_dma_premapped(sq->vq)) {
> +               ret = virtnet_sq_map_sg(sq, num);
> +               if (ret)
> +                       return -ENOMEM;
> +       }
> +
> +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> +               virtnet_sq_unmap_sg(sq, num);
> +
> +       return ret;
>  }
>
>  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>                             skb_frag_size(frag), skb_frag_off(frag));
>         }
>
> -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
>         if (unlikely(err))
>                 return -ENOSPC; /* Caller handle free/refcnt */
>
> @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
>                         return num_sg;
>                 num_sg++;
>         }
> -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> +       return virtnet_add_outbuf(sq, num_sg, skb);
>  }
>
>  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>
>  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
>  {
> +       struct virtnet_info *vi = vq->vdev->priv;
> +       struct virtio_dma_head *dma;
> +       struct virtnet_sq *sq;
> +       int i = vq2txq(vq);
>         void *buf;
>
> -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> +       sq = &vi->sq[i];
> +
> +       if (virtqueue_get_dma_premapped(sq->vq)) {
> +               dma = &sq->dma.head;
> +               dma->num = ARRAY_SIZE(sq->dma.items);
> +               dma->next = 0;
> +       } else {
> +               dma = NULL;
> +       }
> +
> +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> +               virtnet_sq_unmap_buf(sq, dma);
> +
>                 if (!is_xdp_frame(buf))
>                         dev_kfree_skb(buf);
>                 else
> @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
>         if (ret)
>                 goto err_free;
>
> -       virtnet_rq_set_premapped(vi);
> +       virtnet_set_premapped(vi);
>
>         cpus_read_lock();
>         virtnet_set_affinity(vi);
> diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> index 066a2b9d2b3c..dda144cc91c7 100644
> --- a/drivers/net/virtio/virtio_net.h
> +++ b/drivers/net/virtio/virtio_net.h
> @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
>         u16 need_sync;
>  };
>
> +struct virtnet_sq_dma {
> +       struct virtio_dma_head head;
> +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> +};
> +
>  /* Internal representation of a send virtqueue */
>  struct virtnet_sq {
>         /* Virtqueue associated with this virtnet_sq */
>         struct virtqueue *vq;
>
>         /* TX: fragments + linear part + virtio header */
> -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> +       union {
> +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> +               struct virtnet_sq_dma dma;
> +       };
>
>         /* Name of the send queue: output.$index */
>         char name[16];
> --
> 2.32.0.3.g01195cf9f
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
                   ` (4 preceding siblings ...)
  2024-01-16  7:59 ` [PATCH net-next 5/5] virtio_net: sq support premapped mode Xuan Zhuo
@ 2024-01-25  3:39 ` Jason Wang
  2024-01-25  5:42   ` Xuan Zhuo
  2024-02-22 19:45 ` Michael S. Tsirkin
  6 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-25  3:39 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> [...]

Rethink of the whole design, I have one question:

The reason we need to store DMA information is to harden the virtqueue
to make sure the DMA unmap is safe. This seems redundant when the
buffers were premapped by the driver, for example:

Receive queue maintains DMA information, so it doesn't need desc_extra to work.

So can we simply

1) when premapping is enabled, store DMA information by driver itself
2) don't store DMA information in desc_extra

Would this be simpler?
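
For reference, the receive side already keeps this kind of bookkeeping
in the driver; roughly (abridged from drivers/net/virtio/virtio_net.h,
comments mine):

struct virtnet_rq_dma {
	dma_addr_t addr;	/* mapped address of the page fragment */
	u32 ref;		/* buffers still referencing this mapping */
	u16 len;		/* mapped length */
	u16 need_sync;		/* whether a CPU sync is needed on unmap */
};

The sq side would need an equivalent structure.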

Thanks


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-25  3:39 ` [PATCH net-next 0/5] virtio-net: " Jason Wang
@ 2024-01-25  5:42   ` Xuan Zhuo
  2024-01-25  5:49     ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-25  5:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > [...]
>
> Rethink of the whole design, I have one question:
>
> The reason we need to store DMA information is to harden the virtqueue
> to make sure the DMA unmap is safe. This seems redundant when the
> buffers were premapped by the driver, for example:
>
> Receive queue maintains DMA information, so it doesn't need desc_extra to work.
>
> So can we simply
>
> 1) when premapping is enabled, store DMA information by driver itself

YES. This is simpler, and more convenient.
But the driver must allocate memory to store the DMA info.

> 2) don't store DMA information in desc_extra

YES. But then the desc_extra memory is wasted; the "next" item is still used.
Do you think we should free the desc_extra when the vq is in premapped mode?

Thanks.


>
> Would this be simpler?
>
> Thanks
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-25  5:42   ` Xuan Zhuo
@ 2024-01-25  5:49     ` Xuan Zhuo
  2024-01-25  6:14       ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-25  5:49 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf,
	Jason Wang

On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > [...]
> >
> > Rethink of the whole design, I have one question:
> >
> > The reason we need to store DMA information is to harden the virtqueue
> > to make sure the DMA unmap is safe. This seems redundant when the
> > buffers were premapped by the driver, for example:
> >
> > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> >
> > So can we simply
> >
> > 1) when premapping is enabled, store DMA information by driver itself
>
> YES. This is simpler, and more convenient.
> But the driver must allocate memory to store the DMA info.
>
> > 2) don't store DMA information in desc_extra
>
> YES. But then the desc_extra memory is wasted; the "next" item is still used.
> Do you think we should free the desc_extra when the vq is in premapped mode?


struct vring_desc_extra {
	dma_addr_t addr;		/* Descriptor DMA addr. */
	u32 len;			/* Descriptor length. */
	u16 flags;			/* Descriptor flags. */
	u16 next;			/* The next desc state in a list. */
};


The flags and the next are used whether premapped or not.

So I think we can add a new array to store the addr and len.
If the vq is premapped, that memory can be freed.

struct vring_desc_extra {
	u16 flags;			/* Descriptor flags. */
	u16 next;			/* The next desc state in a list. */
};

struct vring_desc_dma {
	dma_addr_t addr;		/* Descriptor DMA addr. */
	u32 len;			/* Descriptor length. */
};
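
A minimal sketch of the allocation side under this split (illustrative
only; vring_alloc_desc_dma() is a hypothetical helper, not existing
code):

static struct vring_desc_dma *vring_alloc_desc_dma(u32 num)
{
	/* One addr/len slot per descriptor; this can be freed again if
	 * the driver later switches the vq to premapped mode. */
	return kcalloc(num, sizeof(struct vring_desc_dma), GFP_KERNEL);
}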

Thanks.

>
> Thanks.
>
>
> >
> > Would this be simpler?
> >
> > Thanks
> >

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-25  3:39   ` Jason Wang
@ 2024-01-25  5:57     ` Xuan Zhuo
  2024-01-29  3:07       ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-25  5:57 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, 25 Jan 2024 11:39:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > Introduce helper virtqueue_get_dma_premapped(), then the driver
> > can know whether dma unmap is needed.
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio/main.c       | 22 +++++++++-------------
> >  drivers/net/virtio/virtio_net.h |  3 ---
> >  drivers/virtio/virtio_ring.c    | 22 ++++++++++++++++++++++
> >  include/linux/virtio.h          |  1 +
> >  4 files changed, 32 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > index 186b2cf5d8fc..4fbf612da235 100644
> > --- a/drivers/net/virtio/main.c
> > +++ b/drivers/net/virtio/main.c
> > @@ -483,7 +483,7 @@ static void *virtnet_rq_get_buf(struct virtnet_rq *rq, u32 *len, void **ctx)
> >         void *buf;
> >
> >         buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > -       if (buf && rq->do_dma)
> > +       if (buf && virtqueue_get_dma_premapped(rq->vq))
> >                 virtnet_rq_unmap(rq, buf, *len);
> >
> >         return buf;
> > @@ -496,7 +496,7 @@ static void virtnet_rq_init_one_sg(struct virtnet_rq *rq, void *buf, u32 len)
> >         u32 offset;
> >         void *head;
> >
> > -       if (!rq->do_dma) {
> > +       if (!virtqueue_get_dma_premapped(rq->vq)) {
> >                 sg_init_one(rq->sg, buf, len);
> >                 return;
> >         }
> > @@ -526,7 +526,7 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> >
> >         head = page_address(alloc_frag->page);
> >
> > -       if (rq->do_dma) {
> > +       if (virtqueue_get_dma_premapped(rq->vq)) {
> >                 dma = head;
> >
> >                 /* new pages */
> > @@ -580,12 +580,8 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> >         if (!vi->mergeable_rx_bufs && vi->big_packets)
> >                 return;
> >
> > -       for (i = 0; i < vi->max_queue_pairs; i++) {
> > -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > -                       continue;
> > -
> > -               vi->rq[i].do_dma = true;
> > -       }
> > +       for (i = 0; i < vi->max_queue_pairs; i++)
> > +               virtqueue_set_dma_premapped(vi->rq[i].vq);
> >  }
> >
> >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > @@ -1643,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct virtnet_rq *rq,
> >
> >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> >         if (err < 0) {
> > -               if (rq->do_dma)
> > +               if (virtqueue_get_dma_premapped(rq->vq))
> >                         virtnet_rq_unmap(rq, buf, 0);
> >                 put_page(virt_to_head_page(buf));
> >         }
> > @@ -1758,7 +1754,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> >         ctx = mergeable_len_to_ctx(len + room, headroom);
> >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> >         if (err < 0) {
> > -               if (rq->do_dma)
> > +               if (virtqueue_get_dma_premapped(rq->vq))
> >                         virtnet_rq_unmap(rq, buf, 0);
> >                 put_page(virt_to_head_page(buf));
> >         }
> > @@ -4007,7 +4003,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> >         int i;
> >         for (i = 0; i < vi->max_queue_pairs; i++)
> >                 if (vi->rq[i].alloc_frag.page) {
> > -                       if (vi->rq[i].do_dma && vi->rq[i].last_dma)
> > +                       if (virtqueue_get_dma_premapped(vi->rq[i].vq) && vi->rq[i].last_dma)
> >                                 virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
> >                         put_page(vi->rq[i].alloc_frag.page);
> >                 }
> > @@ -4035,7 +4031,7 @@ static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
> >         rq = &vi->rq[i];
> >
> >         while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > -               if (rq->do_dma)
> > +               if (virtqueue_get_dma_premapped(rq->vq))
> >                         virtnet_rq_unmap(rq, buf, 0);
> >
> >                 virtnet_rq_free_buf(vi, rq, buf);
> > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > index b28a4d0a3150..066a2b9d2b3c 100644
> > --- a/drivers/net/virtio/virtio_net.h
> > +++ b/drivers/net/virtio/virtio_net.h
> > @@ -115,9 +115,6 @@ struct virtnet_rq {
> >
> >         /* Record the last dma info to free after new pages is allocated. */
> >         struct virtnet_rq_dma *last_dma;
> > -
> > -       /* Do dma by self */
> > -       bool do_dma;
> >  };
> >
> >  struct virtnet_info {
> > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > index 2c5089d3b510..9092bcdebb53 100644
> > --- a/drivers/virtio/virtio_ring.c
> > +++ b/drivers/virtio/virtio_ring.c
> > @@ -2905,6 +2905,28 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> >  }
> >  EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
> >
> > +/**
> > + * virtqueue_get_dma_premapped - get the vring premapped mode
> > + * @_vq: the struct virtqueue we're talking about.
> > + *
> > + * Get the premapped mode of the vq.
> > + *
> > + * Returns true if the vq is in premapped mode.
> > + */
> > +bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
> > +{
> > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > +       bool premapped;
> > +
> > +       START_USE(vq);
> > +       premapped = vq->premapped;
> > +       END_USE(vq);
>
> Why do we need to protect premapped like this? Is the user allowed to
> change it on the fly?


This just takes the usual protection before accessing the vq.
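
For reference, START_USE()/END_USE() are the debug-only reentrancy
checks in drivers/virtio/virtio_ring.c; with DEBUG unset they compile
to nothing, roughly:

#ifdef DEBUG
/* Caller is supposed to guarantee no reentry. */
#define START_USE(_vq)						\
	do {							\
		if ((_vq)->in_use)				\
			panic("%s:in_use = %i\n",		\
			      (_vq)->vq.name, (_vq)->in_use);	\
		(_vq)->in_use = __LINE__;			\
	} while (0)
#define END_USE(_vq) \
	do { BUG_ON(!(_vq)->in_use); (_vq)->in_use = 0; } while (0)
#else
#define START_USE(vq)
#define END_USE(vq)
#endif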

Thanks.
>
> Thanks
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-25  3:39   ` Jason Wang
@ 2024-01-25  5:58     ` Xuan Zhuo
  2024-01-29  3:06       ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-25  5:58 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > If xsk is enabled, the xsk tx path will share the send queue.
>
> Any reason for this? Technically, virtio-net can work like other NICs
> with, say, 256 queues. There could be some work needed, such as
> optimizing the interrupt allocations, etc.

Just like the logic of XDP_TX.

Right now the virtio spec does not allow adding queues dynamically.
As far as I know, most hypervisors support only a few queues.
The number of queues is no bigger than the number of CPUs. So the
best way is to share the send queues.

Parav and I tried to introduce dynamic queues. But that proposal was dropped.
Until then, I think we can share the send queues.


>
> > But the xsk requires that the send queue use the premapped mode.
> > So the send queue must support premapped mode.
> >
> > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> >
> >                       |        iommu off           |        iommu on
> > ----------------------|-----------------------------------------------------
> >                       | 16         |  1400         | 16         | 1400
> > ----------------------|-----------------------------------------------------
> > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> >
> > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > ---
> >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> >  drivers/net/virtio/virtio_net.h |  10 ++-
> >  2 files changed, 116 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > index 4fbf612da235..53143f95a3a0 100644
> > --- a/drivers/net/virtio/main.c
> > +++ b/drivers/net/virtio/main.c
> > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> >  }
> >
> > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > +{
> > +       int i;
> > +
> > +       if (!dma)
> > +               return;
> > +
> > +       for (i = 0; i < dma->next; ++i)
> > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > +                                                dma->items[i].addr,
> > +                                                dma->items[i].length,
> > +                                                DMA_TO_DEVICE, 0);
> > +       dma->next = 0;
> > +}
> > +
> >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> >                             u64 *bytes, u64 *packets)
> >  {
> > +       struct virtio_dma_head *dma;
> >         unsigned int len;
> >         void *ptr;
> >
> > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > +       if (virtqueue_get_dma_premapped(sq->vq)) {
>
> Any chance this can be false?

__free_old_xmit() is the common path.

virtqueue_get_dma_premapped() is used to check whether the sq is in
premapped mode.

>
> > +               dma = &sq->dma.head;
> > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > +               dma->next = 0;
>
> Btw, I found in the case of RX we have:
>
> virtnet_rq_alloc():
>
>                         alloc_frag->offset = sizeof(*dma);
>
> This seems to defeat frag coalescing when the memory is highly
> fragmented or high order allocation is disallowed.
>
> Any idea to solve this?


I answered this on the rq premapped patch set:

http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com

>
> > +       } else {
> > +               dma = NULL;
> > +       }
> > +
> > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > +               virtnet_sq_unmap_buf(sq, dma);
> > +
> >                 if (!is_xdp_frame(ptr)) {
> >                         struct sk_buff *skb = ptr;
> >
> > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> >         return buf;
> >  }
> >
> > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > +static void virtnet_set_premapped(struct virtnet_info *vi)
> >  {
> >         int i;
> >
> > -       /* disable for big mode */
> > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > -               return;
> > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> >
> > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > +               /* TODO for big mode */
>
> Btw, how hard is it to support big mode? If we can do premapping for
> that, the code could be simplified.
>
> (There are vendors that don't support mergeable rx buffers.)

I will do that after these patch sets.

>
> > +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> > +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> > +       }
> > +}
> > +
> > +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> > +{
> > +       struct scatterlist *sg;
> > +       u32 i;
> > +
> > +       for (i = 0; i < num; ++i) {
> > +               sg = &sq->sg[i];
> > +
> > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > +                                                sg->dma_address,
> > +                                                sg->length,
> > +                                                DMA_TO_DEVICE, 0);
> > +       }
> > +}
> > +
> > +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> > +{
> > +       struct scatterlist *sg;
> > +       u32 i;
> > +
> > +       for (i = 0; i < num; ++i) {
> > +               sg = &sq->sg[i];
> > +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> > +                                                                sg->length,
> > +                                                                DMA_TO_DEVICE, 0);
> > +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> > +                       goto err;
> > +       }
> > +
>
> This seems to be nothing virtio-net specific; let's move it to the core?


This is the DMA API style.

And the caller cannot judge success from the return value of
virtqueue_dma_map_single_attrs() alone.
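
A short sketch of the required pattern, using the helpers this series
already relies on (dma_addr_t has no universal "invalid" value, so a
dedicated error check is mandatory):

	addr = virtqueue_dma_map_single_attrs(vq, buf, len,
					      DMA_TO_DEVICE, 0);
	if (virtqueue_dma_mapping_error(vq, addr))
		return -ENOMEM;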

Thanks


>
> Thanks
>
>
> > [...]
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-25  5:49     ` Xuan Zhuo
@ 2024-01-25  6:14       ` Jason Wang
  2024-01-25  6:25         ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-25  6:14 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > [...]
> > >
> > > Rethink of the whole design, I have one question:
> > >
> > > The reason we need to store DMA information is to harden the virtqueue
> > > to make sure the DMA unmap is safe. This seems redundant when the
> > > buffers were premapped by the driver, for example:
> > >
> > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > >
> > > So can we simply
> > >
> > > 1) when premapping is enabled, store DMA information by driver itself
> >
> > YES. This is simpler, and more convenient.
> > But the driver must allocate memory to store the DMA info.

Right, and this looks like common practice for most NIC drivers.

> >
> > > 2) don't store DMA information in desc_extra
> >
> > YES. But then the desc_extra memory is wasted; the "next" item is still used.
> > Do you think we should free the desc_extra when the vq is in premapped mode?
>
>
> struct vring_desc_extra {
>         dma_addr_t addr;                /* Descriptor DMA addr. */
>         u32 len;                        /* Descriptor length. */
>         u16 flags;                      /* Descriptor flags. */
>         u16 next;                       /* The next desc state in a list. */
> };
>
>
> The flags and the next are used whether premapped or not.
>
> So I think we can add a new array to store the addr and len.

Yes.

> If the vq is premapped, that memory can be freed.

Then we need to make sure premapped mode is set before find_vqs(), etc.

>
> struct vring_desc_extra {
>         u16 flags;                      /* Descriptor flags. */
>         u16 next;                       /* The next desc state in a list. */
> };
>
> struct vring_desc_dma {
>         dma_addr_t addr;                /* Descriptor DMA addr. */
>         u32 len;                        /* Descriptor length. */
> };
>
> Thanks.

Thanks

>
> >
> > Thanks.
> >
> >
> > >
> > > Would this be simpler?
> > >
> > > Thanks
> > >
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-25  6:14       ` Jason Wang
@ 2024-01-25  6:25         ` Xuan Zhuo
  2024-01-29  3:14           ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-25  6:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > [...]
> > > >
> > > > Rethink of the whole design, I have one question:
> > > >
> > > > The reason we need to store DMA information is to harden the virtqueue
> > > > to make sure the DMA unmap is safe. This seems redundant when the
> > > > buffers were premapped by the driver, for example:
> > > >
> > > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > > >
> > > > So can we simply
> > > >
> > > > 1) when premapping is enabled, store DMA information by driver itself
> > >
> > > YES. This is simpler, and more convenient.
> > > But the driver must allocate memory to store the DMA info.
>
> Right, and this looks like common practice for most NIC drivers.
>
> > >
> > > > 2) don't store DMA information in desc_extra
> > >
> > > YES. But then the desc_extra memory is wasted; the "next" item is still used.
> > > Do you think we should free the desc_extra when the vq is in premapped mode?
> >
> >
> > struct vring_desc_extra {
> >         dma_addr_t addr;                /* Descriptor DMA addr. */
> >         u32 len;                        /* Descriptor length. */
> >         u16 flags;                      /* Descriptor flags. */
> >         u16 next;                       /* The next desc state in a list. */
> > };
> >
> >
> > The flags and the next are used whether premapped or not.
> >
> > So I think we can add a new array to store the addr and len.
>
> Yes.
>
> > If the vq is premapped, that memory can be freed.
>
> Then we need to make sure premapped mode is set before find_vqs(), etc.


Yes. We can start from the parameters of find_vqs().

But actually we can free the dma array when the driver sets premapped mode.
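
A rough sketch of that idea, assuming the separate desc_dma array
proposed earlier in this thread (the current virtqueue_set_dma_premapped()
does not do this yet):

int virtqueue_set_dma_premapped(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);

	/* ... existing checks: ring must be empty, DMA API in use ... */

	vq->premapped = true;

	/* Hypothetical: the driver now tracks its own mappings, so the
	 * per-descriptor addr/len bookkeeping can be released. */
	kfree(vq->split.desc_dma);
	vq->split.desc_dma = NULL;

	return 0;
}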

>
> >
> > struct vring_desc_extra {
> >         u16 flags;                      /* Descriptor flags. */
> >         u16 next;                       /* The next desc state in a list. */
> > };
> >
> > struct vring_desc_dma {
> >         dma_addr_t addr;                /* Descriptor DMA addr. */
> >         u32 len;                        /* Descriptor length. */
> > };
> >
> > Thanks.

As we discussed, you may wait for my next patch set with the new design.

Could you review the first patch set of this series?
http://lore.kernel.org/all/20240116062842.67874-1-xuanzhuo@linux.alibaba.com

I am working on top of it.

PS.

There is another patch set, "device stats". I hope that is on your list.

http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com

Thanks.


>
> Thanks
>
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Would this be simpler?
> > > >
> > > > Thanks
> > > >
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-25  5:58     ` Xuan Zhuo
@ 2024-01-29  3:06       ` Jason Wang
  2024-01-29  3:11         ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-29  3:06 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > If xsk is enabled, the xsk tx path will share the send queue.
> >
> > Any reason for this? Technically, virtio-net can work like other NICs
> > with, say, 256 queues. There could be some work needed, such as
> > optimizing the interrupt allocations, etc.
>
> Just like the logic of XDP_TX.
>
> Right now the virtio spec does not allow adding queues dynamically.
> As far as I know, most hypervisors support only a few queues.

When multiqueue was developed in QEMU, it supported at least 256 queue
pairs, if my memory is correct.

> The num of
> queues is not bigger than the cpu num. So the best way is
> to share the send queues.
>
> Parav and I tried to introduce dynamic queues.

Virtio-net doesn't differ from real NICs, most of which can create
queues dynamically. It's more about resource allocation; if mgmt
can start with 256 queues, then we are probably fine.

But I think we can leave this question for now.

> But that proposal was dropped.
> Until then, I think we can share the send queues.
>
>
> >
> > > But the xsk requires that the send queue use the premapped mode.
> > > So the send queue must support premapped mode.
> > >
> > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > >
> > >                       |        iommu off           |        iommu on
> > > ----------------------|-----------------------------------------------------
> > >                       | 16         |  1400         | 16         | 1400
> > > ----------------------|-----------------------------------------------------
> > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > index 4fbf612da235..53143f95a3a0 100644
> > > --- a/drivers/net/virtio/main.c
> > > +++ b/drivers/net/virtio/main.c
> > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > >  }
> > >
> > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > +{
> > > +       int i;
> > > +
> > > +       if (!dma)
> > > +               return;
> > > +
> > > +       for (i = 0; i < dma->next; ++i)
> > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > +                                                dma->items[i].addr,
> > > +                                                dma->items[i].length,
> > > +                                                DMA_TO_DEVICE, 0);
> > > +       dma->next = 0;
> > > +}
> > > +
> > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > >                             u64 *bytes, u64 *packets)
> > >  {
> > > +       struct virtio_dma_head *dma;
> > >         unsigned int len;
> > >         void *ptr;
> > >
> > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> >
> > Any chance this can be false?
>
> __free_old_xmit is the common path.

Did you mean the XDP path doesn't work with this? If yes, we need to
change that.

>
> The virtqueue_get_dma_premapped() is used to check whether the sq is in
> premapped mode.
>
> >
> > > +               dma = &sq->dma.head;
> > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > +               dma->next = 0;
> >
> > Btw, I found in the case of RX we have:
> >
> > virtnet_rq_alloc():
> >
> >                         alloc_frag->offset = sizeof(*dma);
> >
> > This seems to defeat frag coalescing when the memory is highly
> > fragmented or high order allocation is disallowed.
> >
> > Any idea to solve this?
>
>
> On the rq premapped patch set, I answered this.
>
> http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com

Oops, I forgot that.

>
> >
> > > +       } else {
> > > +               dma = NULL;
> > > +       }
> > > +
> > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > +               virtnet_sq_unmap_buf(sq, dma);
> > > +
> > >                 if (!is_xdp_frame(ptr)) {
> > >                         struct sk_buff *skb = ptr;
> > >
> > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > >         return buf;
> > >  }
> > >
> > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > >  {
> > >         int i;
> > >
> > > -       /* disable for big mode */
> > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > -               return;
> > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > >
> > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > +               /* TODO for big mode */
> >
> > Btw, how hard is it to support big mode? If we can do premapping for
> > that, the code could be simplified.
> >
> > (There are vendors that don't support mergeable rx buffers.)
>
> I will do that after these patch sets.

If it's not too hard, I'd suggest doing it now.

>
> >
> > > +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> > > +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > +       }
> > > +}
> > > +
> > > +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> > > +{
> > > +       struct scatterlist *sg;
> > > +       u32 i;
> > > +
> > > +       for (i = 0; i < num; ++i) {
> > > +               sg = &sq->sg[i];
> > > +
> > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > +                                                sg->dma_address,
> > > +                                                sg->length,
> > > +                                                DMA_TO_DEVICE, 0);
> > > +       }
> > > +}
> > > +
> > > +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> > > +{
> > > +       struct scatterlist *sg;
> > > +       u32 i;
> > > +
> > > +       for (i = 0; i < num; ++i) {
> > > +               sg = &sq->sg[i];
> > > +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> > > +                                                                sg->length,
> > > +                                                                DMA_TO_DEVICE, 0);
> > > +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> > > +                       goto err;
> > > +       }
> > > +
> >
> > There seems to be nothing virtio-net specific here; let's move it to the core?
>
>
> This follows the DMA API style.
>
> And the caller cannot judge success by the return value of
> virtqueue_dma_map_single_attrs() alone.

I meant that if e.g. virtio-fs wants to use premapped mode, the code
will surely be duplicated there as well.

Thanks


>
> Thanks
>
>
> >
> > Thanks
> >
> >
> > > +       return 0;
> > > +
> > > +err:
> > > +       virtnet_sq_unmap_sg(sq, i);
> > > +       return -ENOMEM;
> > > +}
> > > +
> > > +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> > > +{
> > > +       int ret;
> > > +
> > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > +               ret = virtnet_sq_map_sg(sq, num);
> > > +               if (ret)
> > > +                       return -ENOMEM;
> > > +       }
> > > +
> > > +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> > > +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> > > +               virtnet_sq_unmap_sg(sq, num);
> > > +
> > > +       return ret;
> > >  }
> > >
> > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> > >                             skb_frag_size(frag), skb_frag_off(frag));
> > >         }
> > >
> > > -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > > -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> > > +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> > >         if (unlikely(err))
> > >                 return -ENOSPC; /* Caller handle free/refcnt */
> > >
> > > @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
> > >                         return num_sg;
> > >                 num_sg++;
> > >         }
> > > -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > > +       return virtnet_add_outbuf(sq, num_sg, skb);
> > >  }
> > >
> > >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > >
> > >  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
> > >  {
> > > +       struct virtnet_info *vi = vq->vdev->priv;
> > > +       struct virtio_dma_head *dma;
> > > +       struct virtnet_sq *sq;
> > > +       int i = vq2txq(vq);
> > >         void *buf;
> > >
> > > -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > +       sq = &vi->sq[i];
> > > +
> > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > +               dma = &sq->dma.head;
> > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > +               dma->next = 0;
> > > +       } else {
> > > +               dma = NULL;
> > > +       }
> > > +
> > > +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> > > +               virtnet_sq_unmap_buf(sq, dma);
> > > +
> > >                 if (!is_xdp_frame(buf))
> > >                         dev_kfree_skb(buf);
> > >                 else
> > > @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
> > >         if (ret)
> > >                 goto err_free;
> > >
> > > -       virtnet_rq_set_premapped(vi);
> > > +       virtnet_set_premapped(vi);
> > >
> > >         cpus_read_lock();
> > >         virtnet_set_affinity(vi);
> > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > index 066a2b9d2b3c..dda144cc91c7 100644
> > > --- a/drivers/net/virtio/virtio_net.h
> > > +++ b/drivers/net/virtio/virtio_net.h
> > > @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
> > >         u16 need_sync;
> > >  };
> > >
> > > +struct virtnet_sq_dma {
> > > +       struct virtio_dma_head head;
> > > +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> > > +};
> > > +
> > >  /* Internal representation of a send virtqueue */
> > >  struct virtnet_sq {
> > >         /* Virtqueue associated with this virtnet_sq */
> > >         struct virtqueue *vq;
> > >
> > >         /* TX: fragments + linear part + virtio header */
> > > -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > +       union {
> > > +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > +               struct virtnet_sq_dma dma;
> > > +       };
> > >
> > >         /* Name of the send queue: output.$index */
> > >         char name[16];
> > > --
> > > 2.32.0.3.g01195cf9f
> > >
> >
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-25  5:57     ` Xuan Zhuo
@ 2024-01-29  3:07       ` Jason Wang
  2024-01-29  3:30         ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-29  3:07 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, Jan 25, 2024 at 1:58 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 25 Jan 2024 11:39:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > Introduce helper virtqueue_get_dma_premapped(), then the driver
> > > can know whether dma unmap is needed.
> > >
> > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > ---
> > >  drivers/net/virtio/main.c       | 22 +++++++++-------------
> > >  drivers/net/virtio/virtio_net.h |  3 ---
> > >  drivers/virtio/virtio_ring.c    | 22 ++++++++++++++++++++++
> > >  include/linux/virtio.h          |  1 +
> > >  4 files changed, 32 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > index 186b2cf5d8fc..4fbf612da235 100644
> > > --- a/drivers/net/virtio/main.c
> > > +++ b/drivers/net/virtio/main.c
> > > @@ -483,7 +483,7 @@ static void *virtnet_rq_get_buf(struct virtnet_rq *rq, u32 *len, void **ctx)
> > >         void *buf;
> > >
> > >         buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > -       if (buf && rq->do_dma)
> > > +       if (buf && virtqueue_get_dma_premapped(rq->vq))
> > >                 virtnet_rq_unmap(rq, buf, *len);
> > >
> > >         return buf;
> > > @@ -496,7 +496,7 @@ static void virtnet_rq_init_one_sg(struct virtnet_rq *rq, void *buf, u32 len)
> > >         u32 offset;
> > >         void *head;
> > >
> > > -       if (!rq->do_dma) {
> > > +       if (!virtqueue_get_dma_premapped(rq->vq)) {
> > >                 sg_init_one(rq->sg, buf, len);
> > >                 return;
> > >         }
> > > @@ -526,7 +526,7 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > >
> > >         head = page_address(alloc_frag->page);
> > >
> > > -       if (rq->do_dma) {
> > > +       if (virtqueue_get_dma_premapped(rq->vq)) {
> > >                 dma = head;
> > >
> > >                 /* new pages */
> > > @@ -580,12 +580,8 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > >         if (!vi->mergeable_rx_bufs && vi->big_packets)
> > >                 return;
> > >
> > > -       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > > -                       continue;
> > > -
> > > -               vi->rq[i].do_dma = true;
> > > -       }
> > > +       for (i = 0; i < vi->max_queue_pairs; i++)
> > > +               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > >  }
> > >
> > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > @@ -1643,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct virtnet_rq *rq,
> > >
> > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > >         if (err < 0) {
> > > -               if (rq->do_dma)
> > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > >                         virtnet_rq_unmap(rq, buf, 0);
> > >                 put_page(virt_to_head_page(buf));
> > >         }
> > > @@ -1758,7 +1754,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > >         if (err < 0) {
> > > -               if (rq->do_dma)
> > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > >                         virtnet_rq_unmap(rq, buf, 0);
> > >                 put_page(virt_to_head_page(buf));
> > >         }
> > > @@ -4007,7 +4003,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > >         int i;
> > >         for (i = 0; i < vi->max_queue_pairs; i++)
> > >                 if (vi->rq[i].alloc_frag.page) {
> > > -                       if (vi->rq[i].do_dma && vi->rq[i].last_dma)
> > > +                       if (virtqueue_get_dma_premapped(vi->rq[i].vq) && vi->rq[i].last_dma)
> > >                                 virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
> > >                         put_page(vi->rq[i].alloc_frag.page);
> > >                 }
> > > @@ -4035,7 +4031,7 @@ static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
> > >         rq = &vi->rq[i];
> > >
> > >         while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > -               if (rq->do_dma)
> > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > >                         virtnet_rq_unmap(rq, buf, 0);
> > >
> > >                 virtnet_rq_free_buf(vi, rq, buf);
> > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > index b28a4d0a3150..066a2b9d2b3c 100644
> > > --- a/drivers/net/virtio/virtio_net.h
> > > +++ b/drivers/net/virtio/virtio_net.h
> > > @@ -115,9 +115,6 @@ struct virtnet_rq {
> > >
> > >         /* Record the last dma info to free after new pages is allocated. */
> > >         struct virtnet_rq_dma *last_dma;
> > > -
> > > -       /* Do dma by self */
> > > -       bool do_dma;
> > >  };
> > >
> > >  struct virtnet_info {
> > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > index 2c5089d3b510..9092bcdebb53 100644
> > > --- a/drivers/virtio/virtio_ring.c
> > > +++ b/drivers/virtio/virtio_ring.c
> > > @@ -2905,6 +2905,28 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > >  }
> > >  EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
> > >
> > > +/**
> > > + * virtqueue_get_dma_premapped - get the vring premapped mode
> > > + * @_vq: the struct virtqueue we're talking about.
> > > + *
> > > + * Get the premapped mode of the vq.
> > > + *
> > > + * Returns bool for the vq premapped mode.
> > > + */
> > > +bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
> > > +{
> > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > +       bool premapped;
> > > +
> > > +       START_USE(vq);
> > > +       premapped = vq->premapped;
> > > +       END_USE(vq);
> >
> > Why do we need to protect premapped like this? Is the user allowed to
> > change it on the fly?
>
>
> Just to protect the vq before accessing it.

I meant, how does that differ from other booleans, e.g. use_dma_api, do_unmap, etc.?

Thanks

>
> Thanks.
> >
> > Thanks
> >
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-29  3:06       ` Jason Wang
@ 2024-01-29  3:11         ` Xuan Zhuo
  2024-01-30  2:56           ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-29  3:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > If xsk is enabled, the xsk tx will share the send queue.
> > >
> > > Any reason for this? Technically, virtio-net can work like other
> > > NICs with e.g. 256 queues. There could be some work like optimizing
> > > the interrupt allocations etc.
> >
> > Just like the logic of XDP_TX.
> >
> > Now the virtio spec does not allow adding new queues dynamically.
> > As far as I know, most hypervisors support only a few queues.
>
> When multiqueue was developed in Qemu, it supported at least 256 queue
> pairs, if my memory is correct.
>


YES, but that is configured by the hypervisor.

For the user on any platform, once they get a VM, the queue num is fixed.
As far as I know, in most cases that num is small.
If we want af-xdp/xdp-tx to have independent queues,
I think dynamic queues are a good way.


> > The number of
> > queues is not bigger than the number of CPUs. So the best way is
> > to share the send queues.
> >
> > Parav and I tried to introduce dynamic queues.
>
> Virtio-net doesn't differ from real NICs, most of which can create
> queues dynamically. It's more about resource allocation: if mgmt
> can start with 256 queues, then we are probably fine.

But now, if the device has 256, we will enable all 256 queues by default.
That is too much.

So, the point of dynamic queues is not to create a new queue beyond the
provisioned resources.

The device may tell the driver that the max queue resource is 256,
but let us start from 8. If the driver needs more, then we can
enable more.

But in my view, the xdp tx can share the sq, so let us start
the af-xdp work by sharing the sq.


>
> But I think we can leave this question for now.
>
> > But that was dropped.
> > Until then, I think we can share the send queues.
> >
> >
> > >
> > > > But the xsk requires that the send queue use the premapped mode.
> > > > So the send queue must support premapped mode.
> > > >
> > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > >
> > > >                       |        iommu off           |        iommu on
> > > > ----------------------|-----------------------------------------------------
> > > >                       | 16         |  1400         | 16         | 1400
> > > > ----------------------|-----------------------------------------------------
> > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > index 4fbf612da235..53143f95a3a0 100644
> > > > --- a/drivers/net/virtio/main.c
> > > > +++ b/drivers/net/virtio/main.c
> > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > >  }
> > > >
> > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > +{
> > > > +       int i;
> > > > +
> > > > +       if (!dma)
> > > > +               return;
> > > > +
> > > > +       for (i = 0; i < dma->next; ++i)
> > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > +                                                dma->items[i].addr,
> > > > +                                                dma->items[i].length,
> > > > +                                                DMA_TO_DEVICE, 0);
> > > > +       dma->next = 0;
> > > > +}
> > > > +
> > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > >                             u64 *bytes, u64 *packets)
> > > >  {
> > > > +       struct virtio_dma_head *dma;
> > > >         unsigned int len;
> > > >         void *ptr;
> > > >
> > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > >
> > > Any chance this can be false?
> >
> > __free_old_xmit is the common path.
>
> Did you mean the XDP path doesn't work with this? If yes, we need to
> change that.


NO. If the virtio core use_dma_api is false, dma premapped
cannot be true.
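
For reference, that is exactly the guard in virtqueue_set_dma_premapped()
(the function is quoted in full later in this thread):

	if (!vq->use_dma_api) {
		END_USE(vq);
		return -EINVAL;
	}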

>
> >
> > The virtqueue_get_dma_premapped() is used to check whether the sq is in
> > premapped mode.
> >
> > >
> > > > +               dma = &sq->dma.head;
> > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > +               dma->next = 0;
> > >
> > > Btw, I found in the case of RX we have:
> > >
> > > virtnet_rq_alloc():
> > >
> > >                         alloc_frag->offset = sizeof(*dma);
> > >
> > > This seems to defeat frag coalescing when the memory is highly
> > > fragmented or high order allocation is disallowed.
> > >
> > > Any idea to solve this?
> >
> >
> > On the rq premapped patch set, I answered this.
> >
> > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
>
> Oops, I forgot that.
>
> >
> > >
> > > > +       } else {
> > > > +               dma = NULL;
> > > > +       }
> > > > +
> > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > +
> > > >                 if (!is_xdp_frame(ptr)) {
> > > >                         struct sk_buff *skb = ptr;
> > > >
> > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > >         return buf;
> > > >  }
> > > >
> > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > >  {
> > > >         int i;
> > > >
> > > > -       /* disable for big mode */
> > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > -               return;
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > >
> > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > +               /* TODO for big mode */
> > >
> > > Btw, how hard is it to support big mode? If we can do premapping for
> > > that, the code could be simplified.
> > >
> > > (There are vendors that don't support mergeable rx buffers.)
> >
> > I will do that after these patch sets.
>
> If it's not too hard, I'd suggest doing it now.


YES. It is not too hard, but I have been doing too much:

* virtio-net + device stats
* virtio-net + af-xdp (this patch set has about 27 commits)

And I have been pushing this for too long; I just want to finish the work.
Then I can work on the next items (premapped big mode, af-xdp multi-buf, ...).

So, let us go step by step.


>
> >
> > >
> > > > +               if (vi->mergeable_rx_bufs || !vi->big_packets)
> > > > +                       virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > +       }
> > > > +}
> > > > +
> > > > +static void virtnet_sq_unmap_sg(struct virtnet_sq *sq, u32 num)
> > > > +{
> > > > +       struct scatterlist *sg;
> > > > +       u32 i;
> > > > +
> > > > +       for (i = 0; i < num; ++i) {
> > > > +               sg = &sq->sg[i];
> > > > +
> > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > +                                                sg->dma_address,
> > > > +                                                sg->length,
> > > > +                                                DMA_TO_DEVICE, 0);
> > > > +       }
> > > > +}
> > > > +
> > > > +static int virtnet_sq_map_sg(struct virtnet_sq *sq, u32 num)
> > > > +{
> > > > +       struct scatterlist *sg;
> > > > +       u32 i;
> > > > +
> > > > +       for (i = 0; i < num; ++i) {
> > > > +               sg = &sq->sg[i];
> > > > +               sg->dma_address = virtqueue_dma_map_single_attrs(sq->vq, sg_virt(sg),
> > > > +                                                                sg->length,
> > > > +                                                                DMA_TO_DEVICE, 0);
> > > > +               if (virtqueue_dma_mapping_error(sq->vq, sg->dma_address))
> > > > +                       goto err;
> > > > +       }
> > > > +
> > >
> > > There seems to be nothing virtio-net specific here; let's move it to the core?
> >
> >
> > This follows the DMA API style.
> >
> > And the caller cannot judge success by the return value of
> > virtqueue_dma_map_single_attrs() alone.
>
> I meant that if e.g. virtio-fs wants to use premapped mode, the code
> will surely be duplicated there as well.

If you mean the function virtnet_sq_map_sg(), I think you are right.

I will move it into the virtio core.
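
A rough sketch of what such a core helper could look like (the name
virtqueue_dma_map_sgs() is hypothetical; it only reuses the
virtqueue_dma_* API already introduced by this series):

/* hypothetical core helper, lifted from virtnet_sq_map_sg() above */
static int virtqueue_dma_map_sgs(struct virtqueue *vq,
				 struct scatterlist *sg, u32 num)
{
	u32 i;

	for (i = 0; i < num; ++i) {
		sg[i].dma_address = virtqueue_dma_map_single_attrs(vq,
						sg_virt(&sg[i]),
						sg[i].length,
						DMA_TO_DEVICE, 0);
		if (virtqueue_dma_mapping_error(vq, sg[i].dma_address))
			goto err;
	}

	return 0;

err:
	/* unmap whatever was mapped before the failure */
	while (i-- > 0)
		virtqueue_dma_unmap_single_attrs(vq, sg[i].dma_address,
						 sg[i].length,
						 DMA_TO_DEVICE, 0);
	return -ENOMEM;
}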

Thanks.




>
> Thanks
>
>
> >
> > Thanks
> >
> >
> > >
> > > Thanks
> > >
> > >
> > > > +       return 0;
> > > > +
> > > > +err:
> > > > +       virtnet_sq_unmap_sg(sq, i);
> > > > +       return -ENOMEM;
> > > > +}
> > > > +
> > > > +static int virtnet_add_outbuf(struct virtnet_sq *sq, u32 num, void *data)
> > > > +{
> > > > +       int ret;
> > > > +
> > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > +               ret = virtnet_sq_map_sg(sq, num);
> > > > +               if (ret)
> > > > +                       return -ENOMEM;
> > > > +       }
> > > > +
> > > > +       ret = virtqueue_add_outbuf(sq->vq, sq->sg, num, data, GFP_ATOMIC);
> > > > +       if (ret && virtqueue_get_dma_premapped(sq->vq))
> > > > +               virtnet_sq_unmap_sg(sq, num);
> > > > +
> > > > +       return ret;
> > > >  }
> > > >
> > > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > > @@ -687,8 +767,7 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
> > > >                             skb_frag_size(frag), skb_frag_off(frag));
> > > >         }
> > > >
> > > > -       err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> > > > -                                  xdp_to_ptr(xdpf), GFP_ATOMIC);
> > > > +       err = virtnet_add_outbuf(sq, nr_frags + 1, xdp_to_ptr(xdpf));
> > > >         if (unlikely(err))
> > > >                 return -ENOSPC; /* Caller handle free/refcnt */
> > > >
> > > > @@ -2154,7 +2233,7 @@ static int xmit_skb(struct virtnet_sq *sq, struct sk_buff *skb)
> > > >                         return num_sg;
> > > >                 num_sg++;
> > > >         }
> > > > -       return virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, skb, GFP_ATOMIC);
> > > > +       return virtnet_add_outbuf(sq, num_sg, skb);
> > > >  }
> > > >
> > > >  static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
> > > > @@ -4011,9 +4090,25 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > > >
> > > >  static void virtnet_sq_free_unused_bufs(struct virtqueue *vq)
> > > >  {
> > > > +       struct virtnet_info *vi = vq->vdev->priv;
> > > > +       struct virtio_dma_head *dma;
> > > > +       struct virtnet_sq *sq;
> > > > +       int i = vq2txq(vq);
> > > >         void *buf;
> > > >
> > > > -       while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > > +       sq = &vi->sq[i];
> > > > +
> > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > +               dma = &sq->dma.head;
> > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > +               dma->next = 0;
> > > > +       } else {
> > > > +               dma = NULL;
> > > > +       }
> > > > +
> > > > +       while ((buf = virtqueue_detach_unused_buf_dma(vq, dma)) != NULL) {
> > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > +
> > > >                 if (!is_xdp_frame(buf))
> > > >                         dev_kfree_skb(buf);
> > > >                 else
> > > > @@ -4228,7 +4323,7 @@ static int init_vqs(struct virtnet_info *vi)
> > > >         if (ret)
> > > >                 goto err_free;
> > > >
> > > > -       virtnet_rq_set_premapped(vi);
> > > > +       virtnet_set_premapped(vi);
> > > >
> > > >         cpus_read_lock();
> > > >         virtnet_set_affinity(vi);
> > > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > > index 066a2b9d2b3c..dda144cc91c7 100644
> > > > --- a/drivers/net/virtio/virtio_net.h
> > > > +++ b/drivers/net/virtio/virtio_net.h
> > > > @@ -48,13 +48,21 @@ struct virtnet_rq_dma {
> > > >         u16 need_sync;
> > > >  };
> > > >
> > > > +struct virtnet_sq_dma {
> > > > +       struct virtio_dma_head head;
> > > > +       struct virtio_dma_item items[MAX_SKB_FRAGS + 2];
> > > > +};
> > > > +
> > > >  /* Internal representation of a send virtqueue */
> > > >  struct virtnet_sq {
> > > >         /* Virtqueue associated with this virtnet_sq */
> > > >         struct virtqueue *vq;
> > > >
> > > >         /* TX: fragments + linear part + virtio header */
> > > > -       struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > > +       union {
> > > > +               struct scatterlist sg[MAX_SKB_FRAGS + 2];
> > > > +               struct virtnet_sq_dma dma;
> > > > +       };
> > > >
> > > >         /* Name of the send queue: output.$index */
> > > >         char name[16];
> > > > --
> > > > 2.32.0.3.g01195cf9f
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-25  6:25         ` Xuan Zhuo
@ 2024-01-29  3:14           ` Jason Wang
  2024-01-29  3:37             ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-29  3:14 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, Jan 25, 2024 at 2:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > > [...]
> > > > > >
> > > > >
> > > > > Rethink of the whole design, I have one question:
> > > > >
> > > > > > The reason we need to store DMA information is to harden the virtqueue
> > > > > > to make sure the DMA unmap is safe. This seems redundant when the
> > > > > > buffers are premapped by the driver, for example:
> > > > >
> > > > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > > > >
> > > > > So can we simply
> > > > >
> > > > > > 1) when premapping is enabled, have the driver store DMA information itself
> > > >
> > > > YES, this is simpler. And this is more convenient.
> > > > But the driver must allocate memory to store the dma info.
> >
> > Right, and this looks like the common practice for most of the NIC drivers.
> >
> > > >
> > > > > 2) don't store DMA information in desc_extra
> > > >
> > > > YES. But then the desc_extra memory is wasted, though the "next" item is still used.
> > > > Do you think we should free the desc_extra when the vq is in premapped mode?
> > >
> > >
> > > struct vring_desc_extra {
> > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > >         u32 len;                        /* Descriptor length. */
> > >         u16 flags;                      /* Descriptor flags. */
> > >         u16 next;                       /* The next desc state in a list. */
> > > };
> > >
> > >
> > > The flags and the next are used whether premapped or not.
> > >
> > > So I think we can add a new array to store the addr and len.
> >
> > Yes.
> >
> > > If the vq is premapped, the memory can be freed.
> >
> > Then we need to make sure the premapped is set before find_vqs() etc.
>
>
> Yes. We can start from the parameters of find_vqs().
>
> But actually, we can free the dma array when the driver sets premapped mode.

Probably, but that flow is kind of odd:

init()
    alloc()

set_premapped()
    free()
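
The cleaner shape would be to know the mode before allocation, e.g. (a
sketch, assuming the vring_desc_dma split quoted below; desc_dma is a
hypothetical field, not existing code):

	/* at ring allocation time, once premapped is known up front */
	if (!premapped) {
		vq->desc_dma = kmalloc_array(num, sizeof(*vq->desc_dma),
					     GFP_KERNEL);
		if (!vq->desc_dma)
			return -ENOMEM;
	}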

>
> >
> > >
> > > struct vring_desc_extra {
> > >         u16 flags;                      /* Descriptor flags. */
> > >         u16 next;                       /* The next desc state in a list. */
> > > };
> > >
> > > struct vring_desc_dma {
> > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > >         u32 len;                        /* Descriptor length. */
> > > };
> > >
> > > Thanks.
>
> As we discussed, you may wait for my next patch set with the new design.
>
> Could you review the first patch set of this series?
> http://lore.kernel.org/all/20240116062842.67874-1-xuanzhuo@linux.alibaba.com
>
> I am working on top of it.

Actually, I'm a little confused about the dependencies.

We have three:

1) move virtio-net to a dedicated directory
2) premapped mode
3) AF_XDP

It looks to me like the current series is posted in that dependency order.

Then I have questions:

1) do we agree with moving to a directory (I don't have a preference)?
2) if 3) depends on 2), I'd suggest making sure 2) is finalized
before posting 3), because we have gone through several rounds
of AF_XDP and most concerns were about the API introduced in 2)

Does this make sense?

>
> PS.
>
> There is another patch set, "device stats". I hope that is on your list.
>
> http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com

Yes, it is (as long as it doesn't depend on moving the source).

Thanks

>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > >
> > > > Thanks.
> > > >
> > > >
> > > > >
> > > > > Would this be simpler?
> > > > >
> > > > > Thanks
> > > > >
> > >
> >
>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-29  3:07       ` Jason Wang
@ 2024-01-29  3:30         ` Xuan Zhuo
  2024-01-30  2:54           ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-29  3:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Mon, 29 Jan 2024 11:07:50 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jan 25, 2024 at 1:58 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 25 Jan 2024 11:39:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > Introduce helper virtqueue_get_dma_premapped(), then the driver
> > > > can know whether dma unmap is needed.
> > > >
> > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > ---
> > > >  drivers/net/virtio/main.c       | 22 +++++++++-------------
> > > >  drivers/net/virtio/virtio_net.h |  3 ---
> > > >  drivers/virtio/virtio_ring.c    | 22 ++++++++++++++++++++++
> > > >  include/linux/virtio.h          |  1 +
> > > >  4 files changed, 32 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > index 186b2cf5d8fc..4fbf612da235 100644
> > > > --- a/drivers/net/virtio/main.c
> > > > +++ b/drivers/net/virtio/main.c
> > > > @@ -483,7 +483,7 @@ static void *virtnet_rq_get_buf(struct virtnet_rq *rq, u32 *len, void **ctx)
> > > >         void *buf;
> > > >
> > > >         buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > -       if (buf && rq->do_dma)
> > > > +       if (buf && virtqueue_get_dma_premapped(rq->vq))
> > > >                 virtnet_rq_unmap(rq, buf, *len);
> > > >
> > > >         return buf;
> > > > @@ -496,7 +496,7 @@ static void virtnet_rq_init_one_sg(struct virtnet_rq *rq, void *buf, u32 len)
> > > >         u32 offset;
> > > >         void *head;
> > > >
> > > > -       if (!rq->do_dma) {
> > > > +       if (!virtqueue_get_dma_premapped(rq->vq)) {
> > > >                 sg_init_one(rq->sg, buf, len);
> > > >                 return;
> > > >         }
> > > > @@ -526,7 +526,7 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > >
> > > >         head = page_address(alloc_frag->page);
> > > >
> > > > -       if (rq->do_dma) {
> > > > +       if (virtqueue_get_dma_premapped(rq->vq)) {
> > > >                 dma = head;
> > > >
> > > >                 /* new pages */
> > > > @@ -580,12 +580,8 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > >         if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > >                 return;
> > > >
> > > > -       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > > > -                       continue;
> > > > -
> > > > -               vi->rq[i].do_dma = true;
> > > > -       }
> > > > +       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > +               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > >  }
> > > >
> > > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > > @@ -1643,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct virtnet_rq *rq,
> > > >
> > > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > >         if (err < 0) {
> > > > -               if (rq->do_dma)
> > > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > > >                         virtnet_rq_unmap(rq, buf, 0);
> > > >                 put_page(virt_to_head_page(buf));
> > > >         }
> > > > @@ -1758,7 +1754,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > >         if (err < 0) {
> > > > -               if (rq->do_dma)
> > > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > > >                         virtnet_rq_unmap(rq, buf, 0);
> > > >                 put_page(virt_to_head_page(buf));
> > > >         }
> > > > @@ -4007,7 +4003,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > > >         int i;
> > > >         for (i = 0; i < vi->max_queue_pairs; i++)
> > > >                 if (vi->rq[i].alloc_frag.page) {
> > > > -                       if (vi->rq[i].do_dma && vi->rq[i].last_dma)
> > > > +                       if (virtqueue_get_dma_premapped(vi->rq[i].vq) && vi->rq[i].last_dma)
> > > >                                 virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
> > > >                         put_page(vi->rq[i].alloc_frag.page);
> > > >                 }
> > > > @@ -4035,7 +4031,7 @@ static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
> > > >         rq = &vi->rq[i];
> > > >
> > > >         while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > > -               if (rq->do_dma)
> > > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > > >                         virtnet_rq_unmap(rq, buf, 0);
> > > >
> > > >                 virtnet_rq_free_buf(vi, rq, buf);
> > > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > > index b28a4d0a3150..066a2b9d2b3c 100644
> > > > --- a/drivers/net/virtio/virtio_net.h
> > > > +++ b/drivers/net/virtio/virtio_net.h
> > > > @@ -115,9 +115,6 @@ struct virtnet_rq {
> > > >
> > > >         /* Record the last dma info to free after new pages is allocated. */
> > > >         struct virtnet_rq_dma *last_dma;
> > > > -
> > > > -       /* Do dma by self */
> > > > -       bool do_dma;
> > > >  };
> > > >
> > > >  struct virtnet_info {
> > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > index 2c5089d3b510..9092bcdebb53 100644
> > > > --- a/drivers/virtio/virtio_ring.c
> > > > +++ b/drivers/virtio/virtio_ring.c
> > > > @@ -2905,6 +2905,28 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
> > > >
> > > > +/**
> > > > + * virtqueue_get_dma_premapped - get the vring premapped mode
> > > > + * @_vq: the struct virtqueue we're talking about.
> > > > + *
> > > > + * Get the premapped mode of the vq.
> > > > + *
> > > > + * Returns bool for the vq premapped mode.
> > > > + */
> > > > +bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
> > > > +{
> > > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > > +       bool premapped;
> > > > +
> > > > +       START_USE(vq);
> > > > +       premapped = vq->premapped;
> > > > +       END_USE(vq);
> > >
> > > Why do we need to protect premapped like this? Is the user allowed to
> > > change it on the fly?
> >
> >
> > Just to protect the vq before accessing it.
>
> I meant, how does that differ from other booleans, e.g. use_dma_api, do_unmap, etc.?

Sorry, maybe I misunderstood you.

Do you mean we should put "premapped" into struct virtqueue,
so the user can read/write it through struct virtqueue directly?

If so, the reason is that when setting premapped, we must check
use_dma_api:


int virtqueue_set_dma_premapped(struct virtqueue *_vq)
{
	struct vring_virtqueue *vq = to_vvq(_vq);
	u32 num;

	START_USE(vq);

	num = vq->packed_ring ? vq->packed.vring.num : vq->split.vring.num;

	/* The ring must be empty: no buffer may be mapped yet. */
	if (num != vq->vq.num_free) {
		END_USE(vq);
		return -EINVAL;
	}

	/* Premapped mode only makes sense when the DMA API is in use. */
	if (!vq->use_dma_api) {
		END_USE(vq);
		return -EINVAL;
	}

	vq->buffer_is_premapped = true;

	END_USE(vq);

	return 0;
}

Thanks.

>
> Thanks
>
> >
> > Thanks.
> > >
> > > Thanks
> > >
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-29  3:14           ` Jason Wang
@ 2024-01-29  3:37             ` Xuan Zhuo
  2024-01-29  6:23               ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-29  3:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Mon, 29 Jan 2024 11:14:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Thu, Jan 25, 2024 at 2:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > > [...]
> > > > > > >
> > > > > >
> > > > > > Rethink of the whole design, I have one question:
> > > > > >
> > > > > > The reason we need to store DMA information is to harden the virtqueue
> > > > > > to make sure the DMA unmap is safe. This seems redundant when the
> > > > > > buffers are premapped by the driver, for example:
> > > > > >
> > > > > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > > > > >
> > > > > > So can we simply
> > > > > >
> > > > > > 1) when premapping is enabled, have the driver store DMA information itself
> > > > >
> > > > > YES, this is simpler. And this is more convenient.
> > > > > But the driver must allocate memory to store the dma info.
> > >
> > > Right, and this looks like the common practice for most of the NIC drivers.
> > >
> > > > >
> > > > > > 2) don't store DMA information in desc_extra
> > > > >
> > > > > YES. But then the desc_extra memory is wasted, though the "next" item is still used.
> > > > > Do you think we should free the desc_extra when the vq is in premapped mode?
> > > >
> > > >
> > > > struct vring_desc_extra {
> > > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > > >         u32 len;                        /* Descriptor length. */
> > > >         u16 flags;                      /* Descriptor flags. */
> > > >         u16 next;                       /* The next desc state in a list. */
> > > > };
> > > >
> > > >
> > > > The flags and the next are used whether premapped or not.
> > > >
> > > > So I think we can add a new array to store the addr and len.
> > >
> > > Yes.
> > >
> > > > If the vq is premapped, the memory can be freed.
> > >
> > > Then we need to make sure the premapped is set before find_vqs() etc.
> >
> >
> > Yes. We can start from the parameters of find_vqs().
> >
> > But actually, we can free the dma array when the driver sets premapped mode.
>
> Probably, but that flow is kind of odd:
>
> init()
>     alloc()
>
> set_premapped()
>     free()

If so, the premapped option will become a find_vqs() parameter,
and the API virtqueue_set_dma_premapped() will be removed.
And we can put buffer_is_premapped into struct virtqueue,
so the user can access it on the fly. (You asked about this in #4.)
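
A rough sketch of how that could look (purely illustrative: the
premapped array and virtio_find_vqs_premapped() are hypothetical,
mirroring how the per-vq ctx flags are passed today):

	/* in virtnet_find_vqs(): mark which vqs use premapped buffers */
	for (i = 0; i < vi->max_queue_pairs; i++) {
		premapped[txq2vq(i)] = true;
		premapped[rxq2vq(i)] = vi->mergeable_rx_bufs ||
				       !vi->big_packets;
	}

	ret = virtio_find_vqs_premapped(vi->vdev, total_vqs, vqs,
					callbacks, names, ctx, premapped);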


>
> >
> > >
> > > >
> > > > struct vring_desc_extra {
> > > >         u16 flags;                      /* Descriptor flags. */
> > > >         u16 next;                       /* The next desc state in a list. */
> > > > };
> > > >
> > > > struct vring_desc_dma {
> > > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > > >         u32 len;                        /* Descriptor length. */
> > > > };
> > > >
> > > > Thanks.
> >
> > As we discussed, you may wait for my next patch set with the new design.
> >
> > Could you review the first patch set of this series?
> > http://lore.kernel.org/all/20240116062842.67874-1-xuanzhuo@linux.alibaba.com
> >
> > I am working on top of it.
>
> Actually, I'm a little confused about the dependencies.
>
> We have three:
>
> 1) move virtio-net to a dedicated directory
> 2) premapped mode
> 3) AF_XDP
>
> It looks to me like the current series is posted in that dependency order.
>
> Then I have questions:
>
> 1) do we agree with moving to a directory (I don't have a preference)?
> 2) if 3) depends on 2), I'd suggest making sure 2) is finalized
> before posting 3), because we have gone through several rounds
> of AF_XDP and most concerns were about the API introduced in 2)
>
> Does this make sense?

YES, this makes sense.

I did it this way to reduce the work of rebasing the code again.

But you are right, that is the right order.


>
> >
> > PS.
> >
> > There is another patch set, "device stats". I hope that is on your list.
> >
> > http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com
>
> Yes, it is (as long as it doesn't depend on moving the source).

It does not depend on that.

Thanks.


>
> Thanks
>
> >
> > Thanks.
> >
> >
> > >
> > > Thanks
> > >
> > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > > >
> > > > > > Would this be simpler?
> > > > > >
> > > > > > Thanks
> > > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-29  3:37             ` Xuan Zhuo
@ 2024-01-29  6:23               ` Xuan Zhuo
  2024-01-30  2:51                 ` Jason Wang
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-29  6:23 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf,
	Jason Wang

On Mon, 29 Jan 2024 11:37:56 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Mon, 29 Jan 2024 11:14:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jan 25, 2024 at 2:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > >
> > > > > > > > [...]
> > > > > > > >
> > > > > > >
> > > > > > > Rethink of the whole design, I have one question:
> > > > > > >
> > > > > > > The reason we need to store DMA information is to harden the virtqueue
> > > > > > > to make sure the DMA unmap is safe. This seems redundant when the
> > > > > > > buffers are premapped by the driver, for example:
> > > > > > >
> > > > > > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > > > > > >
> > > > > > > So can we simply
> > > > > > >
> > > > > > > 1) when premapping is enabled, store DMA information by driver itself
> > > > > >
> > > > > > YES. This is simpler. And it is more convenient.
> > > > > > But the driver must allocate memory to store the dma info.
> > > >
> > > > Right, and this looks like the common practice for most of the NIC drivers.
> > > >
> > > > > >
> > > > > > > 2) don't store DMA information in desc_extra
> > > > > >
> > > > > > YES. But the desc_extra memory is wasted. The "next" item is used.
> > > > > > Do you think we should free the desc_extra when the vq is in premapped mode?
> > > > >
> > > > >
> > > > > struct vring_desc_extra {
> > > > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > > > >         u32 len;                        /* Descriptor length. */
> > > > >         u16 flags;                      /* Descriptor flags. */
> > > > >         u16 next;                       /* The next desc state in a list. */
> > > > > };
> > > > >
> > > > >
> > > > > The flags and the next are used whether premapped or not.
> > > > >
> > > > > So I think we can add a new array to store the addr and len.
> > > >
> > > > Yes.
> > > >
> > > > > If the vq is premapped, the memory can be freed.
> > > >
> > > > Then we need to make sure the premapped is set before find_vqs() etc.
> > >
> > >
> > > Yes. We can start from the parameters of the find_vqs().
> > >
> > > But actually we can free the dma array when the driver sets premapped mode.
> >
> > Probably, but that's kind of odd.
> >
> > init()
> >     alloc()
> >
> > set_premapped()
> >     free()
>
> If so, the premapped option will become a find_vqs parameter,
> and the API virtqueue_set_dma_premapped will be removed.
> And we can put buffer_is_premapped into the struct virtqueue,
> so the user can access it on the fly. (You asked about this on #4)


I tried to pass the option to find_vqs.

You know, find_vqs already has too many parameters,
and adding a new parameter every time is painful:
many places need to be changed.


	int (*find_vqs)(struct virtio_device *, unsigned nvqs,
			struct virtqueue *vqs[], vq_callback_t *callbacks[],
			const char * const names[], const bool *ctx,
			const bool *premapped,
			struct irq_affinity *desc);

Do you have any preference if I try to refactor this to pass a struct?

Thanks.
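
For illustration, a minimal sketch of what such a parameter structure could
look like. The struct and field names below are hypothetical, not taken from
this series:

	/* Hypothetical: bundle the per-call find_vqs arguments, so adding
	 * a new knob (such as premapped) only touches this struct instead
	 * of every transport's function signature.
	 */
	struct virtio_vq_config {
		unsigned int nvqs;            /* number of virtqueues */
		struct virtqueue **vqs;       /* output: the found vqs */
		vq_callback_t **callbacks;    /* per-vq callbacks, entries may be NULL */
		const char * const *names;    /* per-vq names */
		const bool *ctx;              /* per-vq ctx flags */
		const bool *premapped;        /* per-vq premapped-DMA flags */
		struct irq_affinity *desc;    /* irq affinity hint */
	};

	int (*find_vqs)(struct virtio_device *vdev,
			struct virtio_vq_config *cfg);

With this shape, a caller that does not care about a new field can simply
leave it zeroed, which is what makes later extensions cheap.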

>
>
> >
> > >
> > > >
> > > > >
> > > > > struct vring_desc_extra {
> > > > >         u16 flags;                      /* Descriptor flags. */
> > > > >         u16 next;                       /* The next desc state in a list. */
> > > > > };
> > > > >
> > > > > struct vring_desc_dma {
> > > > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > > > >         u32 len;                        /* Descriptor length. */
> > > > > };
> > > > >
> > > > > Thanks.
> > >
> > > As we discussed, you may wait my next patch set of the new design.
> > >
> > > Could you review the first patch set of this serial.
> > > http://lore.kernel.org/all/20240116062842.67874-1-xuanzhuo@linux.alibaba.com
> > >
> > > I work on top of this.
> >
> > Actually, I'm a little confused about the dependencies.
> >
> > We have three:
> >
> > 1) move the virtio-net to a dedicated directory
> > 2) premapped mode
> > 3) AF_XDP
> >
> > It looks to me the current series is posted in that dependency.
> >
> > Then I have questions:
> >
> > 1) do we agree with moving to a directory (I don't have a preference)?
> > 2) if 3) depends on 2), I'd suggest to make sure 2) is finalized
> > before posting 3), this is because we have gone through several rounds
> > of AF_XDP and most concerns were for the API that is introduced in 2)
> >
> > Does this make sense?
>
> YES. this make sense.
>
> I do this because I can reduce the work of rebasing the code again.
>
> But you are right, this is the right order.
>
>
> >
> > >
> > > PS.
> > >
> > > There is another patch set "device stats". I hope that is in your list.
> > >
> > > http://lore.kernel.org/all/20231226073103.116153-1-xuanzhuo@linux.alibaba.com
> >
> > Yes, it is. (If it doesn't depend on the moving of the source).
>
> It does not depend on that.
>
> Thanks.
>
>
> >
> > Thanks
> >
> > >
> > > Thanks.
> > >
> > >
> > > >
> > > > Thanks
> > > >
> > > > >
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Would this be simpler?
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > >
> > > >
> > >
> >
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-29  6:23               ` Xuan Zhuo
@ 2024-01-30  2:51                 ` Jason Wang
  2024-01-30  3:13                   ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-30  2:51 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Mon, Jan 29, 2024 at 2:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 29 Jan 2024 11:37:56 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > On Mon, 29 Jan 2024 11:14:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Jan 25, 2024 at 2:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > >
> > > > > > > > Rethink of the whole design, I have one question:
> > > > > > > >
> > > > > > > > The reason we need to store DMA information is to harden the virtqueue
> > > > > > > > to make sure the DMA unmap is safe. This seems redundant when the
> > > > > > > > buffer were premapped by the driver, for example:
> > > > > > > >
> > > > > > > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > > > > > > >
> > > > > > > > So can we simply
> > > > > > > >
> > > > > > > > 1) when premapping is enabled, store DMA information by driver itself
> > > > > > >
> > > > > > > YES. this is simpler. And this is more convenience.
> > > > > > > But the driver must allocate memory to store the dma info.
> > > > >
> > > > > Right, and this looks like the common practice for most of the NIC drivers.
> > > > >
> > > > > > >
> > > > > > > > 2) don't store DMA information in desc_extra
> > > > > > >
> > > > > > > YES. But the desc_extra memory is wasted. The "next" item is used.
> > > > > > > Do you think should we free the desc_extra when the vq is premapped mode?
> > > > > >
> > > > > >
> > > > > > struct vring_desc_extra {
> > > > > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > > > > >         u32 len;                        /* Descriptor length. */
> > > > > >         u16 flags;                      /* Descriptor flags. */
> > > > > >         u16 next;                       /* The next desc state in a list. */
> > > > > > };
> > > > > >
> > > > > >
> > > > > > The flags and the next are used whatever premapped or not.
> > > > > >
> > > > > > So I think we can add a new array to store the addr and len.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > If the vq is premappd, the memory can be freed.
> > > > >
> > > > > Then we need to make sure the premapped is set before find_vqs() etc.
> > > >
> > > >
> > > > Yes. We can start from the parameters of the find_vqs().
> > > >
> > > > But actually we can free the dma array when the driver sets premapped mode.
> > >
> > > Probably, but that's kind of odd.
> > >
> > > init()
> > >     alloc()
> > >
> > > set_premapped()
> > >     free()
> >
> > If so, the premapped option will be a find_vqs parameter,
> > the api virtqueue_set_dma_premapped will be removed.
> > And we can put the buffer_is_premapped to the struct virtqueue,
> > The use can access it on the fly. (You asked on #4)
>
>
> I try to pass the option to find_vqs.
>
> You know, the find_vqs has too many parameters.
> And everytime we try to add new parameter is a difficult work.
> Many places need to be changed.
>
>
>         int (*find_vqs)(struct virtio_device *, unsigned nvqs,
>                         struct virtqueue *vqs[], vq_callback_t *callbacks[],
>                         const char * const names[], const bool *ctx,
>                         const bool *premapped,
>                         struct irq_affinity *desc);
>
> Do you have any preference if I try to refactor this to pass a struct?
>
> Thanks.

This should be fine.

Thanks


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-29  3:30         ` Xuan Zhuo
@ 2024-01-30  2:54           ` Jason Wang
  2024-01-30  3:13             ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-30  2:54 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Mon, Jan 29, 2024 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 29 Jan 2024 11:07:50 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jan 25, 2024 at 1:58 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 25 Jan 2024 11:39:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > Introduce helper virtqueue_get_dma_premapped(), then the driver
> > > > > can know whether dma unmap is needed.
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio/main.c       | 22 +++++++++-------------
> > > > >  drivers/net/virtio/virtio_net.h |  3 ---
> > > > >  drivers/virtio/virtio_ring.c    | 22 ++++++++++++++++++++++
> > > > >  include/linux/virtio.h          |  1 +
> > > > >  4 files changed, 32 insertions(+), 16 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > index 186b2cf5d8fc..4fbf612da235 100644
> > > > > --- a/drivers/net/virtio/main.c
> > > > > +++ b/drivers/net/virtio/main.c
> > > > > @@ -483,7 +483,7 @@ static void *virtnet_rq_get_buf(struct virtnet_rq *rq, u32 *len, void **ctx)
> > > > >         void *buf;
> > > > >
> > > > >         buf = virtqueue_get_buf_ctx(rq->vq, len, ctx);
> > > > > -       if (buf && rq->do_dma)
> > > > > +       if (buf && virtqueue_get_dma_premapped(rq->vq))
> > > > >                 virtnet_rq_unmap(rq, buf, *len);
> > > > >
> > > > >         return buf;
> > > > > @@ -496,7 +496,7 @@ static void virtnet_rq_init_one_sg(struct virtnet_rq *rq, void *buf, u32 len)
> > > > >         u32 offset;
> > > > >         void *head;
> > > > >
> > > > > -       if (!rq->do_dma) {
> > > > > +       if (!virtqueue_get_dma_premapped(rq->vq)) {
> > > > >                 sg_init_one(rq->sg, buf, len);
> > > > >                 return;
> > > > >         }
> > > > > @@ -526,7 +526,7 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > >
> > > > >         head = page_address(alloc_frag->page);
> > > > >
> > > > > -       if (rq->do_dma) {
> > > > > +       if (virtqueue_get_dma_premapped(rq->vq)) {
> > > > >                 dma = head;
> > > > >
> > > > >                 /* new pages */
> > > > > @@ -580,12 +580,8 @@ static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > >         if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > >                 return;
> > > > >
> > > > > -       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > -               if (virtqueue_set_dma_premapped(vi->rq[i].vq))
> > > > > -                       continue;
> > > > > -
> > > > > -               vi->rq[i].do_dma = true;
> > > > > -       }
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > +               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > >  }
> > > > >
> > > > >  static void free_old_xmit(struct virtnet_sq *sq, bool in_napi)
> > > > > @@ -1643,7 +1639,7 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct virtnet_rq *rq,
> > > > >
> > > > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > >         if (err < 0) {
> > > > > -               if (rq->do_dma)
> > > > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > > > >                         virtnet_rq_unmap(rq, buf, 0);
> > > > >                 put_page(virt_to_head_page(buf));
> > > > >         }
> > > > > @@ -1758,7 +1754,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
> > > > >         ctx = mergeable_len_to_ctx(len + room, headroom);
> > > > >         err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
> > > > >         if (err < 0) {
> > > > > -               if (rq->do_dma)
> > > > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > > > >                         virtnet_rq_unmap(rq, buf, 0);
> > > > >                 put_page(virt_to_head_page(buf));
> > > > >         }
> > > > > @@ -4007,7 +4003,7 @@ static void free_receive_page_frags(struct virtnet_info *vi)
> > > > >         int i;
> > > > >         for (i = 0; i < vi->max_queue_pairs; i++)
> > > > >                 if (vi->rq[i].alloc_frag.page) {
> > > > > -                       if (vi->rq[i].do_dma && vi->rq[i].last_dma)
> > > > > +                       if (virtqueue_get_dma_premapped(vi->rq[i].vq) && vi->rq[i].last_dma)
> > > > >                                 virtnet_rq_unmap(&vi->rq[i], vi->rq[i].last_dma, 0);
> > > > >                         put_page(vi->rq[i].alloc_frag.page);
> > > > >                 }
> > > > > @@ -4035,7 +4031,7 @@ static void virtnet_rq_free_unused_bufs(struct virtqueue *vq)
> > > > >         rq = &vi->rq[i];
> > > > >
> > > > >         while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> > > > > -               if (rq->do_dma)
> > > > > +               if (virtqueue_get_dma_premapped(rq->vq))
> > > > >                         virtnet_rq_unmap(rq, buf, 0);
> > > > >
> > > > >                 virtnet_rq_free_buf(vi, rq, buf);
> > > > > diff --git a/drivers/net/virtio/virtio_net.h b/drivers/net/virtio/virtio_net.h
> > > > > index b28a4d0a3150..066a2b9d2b3c 100644
> > > > > --- a/drivers/net/virtio/virtio_net.h
> > > > > +++ b/drivers/net/virtio/virtio_net.h
> > > > > @@ -115,9 +115,6 @@ struct virtnet_rq {
> > > > >
> > > > >         /* Record the last dma info to free after new pages is allocated. */
> > > > >         struct virtnet_rq_dma *last_dma;
> > > > > -
> > > > > -       /* Do dma by self */
> > > > > -       bool do_dma;
> > > > >  };
> > > > >
> > > > >  struct virtnet_info {
> > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > index 2c5089d3b510..9092bcdebb53 100644
> > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > @@ -2905,6 +2905,28 @@ int virtqueue_set_dma_premapped(struct virtqueue *_vq)
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(virtqueue_set_dma_premapped);
> > > > >
> > > > > +/**
> > > > > + * virtqueue_get_dma_premapped - get the vring premapped mode
> > > > > + * @_vq: the struct virtqueue we're talking about.
> > > > > + *
> > > > > + * Get the premapped mode of the vq.
> > > > > + *
> > > > > + * Returns bool for the vq premapped mode.
> > > > > + */
> > > > > +bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
> > > > > +{
> > > > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > +       bool premapped;
> > > > > +
> > > > > +       START_USE(vq);
> > > > > +       premapped = vq->premapped;
> > > > > +       END_USE(vq);
> > > >
> > > > Why do we need to protect premapped like this? Is the user allowed to
> > > > change it on the fly?
> > >
> > >
> > > Just protect before accessing vq.
> >
> > I meant, how does that differ from other booleans? E.g. use_dma_api, do_unmap, etc.
>
> Sorry, maybe I misunderstood you.
>
> Do you mean we should put "premapped" into the struct virtqueue?
> Then the user could read/write it through the struct virtqueue directly?
>
> If so, the reason is that when setting premapped, we must check
> use_dma_api.

I may not have been very clear.

I meant: why should we protect premapped with START_USE()/END_USE() here?

If it is set once during init_vqs, we should not need that.
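
Under that assumption the helper reduces to a plain read. A minimal sketch,
based on the function quoted above with the locking dropped:

	bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
	{
		struct vring_virtqueue *vq = to_vvq(_vq);

		/* premapped is written once before the vq is first used,
		 * so a plain read needs no START_USE()/END_USE() pair.
		 */
		return vq->premapped;
	}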

Thanks


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-29  3:11         ` Xuan Zhuo
@ 2024-01-30  2:56           ` Jason Wang
  2024-01-30  3:15             ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Jason Wang @ 2024-01-30  2:56 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Mon, Jan 29, 2024 at 11:28 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
>
> On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > If the xsk is enabling, the xsk tx will share the send queue.
> > > >
> > > > Any reason for this? Technically, virtio-net can work as other NIC
> > > > like 256 queues. There could be some work like optimizing the
> > > > interrupt allocations etc.
> > >
> > > Just like the logic of XDP_TX.
> > >
> > > Now the virtio spec does not allow adding new dynamic queues.
> > > As far as I know, most hypervisors support just a few queues.
> >
> > When multiqueue was developed in Qemu, it supported at least 256 queue
> > pairs, if my memory is correct.
> >
>
>
> YES, but that is configured by the hypervisor.
>
> For the user on any platform, once they get a VM, the queue num is fixed.
> As far as I know, in most cases the num is small.
> If we want af-xdp/xdp-tx to have independent queues,
> I think dynamic queues are a good way.

Yes, we can start from this.

>
>
> > > The num of
> > > queues is not bigger than the cpu num. So the best way is
> > > to share the send queues.
> > >
> > > Parav and I tried to introduce dynamic queues.
> >
> > Virtio-net doesn't differ from real NICs, most of which can create
> > queues dynamically. It's more about the resource allocation; if mgmt
> > can start with 256 queues, then we are probably fine.
>
> But now, if the device has 256, we will enable all 256 queues by default.
> That is too much.

It doesn't differ from other NICs. E.g. currently the active #qps is
determined by the number of CPUs; this is only true if we have 256
CPUs.
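
For reference, a sketch of that policy; the exact virtio-net code may differ:

	/* Cap the active queue pairs at the online CPU count. */
	vi->curr_queue_pairs = min_t(u16, vi->max_queue_pairs,
				     num_online_cpus());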

>
> So, the dynamic queue is not about creating a new queue beyond the resource.
>
> The device may tell the driver that the max queue resource is 256,
> but let us start from 8. If the driver needs more, then we can
> enable more.

This is the policy we use now.

>
> But for me, the xdp tx can share the send queue, so let us start
> the af-xdp work by sharing the send queue.
>
>
> >
> > But I think we can leave this question for now.
> >
> > > But that was dropped.
> > > Until then, I think we can share the send queues.
> > >
> > >
> > > >
> > > > > But the xsk requires that the send queue use the premapped mode.
> > > > > So the send queue must support premapped mode.
> > > > >
> > > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > > >
> > > > >                       |        iommu off           |        iommu on
> > > > > ----------------------|-----------------------------------------------------
> > > > >                       | 16         |  1400         | 16         | 1400
> > > > > ----------------------|-----------------------------------------------------
> > > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > > >
> > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > ---
> > > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > index 4fbf612da235..53143f95a3a0 100644
> > > > > --- a/drivers/net/virtio/main.c
> > > > > +++ b/drivers/net/virtio/main.c
> > > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > >  }
> > > > >
> > > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > > +{
> > > > > +       int i;
> > > > > +
> > > > > +       if (!dma)
> > > > > +               return;
> > > > > +
> > > > > +       for (i = 0; i < dma->next; ++i)
> > > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > > +                                                dma->items[i].addr,
> > > > > +                                                dma->items[i].length,
> > > > > +                                                DMA_TO_DEVICE, 0);
> > > > > +       dma->next = 0;
> > > > > +}
> > > > > +
> > > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > >                             u64 *bytes, u64 *packets)
> > > > >  {
> > > > > +       struct virtio_dma_head *dma;
> > > > >         unsigned int len;
> > > > >         void *ptr;
> > > > >
> > > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > >
> > > > > Any chance this can be false?
> > >
> > > __free_old_xmit is the common path.
> >
> > Did you mean the XDP path doesn't work with this? If yes, we need to
> > change that.
>
>
> NO. If the virtio core use_dma_api is false, the dma premapped
> can not be true.

Ok, I see.
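
For illustration, a simplified sketch of that constraint (the real
virtqueue_set_dma_premapped does additional checks, e.g. that the ring is
still empty):

	int virtqueue_set_dma_premapped(struct virtqueue *_vq)
	{
		struct vring_virtqueue *vq = to_vvq(_vq);

		/* Premapped mode only makes sense when the core would
		 * otherwise map the buffers via the DMA API itself.
		 */
		if (!vq->use_dma_api)
			return -EINVAL;

		vq->premapped = true;
		return 0;
	}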

>
> >
> > >
> > > > The virtqueue_get_dma_premapped() is used to check whether the sq is in
> > > > premapped mode.
> > >
> > > >
> > > > > +               dma = &sq->dma.head;
> > > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > > +               dma->next = 0;
> > > >
> > > > Btw, I found in the case of RX we have:
> > > >
> > > > virtnet_rq_alloc():
> > > >
> > > >                         alloc_frag->offset = sizeof(*dma);
> > > >
> > > > This seems to defeat frag coalescing when the memory is highly
> > > > fragmented or high order allocation is disallowed.
> > > >
> > > > Any idea to solve this?
> > >
> > >
> > > > On the rq premapped patchset, I answered this.
> > >
> > > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
> >
> > > Oops, I forgot that.
> >
> > >
> > > >
> > > > > +       } else {
> > > > > +               dma = NULL;
> > > > > +       }
> > > > > +
> > > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > > +
> > > > >                 if (!is_xdp_frame(ptr)) {
> > > > >                         struct sk_buff *skb = ptr;
> > > > >
> > > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > >         return buf;
> > > > >  }
> > > > >
> > > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > > >  {
> > > > >         int i;
> > > > >
> > > > > -       /* disable for big mode */
> > > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > -               return;
> > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > > >
> > > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > > +               /* TODO for big mode */
> > > >
> > > > > Btw, how hard is it to support big mode? If we can do premapping for that,
> > > > > the code could be simplified.
> > > > >
> > > > > (There are vendors that don't support mergeable rx buffers.)
> > >
> > > I will do that after these patch sets.
> >
> > If it's not too hard, I'd suggest doing it now.
>
>
> YES. It is not too hard, but I am already doing too much:
>
> * virtio-net + device stats
> * virtio-net + af-xdp; this patch set has about 27 commits
>
> And I have been pushing this for too long; I just want to finish the work.
> Then I can work on the next items (premapped big mode, af-xdp multi-buf, ...).
>
> So, let us go step by step.

That's fine.

Thanks


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-30  2:51                 ` Jason Wang
@ 2024-01-30  3:13                   ` Xuan Zhuo
  0 siblings, 0 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-30  3:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, 30 Jan 2024 10:51:37 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 29, 2024 at 2:28 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 29 Jan 2024 11:37:56 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > On Mon, 29 Jan 2024 11:14:43 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Jan 25, 2024 at 2:33 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Thu, 25 Jan 2024 14:14:58 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Thu, Jan 25, 2024 at 1:52 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > > > > >
> > > > > > > > > > [...]
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Rethink of the whole design, I have one question:
> > > > > > > > >
> > > > > > > > > The reason we need to store DMA information is to harden the virtqueue
> > > > > > > > > to make sure the DMA unmap is safe. This seems redundant when the
> > > > > > > > > buffer were premapped by the driver, for example:
> > > > > > > > >
> > > > > > > > > Receive queue maintains DMA information, so it doesn't need desc_extra to work.
> > > > > > > > >
> > > > > > > > > So can we simply
> > > > > > > > >
> > > > > > > > > 1) when premapping is enabled, store DMA information by driver itself
> > > > > > > >
> > > > > > > > YES. this is simpler. And this is more convenience.
> > > > > > > > But the driver must allocate memory to store the dma info.
> > > > > >
> > > > > > Right, and this looks like the common practice for most of the NIC drivers.
> > > > > >
> > > > > > > >
> > > > > > > > > 2) don't store DMA information in desc_extra
> > > > > > > >
> > > > > > > > YES. But the desc_extra memory is wasted. The "next" item is used.
> > > > > > > > Do you think should we free the desc_extra when the vq is premapped mode?
> > > > > > >
> > > > > > >
> > > > > > > struct vring_desc_extra {
> > > > > > >         dma_addr_t addr;                /* Descriptor DMA addr. */
> > > > > > >         u32 len;                        /* Descriptor length. */
> > > > > > >         u16 flags;                      /* Descriptor flags. */
> > > > > > >         u16 next;                       /* The next desc state in a list. */
> > > > > > > };
> > > > > > >
> > > > > > >
> > > > > > > The flags and the next are used whatever premapped or not.
> > > > > > >
> > > > > > > So I think we can add a new array to store the addr and len.
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > If the vq is premappd, the memory can be freed.
> > > > > >
> > > > > > Then we need to make sure the premapped is set before find_vqs() etc.
> > > > >
> > > > >
> > > > > Yes. We can start from the parameters of the find_vqs().
> > > > >
> > > > > But actually we can free the dma array when the driver sets premapped mode.
> > > >
> > > > Probably, but that's kind of odd.
> > > >
> > > > init()
> > > >     alloc()
> > > >
> > > > set_premapped()
> > > >     free()
> > >
> > > If so, the premapped option will be a find_vqs parameter,
> > > the api virtqueue_set_dma_premapped will be removed.
> > > And we can put the buffer_is_premapped to the struct virtqueue,
> > > The use can access it on the fly. (You asked on #4)
> >
> >
> > I try to pass the option to find_vqs.
> >
> > You know, the find_vqs has too many parameters.
> > And everytime we try to add new parameter is a difficult work.
> > Many places need to be changed.
> >
> >
> >         int (*find_vqs)(struct virtio_device *, unsigned nvqs,
> >                         struct virtqueue *vqs[], vq_callback_t *callbacks[],
> >                         const char * const names[], const bool *ctx,
> >                         const bool *premapped,
> >                         struct irq_affinity *desc);
> >
> > Do you have any preference if I try to refactor this to pass a struct?
> >
> > Thanks.
>
> This should be fine.

The patch set is sent. I will introduce that in the next version.

Thanks.



>
> Thanks
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped()
  2024-01-30  2:54           ` Jason Wang
@ 2024-01-30  3:13             ` Xuan Zhuo
  0 siblings, 0 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-30  3:13 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, 30 Jan 2024 10:54:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 29, 2024 at 11:33 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 29 Jan 2024 11:07:50 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Jan 25, 2024 at 1:58 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 25 Jan 2024 11:39:03 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > Introduce helper virtqueue_get_dma_premapped(), then the driver
> > > > > > can know whether dma unmap is needed.
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > > [...]
> > > > > >
> > > > > > +/**
> > > > > > + * virtqueue_get_dma_premapped - get the vring premapped mode
> > > > > > + * @_vq: the struct virtqueue we're talking about.
> > > > > > + *
> > > > > > + * Get the premapped mode of the vq.
> > > > > > + *
> > > > > > + * Returns bool for the vq premapped mode.
> > > > > > + */
> > > > > > +bool virtqueue_get_dma_premapped(struct virtqueue *_vq)
> > > > > > +{
> > > > > > +       struct vring_virtqueue *vq = to_vvq(_vq);
> > > > > > +       bool premapped;
> > > > > > +
> > > > > > +       START_USE(vq);
> > > > > > +       premapped = vq->premapped;
> > > > > > +       END_USE(vq);
> > > > >
> > > > > Why do we need to protect premapped like this? Is the user allowed to
> > > > > change it on the fly?
> > > >
> > > >
> > > > Just protect before accessing vq.
> > >
> > > I meant how did that differ from other booleans? E.g use_dma_api, do_unmap etc.
> >
> > Sorry, maybe I misunderstanded you.
> >
> > Do you mean, should we put "premapped" to the struct virtqueue?
> > Then the user can read/write by the struct virtqueue directly?
> >
> > If that, the reason is that when set premapped, we must check
> > use_dma_api.
>
> I may not be very clear.
>
> I meant why we should protect premapped with START_USE()/END_USER() here.
>
> If it is set once during init_vqs we should not need that.


OK. I see.

Thanks.


>
> Thanks
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-30  2:56           ` Jason Wang
@ 2024-01-30  3:15             ` Xuan Zhuo
  2024-01-30  3:27               ` Xuan Zhuo
  0 siblings, 1 reply; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-30  3:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, 30 Jan 2024 10:56:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> On Mon, Jan 29, 2024 at 11:28 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> >
> > On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > >
> > > > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > >
> > > > > > If the xsk is enabling, the xsk tx will share the send queue.
> > > > >
> > > > > Any reason for this? Technically, virtio-net can work as other NIC
> > > > > like 256 queues. There could be some work like optimizing the
> > > > > interrupt allocations etc.
> > > >
> > > > Just like the logic of XDP_TX.
> > > >
> > > > Now the virtio spec does not allow to add new dynamic queues.
> > > > As I know, most hypervisors just support few queues.
> > >
> > > When multiqueue is developed in Qemu, it support as least 256 queue
> > > pairs if my memory is correct.
> > >
> >
> >
> > YES, but that is configured by the hypervisor.
> >
> > For the user on any platform, when he got a vm, the queue num is fixed.
> > As I know, on most case, the num is less.
> > If we want the af-xdp/xdp-tx has the the independent queues
> > I think the dynamic queue is good way.
>
> Yes, we can start from this.


My plan is to start from sharing send queues.

After that I will push the dynamic queue RFC to the virtio spec.

If the new feature is negotiated, then we can support xdp/af-xdp
with independent send queues; if the feature is not supported,
xdp/af-xdp can keep working with shared send queues.

I think the two will not conflict.
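
For illustration, how that driver-side choice might look; the feature bit and
helper below are hypothetical, nothing like them exists yet:

	/* Hypothetical: pick the xsk send queue based on a dynamic-queue
	 * feature bit once the spec change lands.
	 */
	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_DYN_QUEUES))
		sq = virtnet_alloc_dyn_sq(vi);  /* hypothetical helper */
	else
		sq = &vi->sq[qid];              /* share the kernel's send queue */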


>
> >
> >
> > > > The num of
> > > > queues is not bigger than the cpu num. So the best way is
> > > > to share the send queues.
> > > >
> > > > Parav and I tried to introduce dynamic queues.
> > >
> > > Virtio-net doesn't differ from real NIC where most of them can create
> > > queue dynamically. It's more about the resource allocation, if mgmt
> > > can start with 256 queues, then we probably fine.
> >
> > But now, if the devices has 256, we will enable the 256 queues by default.
> > that is too much.
>
> It doesn't differ from the other NIC. E.g currently the active #qps is
> determined by the number of cpus. this is only true if we have 256
> cpus.


YES. But for now, typical devices have just a few queues (such as 8 or 32).

Thanks.


>
> >
> > So, the dynamic queue is not to create a new queue out of the resource.
> >
> > The device may tell the driver, the max queue resource is 256,
> > but let we start from 8. If the driver need more, then we can
> > enable more.
>
> This is the policy we used now.
>
> >
> > But for me, the xdp tx can share the sq queue, so let we start
> > the af-xdp from sharing sq queue.
> >
> >
> > >
> > > But I think we can leave this question now.
> > >
> > > > But that is dropped.
> > > > Before that I think we can share the send queues.
> > > >
> > > >
> > > > >
> > > > > > But the xsk requires that the send queue use the premapped mode.
> > > > > > So the send queue must support premapped mode.
> > > > > >
> > > > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > > > >
> > > > > >                       |        iommu off           |        iommu on
> > > > > > ----------------------|-----------------------------------------------------
> > > > > >                       | 16         |  1400         | 16         | 1400
> > > > > > ----------------------|-----------------------------------------------------
> > > > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > > > >
> > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > ---
> > > > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > > index 4fbf612da235..53143f95a3a0 100644
> > > > > > --- a/drivers/net/virtio/main.c
> > > > > > +++ b/drivers/net/virtio/main.c
> > > > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > > >  }
> > > > > >
> > > > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > > > +{
> > > > > > +       int i;
> > > > > > +
> > > > > > +       if (!dma)
> > > > > > +               return;
> > > > > > +
> > > > > > +       for (i = 0; i < dma->next; ++i)
> > > > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > > > +                                                dma->items[i].addr,
> > > > > > +                                                dma->items[i].length,
> > > > > > +                                                DMA_TO_DEVICE, 0);
> > > > > > +       dma->next = 0;
> > > > > > +}
> > > > > > +
> > > > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > >                             u64 *bytes, u64 *packets)
> > > > > >  {
> > > > > > +       struct virtio_dma_head *dma;
> > > > > >         unsigned int len;
> > > > > >         void *ptr;
> > > > > >
> > > > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > >
> > > > > Any chance this can be false?
> > > >
> > > > __free_old_xmit is the common path.
> > >
> > > Did you mean the XDP path doesn't work with this? If yes, we need to
> > > change that.
> >
> >
> > NO. If the virtio core use_dma_api is false, the dma premapped
> > can not be true.
>
> Ok, I see.
>
> >
> > >
> > > >
> > > > The virtqueue_get_dma_premapped() is used to check whether the sq is in premapped
> > > > mode.
> > > >
> > > > >
> > > > > > +               dma = &sq->dma.head;
> > > > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > > > +               dma->next = 0;
> > > > >
> > > > > Btw, I found in the case of RX we have:
> > > > >
> > > > > virtnet_rq_alloc():
> > > > >
> > > > >                         alloc_frag->offset = sizeof(*dma);
> > > > >
> > > > > This seems to defeat frag coalescing when the memory is highly
> > > > > fragmented or high order allocation is disallowed.
> > > > >
> > > > > Any idea to solve this?
> > > >
> > > >
> > > > On the rq premapped patchset, I answered this.
> > > >
> > > > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
> > >
> > > Oops, I forgot that.
> > >
> > > >
> > > > >
> > > > > > +       } else {
> > > > > > +               dma = NULL;
> > > > > > +       }
> > > > > > +
> > > > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > > > +
> > > > > >                 if (!is_xdp_frame(ptr)) {
> > > > > >                         struct sk_buff *skb = ptr;
> > > > > >
> > > > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > > >         return buf;
> > > > > >  }
> > > > > >
> > > > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > > > >  {
> > > > > >         int i;
> > > > > >
> > > > > > -       /* disable for big mode */
> > > > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > -               return;
> > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > > > >
> > > > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > > > +               /* TODO for big mode */
> > > > >
> > > > > Btw, how hard is it to support big mode? If we can do premapping for that,
> > > > > the code could be simplified.
> > > > >
> > > > > (There are vendors that don't support mergeable rx buffers).
> > > >
> > > > I will do that after these patch sets.
> > >
> > > If it's not too hard, I'd suggest doing it now.
> >
> >
> > YES. It is not too hard, but I have been doing too much.
> >
> > * virtio-net + device stats
> > * virtio-net + af-xdp, this patch set has about 27 commits
> >
> > And I have been pushing this for too long; I just want to finish the work.
> > Then I can work on the next items (premapped big mode, af-xdp multi-buf, ...).
> >
> > So, let us go step by step.
>
> That's fine.
>
> Thanks
>


* Re: [PATCH net-next 5/5] virtio_net: sq support premapped mode
  2024-01-30  3:15             ` Xuan Zhuo
@ 2024-01-30  3:27               ` Xuan Zhuo
  0 siblings, 0 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-01-30  3:27 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Michael S. Tsirkin, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf,
	Jason Wang, Parav Pandit

On Tue, 30 Jan 2024 11:15:07 +0800, Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> On Tue, 30 Jan 2024 10:56:25 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > On Mon, Jan 29, 2024 at 11:28 AM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > >
> > > On Mon, 29 Jan 2024 11:06:35 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > On Thu, Jan 25, 2024 at 2:24 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > >
> > > > > On Thu, 25 Jan 2024 11:39:20 +0800, Jason Wang <jasowang@redhat.com> wrote:
> > > > > > On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo <xuanzhuo@linux.alibaba.com> wrote:
> > > > > > >
> > > > > > > If xsk is enabled, the xsk tx will share the send queue.
> > > > > >
> > > > > > Any reason for this? Technically, virtio-net can work like other NICs
> > > > > > with 256 queues. There could be some work like optimizing the
> > > > > > interrupt allocations, etc.
> > > > >
> > > > > Just like the logic of XDP_TX.
> > > > >
> > > > > Now the virtio spec does not allow adding new dynamic queues.
> > > > > As far as I know, most hypervisors just support a few queues.
> > > >
> > > > When multiqueue was developed in Qemu, it supported at least 256 queue
> > > > pairs, if my memory is correct.
> > > >
> > >
> > >
> > > YES, but that is configured by the hypervisor.
> > >
> > > For the user on any platform, when they get a VM, the queue num is fixed.
> > > As far as I know, in most cases, the num is small.
> > > If we want af-xdp/xdp-tx to have independent queues,
> > > I think dynamic queues are a good way.
> >
> > Yes, we can start from this.
>
>
> My plan is to start from sharing the send queues.
>
> After that, I will push a dynamic queues RFC to the virtio spec.
>
> If the new feature is negotiated, then we can support xdp/af-xdp
> with independent send queues; if the feature is not supported,
> xdp/af-xdp can work with a shared send queue.
>
> I think that will not conflict.

cc Parav.


>
>
> >
> > >
> > >
> > > > > The number of
> > > > > queues is not bigger than the number of CPUs. So the best way is
> > > > > to share the send queues.
> > > > >
> > > > > Parav and I tried to introduce dynamic queues.
> > > >
> > > > Virtio-net doesn't differ from real NICs, most of which can create
> > > > queues dynamically. It's more about the resource allocation; if mgmt
> > > > can start with 256 queues, then we are probably fine.
> > >
> > > But now, if the device has 256, we will enable all 256 queues by default.
> > > That is too much.
> >
> > It doesn't differ from other NICs. E.g. currently the active #qps is
> > determined by the number of CPUs; this is only true if we have 256
> > CPUs.
>
>
> YES. But now, normal devices just have a few queues (such as 8 or 32).
>
> Thanks.
>
>
> >
> > >
> > > So, the dynamic queue is not about creating a new queue beyond the device's resources.
> > >
> > > The device may tell the driver that the max queue resource is 256,
> > > but we start from 8. If the driver needs more, then we can
> > > enable more.
> >
> > This is the policy we use now.
> >
> > >
> > > But for me, XDP TX can share the send queue, so let us start
> > > af-xdp with a shared send queue.
> > >
> > >
> > > >
> > > > But I think we can leave this question now.
> > > >
> > > > > But that was dropped.
> > > > > Until then, I think we can share the send queues.
> > > > >
> > > > >
> > > > > >
> > > > > > > But xsk requires that the send queue use premapped mode.
> > > > > > > So the send queue must support premapped mode.
> > > > > > >
> > > > > > > command: pktgen_sample01_simple.sh -i eth0 -s 16/1400 -d 10.0.0.123 -m 00:16:3e:12:e1:3e -n 0 -p 100
> > > > > > > machine:  ecs.ebmg6e.26xlarge of Aliyun
> > > > > > > cpu: Intel(R) Xeon(R) Platinum 8269CY CPU @ 2.50GHz
> > > > > > > iommu mode: intel_iommu=on iommu.strict=1 iommu=nopt
> > > > > > >
> > > > > > >                       |        iommu off           |        iommu on
> > > > > > > ----------------------|-----------------------------------------------------
> > > > > > >                       | 16         |  1400         | 16         | 1400
> > > > > > > ----------------------|-----------------------------------------------------
> > > > > > > Before:               |1716796.00  |  1581829.00   | 390756.00  | 374493.00
> > > > > > > After(premapped off): |1733794.00  |  1576259.00   | 390189.00  | 378128.00
> > > > > > > After(premapped on):  |1707107.00  |  1562917.00   | 385667.00  | 373584.00
> > > > > > >
> > > > > > > Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > > > > > ---
> > > > > > >  drivers/net/virtio/main.c       | 119 ++++++++++++++++++++++++++++----
> > > > > > >  drivers/net/virtio/virtio_net.h |  10 ++-
> > > > > > >  2 files changed, 116 insertions(+), 13 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/net/virtio/main.c b/drivers/net/virtio/main.c
> > > > > > > index 4fbf612da235..53143f95a3a0 100644
> > > > > > > --- a/drivers/net/virtio/main.c
> > > > > > > +++ b/drivers/net/virtio/main.c
> > > > > > > @@ -168,13 +168,39 @@ static struct xdp_frame *ptr_to_xdp(void *ptr)
> > > > > > >         return (struct xdp_frame *)((unsigned long)ptr & ~VIRTIO_XDP_FLAG);
> > > > > > >  }
> > > > > > >
> > > > > > > +static void virtnet_sq_unmap_buf(struct virtnet_sq *sq, struct virtio_dma_head *dma)
> > > > > > > +{
> > > > > > > +       int i;
> > > > > > > +
> > > > > > > +       if (!dma)
> > > > > > > +               return;
> > > > > > > +
> > > > > > > +       for (i = 0; i < dma->next; ++i)
> > > > > > > +               virtqueue_dma_unmap_single_attrs(sq->vq,
> > > > > > > +                                                dma->items[i].addr,
> > > > > > > +                                                dma->items[i].length,
> > > > > > > +                                                DMA_TO_DEVICE, 0);
> > > > > > > +       dma->next = 0;
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void __free_old_xmit(struct virtnet_sq *sq, bool in_napi,
> > > > > > >                             u64 *bytes, u64 *packets)
> > > > > > >  {
> > > > > > > +       struct virtio_dma_head *dma;
> > > > > > >         unsigned int len;
> > > > > > >         void *ptr;
> > > > > > >
> > > > > > > -       while ((ptr = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> > > > > > > +       if (virtqueue_get_dma_premapped(sq->vq)) {
> > > > > >
> > > > > > Any chance this can be false?
> > > > >
> > > > > __free_old_xmit is the common path.
> > > >
> > > > Did you mean the XDP path doesn't work with this? If yes, we need to
> > > > change that.
> > >
> > >
> > > NO. If the virtio core use_dma_api is false, the dma premapped
> > > can not be true.
> >
> > Ok, I see.
> >
> > >
> > > >
> > > > >
> > > > > The virtqueue_get_dma_premapped() is used to check whether the sq is in premapped
> > > > > mode.
> > > > >
> > > > > >
> > > > > > > +               dma = &sq->dma.head;
> > > > > > > +               dma->num = ARRAY_SIZE(sq->dma.items);
> > > > > > > +               dma->next = 0;
> > > > > >
> > > > > > Btw, I found in the case of RX we have:
> > > > > >
> > > > > > virtnet_rq_alloc():
> > > > > >
> > > > > >                         alloc_frag->offset = sizeof(*dma);
> > > > > >
> > > > > > This seems to defeat frag coalescing when the memory is highly
> > > > > > fragmented or high order allocation is disallowed.
> > > > > >
> > > > > > Any idea to solve this?
> > > > >
> > > > >
> > > > > On the rq premapped patchset, I answered this.
> > > > >
> > > > > http://lore.kernel.org/all/1692156147.7470396-3-xuanzhuo@linux.alibaba.com
> > > >
> > > > Oops, I forgot that.
> > > >
> > > > >
> > > > > >
> > > > > > > +       } else {
> > > > > > > +               dma = NULL;
> > > > > > > +       }
> > > > > > > +
> > > > > > > +       while ((ptr = virtqueue_get_buf_ctx_dma(sq->vq, &len, dma, NULL)) != NULL) {
> > > > > > > +               virtnet_sq_unmap_buf(sq, dma);
> > > > > > > +
> > > > > > >                 if (!is_xdp_frame(ptr)) {
> > > > > > >                         struct sk_buff *skb = ptr;
> > > > > > >
> > > > > > > @@ -572,16 +598,70 @@ static void *virtnet_rq_alloc(struct virtnet_rq *rq, u32 size, gfp_t gfp)
> > > > > > >         return buf;
> > > > > > >  }
> > > > > > >
> > > > > > > -static void virtnet_rq_set_premapped(struct virtnet_info *vi)
> > > > > > > +static void virtnet_set_premapped(struct virtnet_info *vi)
> > > > > > >  {
> > > > > > >         int i;
> > > > > > >
> > > > > > > -       /* disable for big mode */
> > > > > > > -       if (!vi->mergeable_rx_bufs && vi->big_packets)
> > > > > > > -               return;
> > > > > > > +       for (i = 0; i < vi->max_queue_pairs; i++) {
> > > > > > > +               virtqueue_set_dma_premapped(vi->sq[i].vq);
> > > > > > >
> > > > > > > -       for (i = 0; i < vi->max_queue_pairs; i++)
> > > > > > > -               virtqueue_set_dma_premapped(vi->rq[i].vq);
> > > > > > > +               /* TODO for big mode */
> > > > > >
> > > > > > Btw, how hard is it to support big mode? If we can do premapping for that,
> > > > > > the code could be simplified.
> > > > > >
> > > > > > (There are vendors that don't support mergeable rx buffers).
> > > > >
> > > > > I will do that after these patch sets.
> > > >
> > > > If it's not too hard, I'd suggest doing it now.
> > >
> > >
> > > YES. It is not too hard, but I have been doing too much.
> > >
> > > * virtio-net + device stats
> > > * virtio-net + af-xdp, this patch set has about 27 commits
> > >
> > > And I have been pushing this for too long; I just want to finish the work.
> > > Then I can work on the next items (premapped big mode, af-xdp multi-buf, ...).
> > >
> > > So, let us go step by step.
> >
> > That's fine.
> >
> > Thanks
> >
>


* Re: [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma()
  2024-01-16  7:59 ` [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() Xuan Zhuo
  2024-01-24  6:54   ` Jason Wang
@ 2024-02-22 19:43   ` Michael S. Tsirkin
  1 sibling, 0 replies; 33+ messages in thread
From: Michael S. Tsirkin @ 2024-02-22 19:43 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, Jan 16, 2024 at 03:59:20PM +0800, Xuan Zhuo wrote:
> Introduce virtqueue_get_buf_ctx_dma() to collect the dma info when
> getting a buf from the virtio core in premapped mode.
> 
> If the virtio queue is in premapped mode, the virtio-net send buf may
> have many descs. Every desc dma address needs to be unmapped. So here we
> introduce a new helper to collect the dma addresses of the buffer from
> the virtio core.
> 
> Because BAD_RING() may be called (which can set vq->broken),
> the "const" on the relevant vq parameters is removed.
> 
> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/virtio/virtio_ring.c | 174 +++++++++++++++++++++++++----------
>  include/linux/virtio.h       |  16 ++++
>  2 files changed, 142 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index 49299b1f9ec7..82f72428605b 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -362,6 +362,45 @@ static struct device *vring_dma_dev(const struct vring_virtqueue *vq)
>  	return vq->dma_dev;
>  }
>  
> +/*
> + *     use_dma_api premapped -> do_unmap
> + *  1. false       false        false
> + *  2. true        false        true
> + *  3. true        true         false
> + *
> + * Only #3, we should return the DMA info to the driver.

No idea what this table is supposed to mean.
Instead of this, just add comments near each
piece of code explaining it.
E.g. something like (I'm just guessing at the reason, pls replace
with the real one):

	/* if premapping is not supported, no need to unmap */
	if (!vq->premapped)
		return false;

and so on
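
For the remaining branches of vring_need_unmap() below, my guesses would
be (again, pls replace with the real reasons):

	/* the virtio core did the dma mapping itself, so it must unmap */
	if (vq->do_unmap)
		return true;

	/* premapped, but the driver passed no array to collect the dma
	 * info, so there is nothing to record
	 */
	if (!dma)
		return false;

	/* premapped: record addr/len so the driver can unmap later */
	dma->items[dma->next].addr = addr;
	dma->items[dma->next].length = length;
	++dma->next;

	return false;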


> + * Return:
> + * true: the virtio core must unmap the desc
> + * false: the virtio core skip the desc unmap
> + */
> +static bool vring_need_unmap(struct vring_virtqueue *vq,
> +			     struct virtio_dma_head *dma,
> +			     dma_addr_t addr, unsigned int length)
> +{
> +	if (vq->do_unmap)
> +		return true;
> +
> +	if (!vq->premapped)
> +		return false;
> +
> +	if (!dma)
> +		return false;
> +
> +	if (unlikely(dma->next >= dma->num)) {
> +		BAD_RING(vq, "premapped vq: collect dma overflow: %pad %u\n",
> +			 &addr, length);
> +		return false;
> +	}
> +
> +	dma->items[dma->next].addr = addr;
> +	dma->items[dma->next].length = length;
> +
> +	++dma->next;
> +
> +	return false;
> +}
> +
>  /* Map one sg entry. */
>  static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist *sg,
>  			    enum dma_data_direction direction, dma_addr_t *addr)
> @@ -440,12 +479,14 @@ static void virtqueue_init(struct vring_virtqueue *vq, u32 num)
>   * Split ring specific functions - *_split().
>   */
>  
> -static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
> -					   const struct vring_desc *desc)
> +static void vring_unmap_one_split_indirect(struct vring_virtqueue *vq,
> +					   const struct vring_desc *desc,
> +					   struct virtio_dma_head *dma)
>  {
>  	u16 flags;
>  
> -	if (!vq->do_unmap)
> +	if (!vring_need_unmap(vq, dma, virtio64_to_cpu(vq->vq.vdev, desc->addr),
> +			  virtio32_to_cpu(vq->vq.vdev, desc->len)))
>  		return;
>  
>  	flags = virtio16_to_cpu(vq->vq.vdev, desc->flags);
> @@ -457,8 +498,8 @@ static void vring_unmap_one_split_indirect(const struct vring_virtqueue *vq,
>  		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
>  }
>  
> -static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
> -					  unsigned int i)
> +static unsigned int vring_unmap_one_split(struct vring_virtqueue *vq,
> +					  unsigned int i, struct virtio_dma_head *dma)
>  {
>  	struct vring_desc_extra *extra = vq->split.desc_extra;
>  	u16 flags;
> @@ -474,17 +515,16 @@ static unsigned int vring_unmap_one_split(const struct vring_virtqueue *vq,
>  				 extra[i].len,
>  				 (flags & VRING_DESC_F_WRITE) ?
>  				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> -	} else {
> -		if (!vq->do_unmap)
> -			goto out;
> -
> -		dma_unmap_page(vring_dma_dev(vq),
> -			       extra[i].addr,
> -			       extra[i].len,
> -			       (flags & VRING_DESC_F_WRITE) ?
> -			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +		goto out;
>  	}
>  
> +	if (!vring_need_unmap(vq, dma, extra[i].addr, extra[i].len))
> +		goto out;
> +
> +	dma_unmap_page(vring_dma_dev(vq), extra[i].addr, extra[i].len,
> +		       (flags & VRING_DESC_F_WRITE) ?
> +		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +
>  out:
>  	return extra[i].next;
>  }
> @@ -717,10 +757,10 @@ static inline int virtqueue_add_split(struct virtqueue *_vq,
>  		if (i == err_idx)
>  			break;
>  		if (indirect) {
> -			vring_unmap_one_split_indirect(vq, &desc[i]);
> +			vring_unmap_one_split_indirect(vq, &desc[i], NULL);
>  			i = virtio16_to_cpu(_vq->vdev, desc[i].next);
>  		} else
> -			i = vring_unmap_one_split(vq, i);
> +			i = vring_unmap_one_split(vq, i, NULL);
>  	}
>  
>  free_indirect:
> @@ -763,7 +803,7 @@ static bool virtqueue_kick_prepare_split(struct virtqueue *_vq)
>  }
>  
>  static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
> -			     void **ctx)
> +			     struct virtio_dma_head *dma, void **ctx)
>  {
>  	unsigned int i, j;
>  	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
> @@ -775,12 +815,12 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  	i = head;
>  
>  	while (vq->split.vring.desc[i].flags & nextflag) {
> -		vring_unmap_one_split(vq, i);
> +		vring_unmap_one_split(vq, i, dma);
>  		i = vq->split.desc_extra[i].next;
>  		vq->vq.num_free++;
>  	}
>  
> -	vring_unmap_one_split(vq, i);
> +	vring_unmap_one_split(vq, i, dma);
>  	vq->split.desc_extra[i].next = vq->free_head;
>  	vq->free_head = head;
>  
> @@ -802,9 +842,9 @@ static void detach_buf_split(struct vring_virtqueue *vq, unsigned int head,
>  				VRING_DESC_F_INDIRECT));
>  		BUG_ON(len == 0 || len % sizeof(struct vring_desc));
>  
> -		if (vq->do_unmap) {
> +		if (vq->do_unmap || dma) {
>  			for (j = 0; j < len / sizeof(struct vring_desc); j++)
> -				vring_unmap_one_split_indirect(vq, &indir_desc[j]);
> +				vring_unmap_one_split_indirect(vq, &indir_desc[j], dma);
>  		}
>  
>  		kfree(indir_desc);
> @@ -822,6 +862,7 @@ static bool more_used_split(const struct vring_virtqueue *vq)
>  
>  static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>  					 unsigned int *len,
> +					 struct virtio_dma_head *dma,
>  					 void **ctx)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -862,7 +903,7 @@ static void *virtqueue_get_buf_ctx_split(struct virtqueue *_vq,
>  
>  	/* detach_buf_split clears data, so grab it now. */
>  	ret = vq->split.desc_state[i].data;
> -	detach_buf_split(vq, i, ctx);
> +	detach_buf_split(vq, i, dma, ctx);
>  	vq->last_used_idx++;
>  	/* If we expect an interrupt for the next entry, tell host
>  	 * by writing event index and flush out the write before
> @@ -984,7 +1025,7 @@ static void *virtqueue_detach_unused_buf_split(struct virtqueue *_vq)
>  			continue;
>  		/* detach_buf_split clears data, so grab it now. */
>  		buf = vq->split.desc_state[i].data;
> -		detach_buf_split(vq, i, NULL);
> +		detach_buf_split(vq, i, NULL, NULL);
>  		vq->split.avail_idx_shadow--;
>  		vq->split.vring.avail->idx = cpu_to_virtio16(_vq->vdev,
>  				vq->split.avail_idx_shadow);
> @@ -1220,8 +1261,9 @@ static u16 packed_last_used(u16 last_used_idx)
>  	return last_used_idx & ~(-(1 << VRING_PACKED_EVENT_F_WRAP_CTR));
>  }
>  
> -static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
> -				     const struct vring_desc_extra *extra)
> +static void vring_unmap_extra_packed(struct vring_virtqueue *vq,
> +				     const struct vring_desc_extra *extra,
> +				     struct virtio_dma_head *dma)
>  {
>  	u16 flags;
>  
> @@ -1235,23 +1277,24 @@ static void vring_unmap_extra_packed(const struct vring_virtqueue *vq,
>  				 extra->addr, extra->len,
>  				 (flags & VRING_DESC_F_WRITE) ?
>  				 DMA_FROM_DEVICE : DMA_TO_DEVICE);
> -	} else {
> -		if (!vq->do_unmap)
> -			return;
> -
> -		dma_unmap_page(vring_dma_dev(vq),
> -			       extra->addr, extra->len,
> -			       (flags & VRING_DESC_F_WRITE) ?
> -			       DMA_FROM_DEVICE : DMA_TO_DEVICE);
> +		return;
>  	}
> +
> +	if (!vring_need_unmap(vq, dma, extra->addr, extra->len))
> +		return;
> +
> +	dma_unmap_page(vring_dma_dev(vq), extra->addr, extra->len,
> +		       (flags & VRING_DESC_F_WRITE) ?
> +		       DMA_FROM_DEVICE : DMA_TO_DEVICE);
>  }
>  
> -static void vring_unmap_desc_packed(const struct vring_virtqueue *vq,
> -				    const struct vring_packed_desc *desc)
> +static void vring_unmap_desc_packed(struct vring_virtqueue *vq,
> +				    const struct vring_packed_desc *desc,
> +				    struct virtio_dma_head *dma)
>  {
>  	u16 flags;
>  
> -	if (!vq->do_unmap)
> +	if (!vring_need_unmap(vq, dma, le64_to_cpu(desc->addr), le32_to_cpu(desc->len)))
>  		return;
>  
>  	flags = le16_to_cpu(desc->flags);
> @@ -1389,7 +1432,7 @@ static int virtqueue_add_indirect_packed(struct vring_virtqueue *vq,
>  	err_idx = i;
>  
>  	for (i = 0; i < err_idx; i++)
> -		vring_unmap_desc_packed(vq, &desc[i]);
> +		vring_unmap_desc_packed(vq, &desc[i], NULL);
>  
>  free_desc:
>  	kfree(desc);
> @@ -1539,7 +1582,7 @@ static inline int virtqueue_add_packed(struct virtqueue *_vq,
>  	for (n = 0; n < total_sg; n++) {
>  		if (i == err_idx)
>  			break;
> -		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr]);
> +		vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr], NULL);
>  		curr = vq->packed.desc_extra[curr].next;
>  		i++;
>  		if (i >= vq->packed.vring.num)
> @@ -1600,7 +1643,9 @@ static bool virtqueue_kick_prepare_packed(struct virtqueue *_vq)
>  }
>  
>  static void detach_buf_packed(struct vring_virtqueue *vq,
> -			      unsigned int id, void **ctx)
> +			      unsigned int id,
> +			      struct virtio_dma_head *dma,
> +			      void **ctx)
>  {
>  	struct vring_desc_state_packed *state = NULL;
>  	struct vring_packed_desc *desc;
> @@ -1615,11 +1660,10 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>  	vq->free_head = id;
>  	vq->vq.num_free += state->num;
>  
> -	if (unlikely(vq->do_unmap)) {
> +	if (vq->do_unmap || dma) {
>  		curr = id;
>  		for (i = 0; i < state->num; i++) {
> -			vring_unmap_extra_packed(vq,
> -						 &vq->packed.desc_extra[curr]);
> +			vring_unmap_extra_packed(vq, &vq->packed.desc_extra[curr], dma);
>  			curr = vq->packed.desc_extra[curr].next;
>  		}
>  	}
> @@ -1632,11 +1676,11 @@ static void detach_buf_packed(struct vring_virtqueue *vq,
>  		if (!desc)
>  			return;
>  
> -		if (vq->do_unmap) {
> +		if (vq->do_unmap || dma) {
>  			len = vq->packed.desc_extra[id].len;
>  			for (i = 0; i < len / sizeof(struct vring_packed_desc);
>  					i++)
> -				vring_unmap_desc_packed(vq, &desc[i]);
> +				vring_unmap_desc_packed(vq, &desc[i], dma);
>  		}
>  		kfree(desc);
>  		state->indir_desc = NULL;
> @@ -1672,6 +1716,7 @@ static bool more_used_packed(const struct vring_virtqueue *vq)
>  
>  static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  					  unsigned int *len,
> +					  struct virtio_dma_head *dma,
>  					  void **ctx)
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
> @@ -1712,7 +1757,7 @@ static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>  
>  	/* detach_buf_packed clears data, so grab it now. */
>  	ret = vq->packed.desc_state[id].data;
> -	detach_buf_packed(vq, id, ctx);
> +	detach_buf_packed(vq, id, dma, ctx);
>  
>  	last_used += vq->packed.desc_state[id].num;
>  	if (unlikely(last_used >= vq->packed.vring.num)) {
> @@ -1877,7 +1922,7 @@ static void *virtqueue_detach_unused_buf_packed(struct virtqueue *_vq)
>  			continue;
>  		/* detach_buf clears data, so grab it now. */
>  		buf = vq->packed.desc_state[i].data;
> -		detach_buf_packed(vq, i, NULL);
> +		detach_buf_packed(vq, i, NULL, NULL);
>  		END_USE(vq);
>  		return buf;
>  	}
> @@ -2417,11 +2462,44 @@ void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
>  {
>  	struct vring_virtqueue *vq = to_vvq(_vq);
>  
> -	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, ctx) :
> -				 virtqueue_get_buf_ctx_split(_vq, len, ctx);
> +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, NULL, ctx) :
> +				 virtqueue_get_buf_ctx_split(_vq, len, NULL, ctx);
>  }
>  EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
>  
> +/**
> + * virtqueue_get_buf_ctx_dma - get the next used buffer with the dma info
> + * @_vq: the struct virtqueue we're talking about.
> + * @len: the length written into the buffer
> + * @dma: the head of the array to store the dma info
> + * @ctx: extra context for the token
> + *
> + * If the device wrote data into the buffer, @len will be set to the
> + * amount written.  This means you don't need to clear the buffer
> + * beforehand to ensure there's no data leakage in the case of short
> + * writes.
> + *
> + * Caller must ensure we don't call this with other virtqueue
> + * operations at the same time (except where noted).
> + *
> + * We store the dma info of every descriptor of this buf to the dma->items
> + * array. If the array size is too small, some dma info may be missed, so
> + * the caller must ensure the array is large enough. The dma->next is the out
> + * value to the caller, indicates the num of the used items.

num -> number?
So next is the number of items? And num is what?
Can't we avoid hacks like this in APIs?
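
For what it's worth, my reading of the caller in patch 5 is that num is
the capacity of items[] and next comes back as the count of filled
entries; if that is right, pls spell it out here:

	/* caller side, from __free_old_xmit() in patch 5: */
	dma = &sq->dma.head;
	dma->num = ARRAY_SIZE(sq->dma.items);	/* capacity of items[] */
	dma->next = 0;				/* comes back as the filled count */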

> + *
> + * Returns NULL if there are no used buffers, or the "data" token
> + * handed to virtqueue_add_*().
> + */
> +void *virtqueue_get_buf_ctx_dma(struct virtqueue *_vq, unsigned int *len,
> +				struct virtio_dma_head *dma, void **ctx)
> +{
> +	struct vring_virtqueue *vq = to_vvq(_vq);
> +
> +	return vq->packed_ring ? virtqueue_get_buf_ctx_packed(_vq, len, dma, ctx) :
> +				 virtqueue_get_buf_ctx_split(_vq, len, dma, ctx);
> +}
> +EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx_dma);
> +
>  void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
>  {
>  	return virtqueue_get_buf_ctx(_vq, len, NULL);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 4cc614a38376..572aecec205b 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -75,6 +75,22 @@ void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
>  void *virtqueue_get_buf_ctx(struct virtqueue *vq, unsigned int *len,
>  			    void **ctx);
>  
> +struct virtio_dma_item {
> +	dma_addr_t addr;
> +	unsigned int length;
> +};
> +
> +struct virtio_dma_head {
> +	/* total num of items. */
> +	u16 num;
> +	/* point to the next item to store dma info. */
> +	u16 next;

I'm not sure what this data structure is ... is it a linked list? A ring?
Pls document.
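
If it matches the usage in patch 5, a kernel-doc block along these lines
(to be corrected as needed) would already help:

	/**
	 * struct virtio_dma_head - returns per-descriptor dma info
	 * @num: capacity of @items, set by the caller
	 * @next: number of entries filled by the virtio core
	 * @items: the collected (addr, length) pairs
	 *
	 * Neither a list nor a ring: a plain array, filled from index 0
	 * for each buffer returned by virtqueue_get_buf_ctx_dma().
	 */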


> +	struct virtio_dma_item items[];
> +};
> +
> +void *virtqueue_get_buf_ctx_dma(struct virtqueue *_vq, unsigned int *len,
> +				struct virtio_dma_head *dma, void **ctx);
> +
>  void virtqueue_disable_cb(struct virtqueue *vq);
>  
>  bool virtqueue_enable_cb(struct virtqueue *vq);
> -- 
> 2.32.0.3.g01195cf9f



* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
                   ` (5 preceding siblings ...)
  2024-01-25  3:39 ` [PATCH net-next 0/5] virtio-net: " Jason Wang
@ 2024-02-22 19:45 ` Michael S. Tsirkin
  2024-02-23  1:50   ` Xuan Zhuo
  6 siblings, 1 reply; 33+ messages in thread
From: Michael S. Tsirkin @ 2024-02-22 19:45 UTC (permalink / raw)
  To: Xuan Zhuo
  Cc: netdev, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Tue, Jan 16, 2024 at 03:59:19PM +0800, Xuan Zhuo wrote:
> This is the second part of virtio-net support AF_XDP zero copy.

My understanding is that there's going to be another version of all
this work?

-- 
MST



* Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode
  2024-02-22 19:45 ` Michael S. Tsirkin
@ 2024-02-23  1:50   ` Xuan Zhuo
  0 siblings, 0 replies; 33+ messages in thread
From: Xuan Zhuo @ 2024-02-23  1:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, Jason Wang, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, virtualization, bpf

On Thu, 22 Feb 2024 14:45:08 -0500, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Jan 16, 2024 at 03:59:19PM +0800, Xuan Zhuo wrote:
> > This is the second part of virtio-net support AF_XDP zero copy.
>
> My understanding is, there's going to be another version of all
> this work?

YES.

http://lore.kernel.org/all/20240202093951.120283-1-xuanzhuo@linux.alibaba.com

Thanks


>
> --
> MST
>


end of thread

Thread overview: 33+ messages
2024-01-16  7:59 [PATCH net-next 0/5] virtio-net: sq support premapped mode Xuan Zhuo
2024-01-16  7:59 ` [PATCH net-next 1/5] virtio_ring: introduce virtqueue_get_buf_ctx_dma() Xuan Zhuo
2024-01-24  6:54   ` Jason Wang
2024-02-22 19:43   ` Michael S. Tsirkin
2024-01-16  7:59 ` [PATCH net-next 2/5] virtio_ring: virtqueue_disable_and_recycle let the callback detach bufs Xuan Zhuo
2024-01-16  7:59 ` [PATCH net-next 3/5] virtio_ring: introduce virtqueue_detach_unused_buf_dma() Xuan Zhuo
2024-01-16  7:59 ` [PATCH net-next 4/5] virtio_ring: introduce virtqueue_get_dma_premapped() Xuan Zhuo
2024-01-25  3:39   ` Jason Wang
2024-01-25  5:57     ` Xuan Zhuo
2024-01-29  3:07       ` Jason Wang
2024-01-29  3:30         ` Xuan Zhuo
2024-01-30  2:54           ` Jason Wang
2024-01-30  3:13             ` Xuan Zhuo
2024-01-16  7:59 ` [PATCH net-next 5/5] virtio_net: sq support premapped mode Xuan Zhuo
2024-01-25  3:39   ` Jason Wang
2024-01-25  5:58     ` Xuan Zhuo
2024-01-29  3:06       ` Jason Wang
2024-01-29  3:11         ` Xuan Zhuo
2024-01-30  2:56           ` Jason Wang
2024-01-30  3:15             ` Xuan Zhuo
2024-01-30  3:27               ` Xuan Zhuo
2024-01-25  3:39 ` [PATCH net-next 0/5] virtio-net: " Jason Wang
2024-01-25  5:42   ` Xuan Zhuo
2024-01-25  5:49     ` Xuan Zhuo
2024-01-25  6:14       ` Jason Wang
2024-01-25  6:25         ` Xuan Zhuo
2024-01-29  3:14           ` Jason Wang
2024-01-29  3:37             ` Xuan Zhuo
2024-01-29  6:23               ` Xuan Zhuo
2024-01-30  2:51                 ` Jason Wang
2024-01-30  3:13                   ` Xuan Zhuo
2024-02-22 19:45 ` Michael S. Tsirkin
2024-02-23  1:50   ` Xuan Zhuo
