* [PATCH v2 0/9] virtio_net: support multi buffer xdp
@ 2022-12-20 14:14 Heng Qi
  2022-12-20 14:14 ` [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
                   ` (10 more replies)
  0 siblings, 11 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

Changes since RFC:
- Using headroom instead of vi->xdp_enabled to avoid re-reading
  in add_recvbuf_mergeable();
- Disable GRO_HW and keep linearization for single buffer xdp;
- Renamed to virtnet_build_xdp_buff_mrg();
- pr_debug() to netdev_dbg();
- Adjusted the order of the patch series.

Currently, virtio net only supports xdp for single-buffer packets
or linearized multi-buffer packets. This patchset supports xdp for
multi-buffer packets, so a larger MTU can be used when the xdp
program sets xdp.frags. This does not affect single buffer handling.
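
For reference, here is a minimal sketch (not part of this series) of an
xdp program that opts in to multi-buffer support: with libbpf, placing
the program in the "xdp.frags" section sets BPF_F_XDP_HAS_FRAGS at load
time, which is what prog->aux->xdp_has_frags reflects on the kernel
side. The program name is hypothetical.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp.frags")	/* libbpf: loads with BPF_F_XDP_HAS_FRAGS */
int xdp_pass_mb(struct xdp_md *ctx)
{
	/* Total length: linear part plus all frags. */
	__u32 len = bpf_xdp_get_buff_len(ctx);

	return len > 0 ? XDP_PASS : XDP_ABORTED;
}

char _license[] SEC("license") = "GPL";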

In order to build multi-buffer xdp neatly, the code is integrated
into virtnet_build_xdp_buff_mrg(). The first buffer is used as the
linear part of the prepared xdp_buff, and the rest of the buffers
are added as fragments to its skb_shared_info structure. This
structure can also be conveniently converted on XDP_PASS to get the
corresponding skb.
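
For orientation, the skb_shared_info mentioned above lives in the
reserved tailroom of the first buffer; this is roughly how the XDP core
locates it (shortened from include/net/xdp.h):

#define xdp_data_hard_end(xdp)				\
	((xdp)->data_hard_start + (xdp)->frame_sz -	\
	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

/* The frags' metadata sits at the very end of the frame, so an
 * xdp_buff with frags can become an skb without copying.
 */
static inline struct skb_shared_info *
xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
{
	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
}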

Since virtio net uses compound pages for its receive buffers while
bpf_xdp_frags_increase_tail() assumes page-pool-backed frags,
(rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
is negative in most cases. So we do not set xdp_rxq->frag_size in
virtnet_open(), which disables the tail increase.
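
For reference, these are (roughly) the lines in
bpf_xdp_frags_increase_tail() (net/core/filter.c) that motivate
leaving frag_size unset: with frag_size at 0 the helper bails out
before the tailroom computation can go wrong.

	if (!rxq->frag_size || rxq->frag_size > xdp->frame_sz)
		return -EOPNOTSUPP;

	tailroom = rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag);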

Heng Qi (9):
  virtio_net: disable the hole mechanism for xdp
  virtio_net: set up xdp for multi buffer packets
  virtio_net: update bytes calculation for xdp_frame
  virtio_net: build xdp_buff with multi buffers
  virtio_net: construct multi-buffer xdp in mergeable
  virtio_net: transmit the multi-buffer xdp
  virtio_net: build skb from multi-buffer xdp
  virtio_net: remove xdp related info from page_to_skb()
  virtio_net: support multi-buffer xdp

 drivers/net/virtio_net.c | 332 ++++++++++++++++++++++++++-------------
 1 file changed, 219 insertions(+), 113 deletions(-)

-- 
2.19.1.6.gb485710b



* [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  6:30   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

The XDP core assumes that the frame_size of the xdp_buff and the
length of each frag are PAGE_SIZE. The hole may cause xdp processing
to fail, so disable the hole mechanism when xdp is set.
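
A hypothetical illustration of the failure: the truesize packed into
the mergeable context is fixed at refill time, so folding a leftover
hole into len afterwards can make the received len exceed the
advertised truesize, tripping the len > truesize sanity check on the
receive path and breaking the XDP core's frag length == PAGE_SIZE
assumption.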

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9cce7dec7366..443aa7b8f0ad 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer.
+		 * XDP core assumes that frame_size of xdp_buff and the length
+		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
 		 */
-		len += hole;
+		if (!headroom)
+			len += hole;
 		alloc_frag->offset += hole;
 	}
 
-- 
2.19.1.6.gb485710b



* [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
  2022-12-20 14:14 ` [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  6:32   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

When the xdp program sets xdp.frags, it can process multi-buffer
packets over a larger MTU, so we continue to support xdp in that
case. For single-buffer xdp, keep checking the MTU.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 443aa7b8f0ad..c5c4e9db4ed3 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3095,8 +3095,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		return -EINVAL;
 	}
 
-	if (dev->mtu > max_sz) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
+	if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
+		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP without frags");
 		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
 		return -EINVAL;
 	}
-- 
2.19.1.6.gb485710b



* [PATCH v2 3/9] virtio_net: update bytes calculation for xdp_frame
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
  2022-12-20 14:14 ` [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
  2022-12-20 14:14 ` [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-20 14:14 ` [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers Heng Qi
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

Use xdp_get_frame_len() instead of frame->len when accounting
completed xdp_frames, so the byte counters are correct for
multi-buffer frames. This is a basis for multi-buffer xdp
transmission.
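
For context, xdp_get_frame_len() accounts for the frags as well;
roughly, from include/net/xdp.h:

static __always_inline unsigned int
xdp_get_frame_len(struct xdp_frame *xdpf)
{
	struct skb_shared_info *sinfo;
	unsigned int len = xdpf->len;	/* linear part */

	if (likely(!xdp_frame_has_frags(xdpf)))
		goto out;

	sinfo = xdp_get_shared_info_from_frame(xdpf);
	len += sinfo->xdp_frags_size;	/* plus all frags */
out:
	return len;
}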

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c5c4e9db4ed3..08f209d7b0bf 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -658,7 +658,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 		if (likely(is_xdp_frame(ptr))) {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
-			bytes += frame->len;
+			bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 		} else {
 			struct sk_buff *skb = ptr;
@@ -1604,7 +1604,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 		} else {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
-			bytes += frame->len;
+			bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 		}
 		packets++;
-- 
2.19.1.6.gb485710b



* [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (2 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  6:46   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

Support xdp for multi buffer packets in mergeable mode.

The first buffer is used as the linear part of the xdp_buff, and
the rest of the buffers are added as non-linear fragments to the
struct skb_shared_info in the tailroom of the xdp_buff.
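
The per-buffer truesize and headroom consumed in the loop come from
the context packed at refill time; for reference, the packing helper
in virtio_net.c looks roughly like this:

static void *mergeable_len_to_ctx(unsigned int truesize,
				  unsigned int headroom)
{
	/* headroom in the high bits, truesize in the low bits */
	return (void *)(unsigned long)((headroom << MRG_CTX_HEADER_SHIFT) |
				       truesize);
}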

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 78 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 08f209d7b0bf..8fc3b1841d92 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* TODO: build xdp in big mode */
+static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
+				      struct virtnet_info *vi,
+				      struct receive_queue *rq,
+				      struct xdp_buff *xdp,
+				      void *buf,
+				      unsigned int len,
+				      unsigned int frame_sz,
+				      u16 *num_buf,
+				      unsigned int *xdp_frags_truesize,
+				      struct virtnet_rq_stats *stats)
+{
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+	unsigned int truesize, headroom;
+	struct skb_shared_info *shinfo;
+	unsigned int xdp_frags_truesz = 0;
+	unsigned int cur_frag_size;
+	struct page *page;
+	skb_frag_t *frag;
+	int offset;
+	void *ctx;
+
+	xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
+	xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
+			 VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
+
+	if (*num_buf > 1) {
+		shinfo = xdp_get_shared_info_from_buff(xdp);
+		shinfo->nr_frags = 0;
+		shinfo->xdp_frags_size = 0;
+	}
+
+	if ((*num_buf - 1) > MAX_SKB_FRAGS)
+		return -EINVAL;
+
+	while ((--*num_buf) >= 1) {
+		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		if (unlikely(!buf)) {
+			pr_debug("%s: rx error: %d buffers out of %d missing\n",
+				 dev->name, *num_buf,
+				 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
+			dev->stats.rx_length_errors++;
+			return -EINVAL;
+		}
+
+		if (!xdp_buff_has_frags(xdp))
+			xdp_buff_set_frags_flag(xdp);
+
+		stats->bytes += len;
+		page = virt_to_head_page(buf);
+		offset = buf - page_address(page);
+		truesize = mergeable_ctx_to_truesize(ctx);
+		headroom = mergeable_ctx_to_headroom(ctx);
+
+		cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);
+		xdp_frags_truesz += cur_frag_size;
+		if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
+			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
+				 dev->name, len, (unsigned long)ctx);
+			dev->stats.rx_length_errors++;
+			return -EINVAL;
+		}
+
+		frag = &shinfo->frags[shinfo->nr_frags++];
+		__skb_frag_set_page(frag, page);
+		skb_frag_off_set(frag, offset);
+		skb_frag_size_set(frag, len);
+		if (page_is_pfmemalloc(page))
+			xdp_buff_set_frag_pfmemalloc(xdp);
+
+		shinfo->xdp_frags_size += len;
+	}
+
+	*xdp_frags_truesize = xdp_frags_truesz;
+	return 0;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
-- 
2.19.1.6.gb485710b



* [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (3 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  7:01   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp Heng Qi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().

For buffers prefilled before xdp is set, we will probably use
vq reset in the future. Also, virtio net currently uses compound
pages, and bpf_xdp_frags_increase_tail() needs to calculate the
tailroom of the last frag, which involves the offset within the
corresponding page and yields a negative value, so we disable tail
increase by not setting xdp_rxq->frag_size.
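
For context, a driver opts in to tail growth by registering its rxq
with a nonzero frag_size; virtio-net keeps using the plain wrapper,
which passes 0 (sketch from include/net/xdp.h):

static inline int
xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq, struct net_device *dev,
		 u32 queue_index, unsigned int napi_id)
{
	return __xdp_rxq_info_reg(xdp_rxq, dev, queue_index, napi_id, 0);
}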

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 60 +++++++++++++++++++++++++++++-----------
 1 file changed, 44 insertions(+), 16 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8fc3b1841d92..40bc58fa57f5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1018,6 +1018,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 unsigned int *xdp_xmit,
 					 struct virtnet_rq_stats *stats)
 {
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 	struct page *page = virt_to_head_page(buf);
@@ -1048,11 +1049,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		unsigned int xdp_frags_truesz = 0;
+		struct skb_shared_info *shinfo;
 		struct xdp_frame *xdpf;
 		struct page *xdp_page;
 		struct xdp_buff xdp;
 		void *data;
 		u32 act;
+		int i;
 
 		/* Transient failure which in theory could occur if
 		 * in-flight packets from before XDP was enabled reach
@@ -1061,19 +1065,23 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
-		/* Buffers with headroom use PAGE_SIZE as alloc size,
-		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		/* Now the XDP core assumes the frag size is PAGE_SIZE, but
+		 * buffers with headroom may add a hole in truesize, which
+		 * makes their length exceed PAGE_SIZE. So we disable the
+		 * hole mechanism for xdp. See add_recvbuf_mergeable().
 		 */
 		frame_sz = headroom ? PAGE_SIZE : truesize;
 
-		/* This happens when rx buffer size is underestimated
-		 * or headroom is not enough because of the buffer
-		 * was refilled before XDP is set. This should only
-		 * happen for the first several packets, so we don't
-		 * care much about its performance.
+		/* This happens when headroom is not enough because
+		 * the buffer was prefilled before XDP is set.
+		 * This should only happen for the first several packets.
+		 * In fact, vq reset can be used here to help us clean up
+		 * the prefilled buffers, but many existing devices do not
+		 * support it, and we don't want to bother users who are
+		 * using xdp normally.
 		 */
-		if (unlikely(num_buf > 1 ||
-			     headroom < virtnet_get_headroom(vi))) {
+		if (!xdp_prog->aux->xdp_has_frags &&
+		    (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
 			/* linearize data for XDP */
 			xdp_page = xdp_linearize_page(rq, &num_buf,
 						      page, offset,
@@ -1084,17 +1092,26 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
+		} else if (unlikely(headroom < virtnet_get_headroom(vi))) {
+			if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
+				goto err_xdp;
+
+			xdp_page = alloc_page(GFP_ATOMIC);
+			if (!xdp_page)
+				goto err_xdp;
+
+			memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+			       page_address(page) + offset, len);
+			frame_sz = PAGE_SIZE;
+			offset = VIRTIO_XDP_HEADROOM;
 		} else {
 			xdp_page = page;
 		}
-
-		/* Allow consuming headroom but reserve enough space to push
-		 * the descriptor on if we get an XDP_TX return code.
-		 */
 		data = page_address(xdp_page) + offset;
-		xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
-		xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
-				 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
+		err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
+						 &num_buf, &xdp_frags_truesz, stats);
+		if (unlikely(err))
+			goto err_xdp_frags;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -1190,6 +1207,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
+err_xdp_frags:
+		shinfo = xdp_get_shared_info_from_buff(&xdp);
+
+		if (unlikely(xdp_page != page))
+			__free_pages(xdp_page, 0);
+
+		for (i = 0; i < shinfo->nr_frags; i++) {
+			xdp_page = skb_frag_page(&shinfo->frags[i]);
+			put_page(xdp_page);
+		}
+		goto err_xdp;
 	}
 	rcu_read_unlock();
 
-- 
2.19.1.6.gb485710b



* [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (4 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  7:12   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 7/9] virtio_net: build skb from " Heng Qi
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

This serves as the basis for XDP_TX and XDP_REDIRECT
to send a multi-buffer xdp_frame.
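
The headroom adjustment in this patch matters because the XDP core
derives the position of a frame's skb_shared_info from its headroom;
roughly, from include/net/xdp.h:

static inline struct skb_shared_info *
xdp_get_shared_info_from_frame(struct xdp_frame *frame)
{
	void *data_hard_start = frame->data - frame->headroom - sizeof(*frame);

	return (struct skb_shared_info *)(data_hard_start + frame->frame_sz -
				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
}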

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 40bc58fa57f5..9f31bfa7f9a6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -563,22 +563,39 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
-	int err;
+	struct skb_shared_info *shinfo;
+	u8 nr_frags = 0;
+	int err, i;
 
 	if (unlikely(xdpf->headroom < vi->hdr_len))
 		return -EOVERFLOW;
 
-	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
+	if (unlikely(xdp_frame_has_frags(xdpf))) {
+		shinfo = xdp_get_shared_info_from_frame(xdpf);
+		nr_frags = shinfo->nr_frags;
+	}
+
+	/* Need to adjust this to calculate the correct position
+	 * for the shinfo of the xdpf.
+	 */
+	xdpf->headroom -= vi->hdr_len;
 	xdpf->data -= vi->hdr_len;
 	/* Zero header and leave csum up to XDP layers */
 	hdr = xdpf->data;
 	memset(hdr, 0, vi->hdr_len);
 	xdpf->len   += vi->hdr_len;
 
-	sg_init_one(sq->sg, xdpf->data, xdpf->len);
+	sg_init_table(sq->sg, nr_frags + 1);
+	sg_set_buf(sq->sg, xdpf->data, xdpf->len);
+	for (i = 0; i < nr_frags; i++) {
+		skb_frag_t *frag = &shinfo->frags[i];
+
+		sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
+			    skb_frag_size(frag), skb_frag_off(frag));
+	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
-				   GFP_ATOMIC);
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
+				   xdp_to_ptr(xdpf), GFP_ATOMIC);
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
 
-- 
2.19.1.6.gb485710b



* [PATCH v2 7/9] virtio_net: build skb from multi-buffer xdp
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (5 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  7:31   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

This converts the xdp_buff directly to an skb, for both
multi-buffer and single-buffer xdp. The construction of the skb
from xdp is isolated from page_to_skb().

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 50 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9f31bfa7f9a6..4e12196fcfd4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -948,6 +948,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* Why not use xdp_build_skb_from_frame() ?
+ * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
+ * virtio-net there are 2 points that do not match its requirements:
+ *  1. The size of the prefilled buffer is not fixed before xdp is set.
+ *  2. When xdp is loaded, virtio-net has a hole mechanism (refer to
+ *     add_recvbuf_mergeable()), which will make the size of a buffer
+ *     exceed PAGE_SIZE.
+ */
+static struct sk_buff *build_skb_from_xdp_buff(struct net_device *dev,
+					       struct virtnet_info *vi,
+					       struct xdp_buff *xdp,
+					       unsigned int xdp_frags_truesz)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	unsigned int headroom, data_len;
+	struct sk_buff *skb;
+	int metasize;
+	u8 nr_frags;
+
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		pr_debug("Error building skb as missing reserved tailroom for xdp");
+		return NULL;
+	}
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		nr_frags = sinfo->nr_frags;
+
+	skb = build_skb(xdp->data_hard_start, xdp->frame_sz);
+	if (unlikely(!skb))
+		return NULL;
+
+	headroom = xdp->data - xdp->data_hard_start;
+	data_len = xdp->data_end - xdp->data;
+	skb_reserve(skb, headroom);
+	__skb_put(skb, data_len);
+
+	metasize = xdp->data - xdp->data_meta;
+	metasize = metasize > 0 ? metasize : 0;
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		xdp_update_skb_shared_info(skb, nr_frags,
+					   sinfo->xdp_frags_size,
+					   xdp_frags_truesz,
+					   xdp_buff_is_frag_pfmemalloc(xdp));
+
+	return skb;
+}
+
 /* TODO: build xdp in big mode */
 static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
 				      struct virtnet_info *vi,
-- 
2.19.1.6.gb485710b



* [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb()
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (6 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 7/9] virtio_net: build skb from " Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  7:55   ` Jason Wang
  2022-12-20 14:14 ` [PATCH v2 9/9] virtio_net: support multi-buffer xdp Heng Qi
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

To keep the construction of the xdp_buff clear, remove the xdp
processing interleaved with page_to_skb(). Now the xdp logic and
building an skb from xdp are separate and independent.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 41 +++++++++-------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4e12196fcfd4..398ffe2a5084 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,9 +439,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 				   struct receive_queue *rq,
 				   struct page *page, unsigned int offset,
-				   unsigned int len, unsigned int truesize,
-				   bool hdr_valid, unsigned int metasize,
-				   unsigned int headroom)
+				   unsigned int len, unsigned int truesize)
 {
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	else
 		hdr_padded_len = sizeof(struct padded_vnet_hdr);
 
-	/* If headroom is not 0, there is an offset between the beginning of the
-	 * data and the allocated space, otherwise the data and the allocated
-	 * space are aligned.
-	 *
-	 * Buffers with headroom use PAGE_SIZE as alloc size, see
-	 * add_recvbuf_mergeable() + get_mergeable_buf_len()
-	 */
-	truesize = headroom ? PAGE_SIZE : truesize;
-	tailroom = truesize - headroom;
-	buf = p - headroom;
-
+	buf = p;
 	len -= hdr_len;
 	offset += hdr_padded_len;
 	p += hdr_padded_len;
-	tailroom -= hdr_padded_len + len;
+	tailroom = truesize - hdr_padded_len - len;
 
 	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
@@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	if (len <= skb_tailroom(skb))
 		copy = len;
 	else
-		copy = ETH_HLEN + metasize;
+		copy = ETH_HLEN;
 	skb_put_data(skb, p, copy);
 
 	len -= copy;
@@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		give_pages(rq, page);
 
 ok:
-	/* hdr_valid means no XDP, so we can copy the vnet header */
-	if (hdr_valid) {
-		hdr = skb_vnet_hdr(skb);
-		memcpy(hdr, hdr_p, hdr_len);
-	}
+	hdr = skb_vnet_hdr(skb);
+	memcpy(hdr, hdr_p, hdr_len);
 	if (page_to_free)
 		put_page(page_to_free);
 
-	if (metasize) {
-		__skb_pull(skb, metasize);
-		skb_metadata_set(skb, metasize);
-	}
-
 	return skb;
 }
 
@@ -934,7 +914,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
 {
 	struct page *page = buf;
 	struct sk_buff *skb =
-		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
+		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
 
 	stats->bytes += len - vi->hdr_len;
 	if (unlikely(!skb))
@@ -1222,9 +1202,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 				rcu_read_unlock();
 				put_page(page);
 				head_skb = page_to_skb(vi, rq, xdp_page, offset,
-						       len, PAGE_SIZE, false,
-						       metasize,
-						       headroom);
+						       len, PAGE_SIZE);
 				return head_skb;
 			}
 			break;
@@ -1289,8 +1267,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_unlock();
 
 skip_xdp:
-	head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
-			       metasize, headroom);
+	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
 	curr_skb = head_skb;
 
 	if (unlikely(!curr_skb))
-- 
2.19.1.6.gb485710b



* [PATCH v2 9/9] virtio_net: support multi-buffer xdp
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (7 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
@ 2022-12-20 14:14 ` Heng Qi
  2022-12-27  9:03   ` Jason Wang
  2022-12-22  1:30 ` [PATCH v2 0/9] virtio_net: support multi buffer xdp Jakub Kicinski
  2022-12-26  2:32 ` Heng Qi
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-20 14:14 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

The driver can pass the skb built by build_skb_from_xdp_buff() to
the stack on XDP_PASS.

The driver forwards multi-buffer packets via the send queue on
XDP_TX and XDP_REDIRECT, and drops the references to all pages on
XDP_DROP.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 65 ++++++----------------------------------
 1 file changed, 9 insertions(+), 56 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 398ffe2a5084..daa380b9d1cc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1074,7 +1074,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	unsigned int metasize = 0;
 	unsigned int frame_sz;
 	int err;
 
@@ -1165,63 +1164,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 		switch (act) {
 		case XDP_PASS:
-			metasize = xdp.data - xdp.data_meta;
-
-			/* recalculate offset to account for any header
-			 * adjustments and minus the metasize to copy the
-			 * metadata in page_to_skb(). Note other cases do not
-			 * build an skb and avoid using offset
-			 */
-			offset = xdp.data - page_address(xdp_page) -
-				 vi->hdr_len - metasize;
-
-			/* recalculate len if xdp.data, xdp.data_end or
-			 * xdp.data_meta were adjusted
-			 */
-			len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
-
-			/* recalculate headroom if xdp.data or xdp_data_meta
-			 * were adjusted, note that offset should always point
-			 * to the start of the reserved bytes for virtio_net
-			 * header which are followed by xdp.data, that means
-			 * that offset is equal to the headroom (when buf is
-			 * starting at the beginning of the page, otherwise
-			 * there is a base offset inside the page) but it's used
-			 * with a different starting point (buf start) than
-			 * xdp.data (buf start + vnet hdr size). If xdp.data or
-			 * data_meta were adjusted by the xdp prog then the
-			 * headroom size has changed and so has the offset, we
-			 * can use data_hard_start, which points at buf start +
-			 * vnet hdr size, to calculate the new headroom and use
-			 * it later to compute buf start in page_to_skb()
-			 */
-			headroom = xdp.data - xdp.data_hard_start - metasize;
-
-			/* We can only create skb based on xdp_page. */
-			if (unlikely(xdp_page != page)) {
-				rcu_read_unlock();
-				put_page(page);
-				head_skb = page_to_skb(vi, rq, xdp_page, offset,
-						       len, PAGE_SIZE);
-				return head_skb;
-			}
-			break;
+			head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
+			rcu_read_unlock();
+			return head_skb;
 		case XDP_TX:
 			stats->xdp_tx++;
 			xdpf = xdp_convert_buff_to_frame(&xdp);
 			if (unlikely(!xdpf)) {
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
+				netdev_dbg(dev, "convert buff to frame failed for xdp\n");
+				goto err_xdp_frags;
 			}
 			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
 			if (unlikely(!err)) {
 				xdp_return_frame_rx_napi(xdpf);
 			} else if (unlikely(err < 0)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
+				goto err_xdp_frags;
 			}
 			*xdp_xmit |= VIRTIO_XDP_TX;
 			if (unlikely(xdp_page != page))
@@ -1231,11 +1189,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		case XDP_REDIRECT:
 			stats->xdp_redirects++;
 			err = xdp_do_redirect(dev, &xdp, xdp_prog);
-			if (err) {
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
-			}
+			if (err)
+				goto err_xdp_frags;
 			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			if (unlikely(xdp_page != page))
 				put_page(page);
@@ -1248,9 +1203,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			trace_xdp_exception(vi->dev, xdp_prog, act);
 			fallthrough;
 		case XDP_DROP:
-			if (unlikely(xdp_page != page))
-				__free_pages(xdp_page, 0);
-			goto err_xdp;
+			goto err_xdp_frags;
 		}
 err_xdp_frags:
 		shinfo = xdp_get_shared_info_from_buff(&xdp);
-- 
2.19.1.6.gb485710b



* Re: [PATCH v2 0/9] virtio_net: support multi buffer xdp
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (8 preceding siblings ...)
  2022-12-20 14:14 ` [PATCH v2 9/9] virtio_net: support multi-buffer xdp Heng Qi
@ 2022-12-22  1:30 ` Jakub Kicinski
  2022-12-22  2:04   ` Heng Qi
  2022-12-26  2:32 ` Heng Qi
  10 siblings, 1 reply; 41+ messages in thread
From: Jakub Kicinski @ 2022-12-22  1:30 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Jason Wang, Michael S . Tsirkin, Paolo Abeni,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

On Tue, 20 Dec 2022 22:14:40 +0800 Heng Qi wrote:
> Changes since RFC:
> - Using headroom instead of vi->xdp_enabled to avoid re-reading
>   in add_recvbuf_mergeable();
> - Disable GRO_HW and keep linearization for single buffer xdp;
> - Renamed to virtnet_build_xdp_buff_mrg();
> - pr_debug() to netdev_dbg();
> - Adjusted the order of the patch series.

# Form letter - net-next is closed

We have already submitted the networking pull request to Linus
for v6.2 and therefore net-next is closed for new drivers, features,
code refactoring and optimizations. We are currently accepting
bug fixes only.

Please repost when net-next reopens after Jan 2nd.

RFC patches sent for review only are obviously welcome at any time.


* Re: [PATCH v2 0/9] virtio_net: support multi buffer xdp
  2022-12-22  1:30 ` [PATCH v2 0/9] virtio_net: support multi buffer xdp Jakub Kicinski
@ 2022-12-22  2:04   ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-22  2:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, bpf, Jason Wang, Michael S . Tsirkin, Paolo Abeni,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo



On 2022/12/22 9:30 AM, Jakub Kicinski wrote:
> On Tue, 20 Dec 2022 22:14:40 +0800 Heng Qi wrote:
>> Changes since RFC:
>> - Using headroom instead of vi->xdp_enabled to avoid re-reading
>>    in add_recvbuf_mergeable();
>> - Disable GRO_HW and keep linearization for single buffer xdp;
>> - Renamed to virtnet_build_xdp_buff_mrg();
>> - pr_debug() to netdev_dbg();
>> - Adjusted the order of the patch series.
> # Form letter - net-next is closed
>
> We have already submitted the networking pull request to Linus
> for v6.2 and therefore net-next is closed for new drivers, features,
> code refactoring and optimizations. We are currently accepting
> bug fixes only.
>
> Please repost when net-next reopens after Jan 2nd.
>
> RFC patches sent for review only are obviously welcome at any time.

Yes, I understand; we can still review this patch series in the meantime.

Thanks.






* Re: [PATCH v2 0/9] virtio_net: support multi buffer xdp
  2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (9 preceding siblings ...)
  2022-12-22  1:30 ` [PATCH v2 0/9] virtio_net: support multi buffer xdp Jakub Kicinski
@ 2022-12-26  2:32 ` Heng Qi
  2022-12-26  4:14   ` Jason Wang
  10 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-26  2:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo, netdev, bpf


Hi Jason, do you have any comments on this?

Thanks.

On 2022/12/20 10:14 PM, Heng Qi wrote:
> Changes since RFC:
> - Using headroom instead of vi->xdp_enabled to avoid re-reading
>    in add_recvbuf_mergeable();
> - Disable GRO_HW and keep linearization for single buffer xdp;
> - Renamed to virtnet_build_xdp_buff_mrg();
> - pr_debug() to netdev_dbg();
> - Adjusted the order of the patch series.
>
> Currently, virtio net only supports xdp for single-buffer packets
> or linearized multi-buffer packets. This patchset supports xdp for
> multi-buffer packets, so a larger MTU can be used when the xdp
> program sets xdp.frags. This does not affect single buffer handling.
>
> In order to build multi-buffer xdp neatly, the code is integrated
> into virtnet_build_xdp_buff_mrg(). The first buffer is used as the
> linear part of the prepared xdp_buff, and the rest of the buffers
> are added as fragments to its skb_shared_info structure. This
> structure can also be conveniently converted on XDP_PASS to get the
> corresponding skb.
>
> Since virtio net uses compound pages for its receive buffers while
> bpf_xdp_frags_increase_tail() assumes page-pool-backed frags,
> (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
> is negative in most cases. So we do not set xdp_rxq->frag_size in
> virtnet_open(), which disables the tail increase.
>
> Heng Qi (9):
>    virtio_net: disable the hole mechanism for xdp
>    virtio_net: set up xdp for multi buffer packets
>    virtio_net: update bytes calculation for xdp_frame
>    virtio_net: build xdp_buff with multi buffers
>    virtio_net: construct multi-buffer xdp in mergeable
>    virtio_net: transmit the multi-buffer xdp
>    virtio_net: build skb from multi-buffer xdp
>    virtio_net: remove xdp related info from page_to_skb()
>    virtio_net: support multi-buffer xdp
>
>   drivers/net/virtio_net.c | 332 ++++++++++++++++++++++++++-------------
>   1 file changed, 219 insertions(+), 113 deletions(-)
>



* Re: [PATCH v2 0/9] virtio_net: support multi buffer xdp
  2022-12-26  2:32 ` Heng Qi
@ 2022-12-26  4:14   ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2022-12-26  4:14 UTC (permalink / raw)
  To: Heng Qi
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo, netdev, bpf

On Mon, Dec 26, 2022 at 10:33 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
> Hi Jason, do you have any comments on this?

I will review this no later than the end of this week.

Thanks

>
> Thanks.
>
> On 2022/12/20 10:14 PM, Heng Qi wrote:
> > Changes since RFC:
> > - Using headroom instead of vi->xdp_enabled to avoid re-reading
> >    in add_recvbuf_mergeable();
> > - Disable GRO_HW and keep linearization for single buffer xdp;
> > - Renamed to virtnet_build_xdp_buff_mrg();
> > - pr_debug() to netdev_dbg();
> > - Adjusted the order of the patch series.
> >
> > Currently, virtio net only supports xdp for single-buffer packets
> > or linearized multi-buffer packets. This patchset supports xdp for
> > multi-buffer packets, so a larger MTU can be used when the xdp
> > program sets xdp.frags. This does not affect single buffer handling.
> >
> > In order to build multi-buffer xdp neatly, the code is integrated
> > into virtnet_build_xdp_buff_mrg(). The first buffer is used as the
> > linear part of the prepared xdp_buff, and the rest of the buffers
> > are added as fragments to its skb_shared_info structure. This
> > structure can also be conveniently converted on XDP_PASS to get the
> > corresponding skb.
> >
> > Since virtio net uses compound pages for its receive buffers while
> > bpf_xdp_frags_increase_tail() assumes page-pool-backed frags,
> > (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
> > is negative in most cases. So we do not set xdp_rxq->frag_size in
> > virtnet_open(), which disables the tail increase.
> >
> > Heng Qi (9):
> >    virtio_net: disable the hole mechanism for xdp
> >    virtio_net: set up xdp for multi buffer packets
> >    virtio_net: update bytes calculation for xdp_frame
> >    virtio_net: build xdp_buff with multi buffers
> >    virtio_net: construct multi-buffer xdp in mergeable
> >    virtio_net: transmit the multi-buffer xdp
> >    virtio_net: build skb from multi-buffer xdp
> >    virtio_net: remove xdp related info from page_to_skb()
> >    virtio_net: support multi-buffer xdp
> >
> >   drivers/net/virtio_net.c | 332 ++++++++++++++++++++++++++-------------
> >   1 file changed, 219 insertions(+), 113 deletions(-)
> >
>



* Re: [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp
  2022-12-20 14:14 ` [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
@ 2022-12-27  6:30   ` Jason Wang
  2022-12-27  7:32     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  6:30 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> The XDP core assumes that the frame_size of the xdp_buff and the
> length of each frag are PAGE_SIZE. The hole may cause xdp processing
> to fail, so disable the hole mechanism when xdp is set.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9cce7dec7366..443aa7b8f0ad 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>   		/* To avoid internal fragmentation, if there is very likely not
>   		 * enough space for another buffer, add the remaining space to
>   		 * the current buffer.
> +		 * XDP core assumes that frame_size of xdp_buff and the length
> +		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
>   		 */
> -		len += hole;
> +		if (!headroom)
> +			len += hole;


Is this only a requirement of multi-buffer XDP? If not, it needs to be
backported to stable.

Thanks


>   		alloc_frag->offset += hole;
>   	}
>   



* Re: [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets
  2022-12-20 14:14 ` [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
@ 2022-12-27  6:32   ` Jason Wang
  2022-12-27 12:20     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  6:32 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> When the xdp program sets xdp.frags, it can process multi-buffer
> packets over a larger MTU, so we continue to support xdp in that
> case. For single-buffer xdp, keep checking the MTU.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 443aa7b8f0ad..c5c4e9db4ed3 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3095,8 +3095,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   		return -EINVAL;
>   	}
>   
> -	if (dev->mtu > max_sz) {
> -		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
> +	if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {


Not related to this patch, but I see:

	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);

which is suspicious; do we need to count the reserved headroom/tailroom
as well?

Thanks


> +		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP without frags");
>   		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>   		return -EINVAL;
>   	}



* Re: [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers
  2022-12-20 14:14 ` [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers Heng Qi
@ 2022-12-27  6:46   ` Jason Wang
  2022-12-27  9:10     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  6:46 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> Support xdp for multi buffer packets in mergeable mode.
>
> The first buffer is used as the linear part of the xdp_buff, and
> the rest of the buffers are added as non-linear fragments to the
> struct skb_shared_info in the tailroom of the xdp_buff.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 78 ++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 78 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 08f209d7b0bf..8fc3b1841d92 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct net_device *dev,
>   	return NULL;
>   }
>   
> +/* TODO: build xdp in big mode */
> +static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
> +				      struct virtnet_info *vi,
> +				      struct receive_queue *rq,
> +				      struct xdp_buff *xdp,
> +				      void *buf,
> +				      unsigned int len,
> +				      unsigned int frame_sz,
> +				      u16 *num_buf,
> +				      unsigned int *xdp_frags_truesize,
> +				      struct virtnet_rq_stats *stats)
> +{
> +	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> +	unsigned int truesize, headroom;
> +	struct skb_shared_info *shinfo;
> +	unsigned int xdp_frags_truesz = 0;
> +	unsigned int cur_frag_size;
> +	struct page *page;
> +	skb_frag_t *frag;
> +	int offset;
> +	void *ctx;
> +
> +	xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
> +	xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
> +			 VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
> +
> +	if (*num_buf > 1) {
> +		shinfo = xdp_get_shared_info_from_buff(xdp);
> +		shinfo->nr_frags = 0;
> +		shinfo->xdp_frags_size = 0;
> +	}
> +
> +	if ((*num_buf - 1) > MAX_SKB_FRAGS)
> +		return -EINVAL;
> +
> +	while ((--*num_buf) >= 1) {
> +		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> +		if (unlikely(!buf)) {
> +			pr_debug("%s: rx error: %d buffers out of %d missing\n",
> +				 dev->name, *num_buf,
> +				 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
> +			dev->stats.rx_length_errors++;
> +			return -EINVAL;
> +		}
> +
> +		if (!xdp_buff_has_frags(xdp))
> +			xdp_buff_set_frags_flag(xdp);


Any reason to put this inside the loop?


> +
> +		stats->bytes += len;
> +		page = virt_to_head_page(buf);
> +		offset = buf - page_address(page);
> +		truesize = mergeable_ctx_to_truesize(ctx);
> +		headroom = mergeable_ctx_to_headroom(ctx);
> +
> +		cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);
> +		xdp_frags_truesz += cur_frag_size;


Not related to this patch, but it could easily confuse future readers
that we need extra math for the truesize here. I think we at least need
a comment for this, or better, fix the root cause.

I guess the root cause is in get_mergeable_buf_len:

static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
					  struct ewma_pkt_len *avg_pkt_len,
					  unsigned int room)
{
	struct virtnet_info *vi = rq->vq->vdev->priv;
	const size_t hdr_len = vi->hdr_len;
	unsigned int len;

	if (room)
		return PAGE_SIZE - room;

And we do

     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);

     ...

     ctx = mergeable_len_to_ctx(len, headroom);


I wonder if it's better to pack the real truesize (PAGE_SIZE) here. This
may make a lot of things easier.
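
A hypothetical shape of that suggestion (not something this series
does): pack the real truesize at refill time so readers need no extra
math.

	/* hypothetical: advertise PAGE_SIZE as truesize when room is reserved */
	ctx = mergeable_len_to_ctx(room ? PAGE_SIZE : len, headroom);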

Thanks


> +		if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
> +			pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> +				 dev->name, len, (unsigned long)ctx);
> +			dev->stats.rx_length_errors++;
> +			return -EINVAL;
> +		}
> +
> +		frag = &shinfo->frags[shinfo->nr_frags++];
> +		__skb_frag_set_page(frag, page);
> +		skb_frag_off_set(frag, offset);
> +		skb_frag_size_set(frag, len);
> +		if (page_is_pfmemalloc(page))
> +			xdp_buff_set_frag_pfmemalloc(xdp);
> +
> +		shinfo->xdp_frags_size += len;
> +	}
> +
> +	*xdp_frags_truesize = xdp_frags_truesz;
> +	return 0;
> +}
> +
>   static struct sk_buff *receive_mergeable(struct net_device *dev,
>   					 struct virtnet_info *vi,
>   					 struct receive_queue *rq,



* Re: [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-20 14:14 ` [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
@ 2022-12-27  7:01   ` Jason Wang
  2022-12-27  9:31     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  7:01 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().
>
> For buffers prefilled before xdp is set, we will probably use
> vq reset in the future. Also, virtio net currently uses compound
> pages, and bpf_xdp_frags_increase_tail() needs to calculate the
> tailroom of the last frag, which involves the offset within the
> corresponding page and yields a negative value, so we disable tail
> increase by not setting xdp_rxq->frag_size.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 60 +++++++++++++++++++++++++++++-----------
>   1 file changed, 44 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8fc3b1841d92..40bc58fa57f5 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1018,6 +1018,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   					 unsigned int *xdp_xmit,
>   					 struct virtnet_rq_stats *stats)
>   {
> +	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>   	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>   	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>   	struct page *page = virt_to_head_page(buf);
> @@ -1048,11 +1049,14 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	rcu_read_lock();
>   	xdp_prog = rcu_dereference(rq->xdp_prog);
>   	if (xdp_prog) {
> +		unsigned int xdp_frags_truesz = 0;
> +		struct skb_shared_info *shinfo;
>   		struct xdp_frame *xdpf;
>   		struct page *xdp_page;
>   		struct xdp_buff xdp;
>   		void *data;
>   		u32 act;
> +		int i;
>   
>   		/* Transient failure which in theory could occur if
>   		 * in-flight packets from before XDP was enabled reach
> @@ -1061,19 +1065,23 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> -		/* Buffers with headroom use PAGE_SIZE as alloc size,
> -		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> +		/* Now the XDP core assumes the frag size is PAGE_SIZE, but
> +		 * buffers with headroom may add a hole in truesize, which
> +		 * makes their length exceed PAGE_SIZE. So we disable the
> +		 * hole mechanism for xdp. See add_recvbuf_mergeable().
>   		 */
>   		frame_sz = headroom ? PAGE_SIZE : truesize;
>   
> -		/* This happens when rx buffer size is underestimated
> -		 * or headroom is not enough because of the buffer
> -		 * was refilled before XDP is set. This should only
> -		 * happen for the first several packets, so we don't
> -		 * care much about its performance.
> +		/* This happens when headroom is not enough because
> +		 * the buffer was prefilled before XDP is set.
> +		 * This should only happen for the first several packets.
> +		 * In fact, vq reset can be used here to help us clean up
> +		 * the prefilled buffers, but many existing devices do not
> +		 * support it, and we don't want to bother users who are
> +		 * using xdp normally.
>   		 */
> -		if (unlikely(num_buf > 1 ||
> -			     headroom < virtnet_get_headroom(vi))) {
> +		if (!xdp_prog->aux->xdp_has_frags &&
> +		    (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
>   			/* linearize data for XDP */
>   			xdp_page = xdp_linearize_page(rq, &num_buf,
>   						      page, offset,
> @@ -1084,17 +1092,26 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   			if (!xdp_page)
>   				goto err_xdp;
>   			offset = VIRTIO_XDP_HEADROOM;
> +		} else if (unlikely(headroom < virtnet_get_headroom(vi))) {


I believe we need to check xdp_prog->aux->xdp_has_frags at least, since
this may not work if it needs more than one frag?

Btw, I don't see a reason why we can't reuse xdp_linearize_page() (we
probably don't need the error when the buffer exceeds PAGE_SIZE).

Other looks good.

Thanks


> +			if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> +				goto err_xdp;
> +
> +			xdp_page = alloc_page(GFP_ATOMIC);
> +			if (!xdp_page)
> +				goto err_xdp;
> +
> +			memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> +			       page_address(page) + offset, len);
> +			frame_sz = PAGE_SIZE;
> +			offset = VIRTIO_XDP_HEADROOM;
>   		} else {
>   			xdp_page = page;
>   		}
> -
> -		/* Allow consuming headroom but reserve enough space to push
> -		 * the descriptor on if we get an XDP_TX return code.
> -		 */
>   		data = page_address(xdp_page) + offset;
> -		xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> -		xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> -				 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> +		err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, len, frame_sz,
> +						 &num_buf, &xdp_frags_truesz, stats);
> +		if (unlikely(err))
> +			goto err_xdp_frags;
>   
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		stats->xdp_packets++;
> @@ -1190,6 +1207,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   				__free_pages(xdp_page, 0);
>   			goto err_xdp;
>   		}
> +err_xdp_frags:
> +		shinfo = xdp_get_shared_info_from_buff(&xdp);
> +
> +		if (unlikely(xdp_page != page))
> +			__free_pages(xdp_page, 0);
> +
> +		for (i = 0; i < shinfo->nr_frags; i++) {
> +			xdp_page = skb_frag_page(&shinfo->frags[i]);
> +			put_page(xdp_page);
> +		}
> +		goto err_xdp;
>   	}
>   	rcu_read_unlock();
>   



* Re: [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp
  2022-12-20 14:14 ` [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp Heng Qi
@ 2022-12-27  7:12   ` Jason Wang
  2022-12-27  8:26     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  7:12 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> This serves as the basis for XDP_TX and XDP_REDIRECT
> to send a multi-buffer xdp_frame.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 27 ++++++++++++++++++++++-----
>   1 file changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 40bc58fa57f5..9f31bfa7f9a6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -563,22 +563,39 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
>   				   struct xdp_frame *xdpf)
>   {
>   	struct virtio_net_hdr_mrg_rxbuf *hdr;
> -	int err;
> +	struct skb_shared_info *shinfo;
> +	u8 nr_frags = 0;
> +	int err, i;
>   
>   	if (unlikely(xdpf->headroom < vi->hdr_len))
>   		return -EOVERFLOW;
>   
> -	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
> +	if (unlikely(xdp_frame_has_frags(xdpf))) {
> +		shinfo = xdp_get_shared_info_from_frame(xdpf);
> +		nr_frags = shinfo->nr_frags;
> +	}
> +
> +	/* Need to adjust this to calculate the correct position
> +	 * for shinfo of the xdpf.
> +	 */
> +	xdpf->headroom -= vi->hdr_len;


Any reason we need to do this here? (And if so, is it only needed for 
multi-buffer XDP?)

Otherwise looks good.

Thanks


>   	xdpf->data -= vi->hdr_len;
>   	/* Zero header and leave csum up to XDP layers */
>   	hdr = xdpf->data;
>   	memset(hdr, 0, vi->hdr_len);
>   	xdpf->len   += vi->hdr_len;
>   
> -	sg_init_one(sq->sg, xdpf->data, xdpf->len);
> +	sg_init_table(sq->sg, nr_frags + 1);
> +	sg_set_buf(sq->sg, xdpf->data, xdpf->len);
> +	for (i = 0; i < nr_frags; i++) {
> +		skb_frag_t *frag = &shinfo->frags[i];
> +
> +		sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
> +			    skb_frag_size(frag), skb_frag_off(frag));
> +	}
>   
> -	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
> -				   GFP_ATOMIC);
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
> +				   xdp_to_ptr(xdpf), GFP_ATOMIC);
>   	if (unlikely(err))
>   		return -ENOSPC; /* Caller handle free/refcnt */
>   


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/9] virtio_net: build skb from multi-buffer xdp
  2022-12-20 14:14 ` [PATCH v2 7/9] virtio_net: build skb from " Heng Qi
@ 2022-12-27  7:31   ` Jason Wang
  2022-12-27  7:51     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  7:31 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> This converts the xdp_buff directly to a skb, including
> multi-buffer and single buffer xdp. We'll isolate the
> construction of skb based on xdp from page_to_skb().
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>   drivers/net/virtio_net.c | 50 ++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 50 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9f31bfa7f9a6..4e12196fcfd4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -948,6 +948,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
>   	return NULL;
>   }
>   
> +/* Why not use xdp_build_skb_from_frame() ?
> + * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
> + * virtio-net there are 2 points that do not match its requirements:
> + *  1. The size of the prefilled buffer is not fixed before xdp is set.
> + *  2. When xdp is loaded, virtio-net has a hole mechanism (refer to
> + *     add_recvbuf_mergeable()), which will make the size of a buffer
> + *     exceed PAGE_SIZE.


Is point 2 still valid after patch 1?

Other than this:

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> + */
> +static struct sk_buff *build_skb_from_xdp_buff(struct net_device *dev,
> +					       struct virtnet_info *vi,
> +					       struct xdp_buff *xdp,
> +					       unsigned int xdp_frags_truesz)
> +{
> +	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> +	unsigned int headroom, data_len;
> +	struct sk_buff *skb;
> +	int metasize;
> +	u8 nr_frags;
> +
> +	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
> +		pr_debug("Error building skb as missing reserved tailroom for xdp");
> +		return NULL;
> +	}
> +
> +	if (unlikely(xdp_buff_has_frags(xdp)))
> +		nr_frags = sinfo->nr_frags;
> +
> +	skb = build_skb(xdp->data_hard_start, xdp->frame_sz);
> +	if (unlikely(!skb))
> +		return NULL;
> +
> +	headroom = xdp->data - xdp->data_hard_start;
> +	data_len = xdp->data_end - xdp->data;
> +	skb_reserve(skb, headroom);
> +	__skb_put(skb, data_len);
> +
> +	metasize = xdp->data - xdp->data_meta;
> +	metasize = metasize > 0 ? metasize : 0;
> +	if (metasize)
> +		skb_metadata_set(skb, metasize);
> +
> +	if (unlikely(xdp_buff_has_frags(xdp)))
> +		xdp_update_skb_shared_info(skb, nr_frags,
> +					   sinfo->xdp_frags_size,
> +					   xdp_frags_truesz,
> +					   xdp_buff_is_frag_pfmemalloc(xdp));
> +
> +	return skb;
> +}
> +
>   /* TODO: build xdp in big mode */
>   static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>   				      struct virtnet_info *vi,


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp
  2022-12-27  6:30   ` Jason Wang
@ 2022-12-27  7:32     ` Heng Qi
  2022-12-28  6:28       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-27  7:32 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 2:30 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> XDP core assumes that the frame_size of xdp_buff and the length of
>> the frag are PAGE_SIZE. The hole may cause the processing of xdp to
>> fail, so we disable the hole mechanism when xdp is set.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9cce7dec7366..443aa7b8f0ad 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct 
>> virtnet_info *vi,
>>           /* To avoid internal fragmentation, if there is very likely 
>> not
>>            * enough space for another buffer, add the remaining space to
>>            * the current buffer.
>> +         * XDP core assumes that frame_size of xdp_buff and the length
>> +         * of the frag are PAGE_SIZE, so we disable the hole mechanism.
>>            */
>> -        len += hole;
>> +        if (!headroom)
>> +            len += hole;
>
>
> Is this only a requirement of multi-buffer XDP? If not, it needs to be 
> backported to stable.

It applies to both single-buffer xdp and multi-buffer xdp, but even 
though single-buffer xdp also has the hole mechanism, it causes no 
problem there (the MTU is limited and GUEST GSO is turned off), so 
there is no need to backport it.

Thanks.

>
> Thanks
>
>
>>           alloc_frag->offset += hole;
>>       }


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 7/9] virtio_net: build skb from multi-buffer xdp
  2022-12-27  7:31   ` Jason Wang
@ 2022-12-27  7:51     ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-27  7:51 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 3:31 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> This converts the xdp_buff directly to a skb, including
>> multi-buffer and single buffer xdp. We'll isolate the
>> construction of skb based on xdp from page_to_skb().
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 50 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 50 insertions(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9f31bfa7f9a6..4e12196fcfd4 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -948,6 +948,56 @@ static struct sk_buff *receive_big(struct 
>> net_device *dev,
>>       return NULL;
>>   }
>>   +/* Why not use xdp_build_skb_from_frame() ?
>> + * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
>> + * virtio-net there are 2 points that do not match its requirements:
>> + *  1. The size of the prefilled buffer is not fixed before xdp is set.
>> + *  2. When xdp is loaded, virtio-net has a hole mechanism (refer to
>> + *     add_recvbuf_mergeable()), which will make the size of a buffer
>> + *     exceed PAGE_SIZE.
>
>
> Is point 2 still valid after patch 1?

Yes, it is no longer valid; I'll correct that. There's also one more 
reason: xdp_build_skb_from_frame() does extra work that we don't need, 
like eth_type_trans() (which virtio-net already does in receive_buf()).

>
> Other than this:
>
> Acked-by: Jason Wang <jasowang@redhat.com>

Thanks for your effort.

>
> Thanks
>
>
>> + */
>> +static struct sk_buff *build_skb_from_xdp_buff(struct net_device *dev,
>> +                           struct virtnet_info *vi,
>> +                           struct xdp_buff *xdp,
>> +                           unsigned int xdp_frags_truesz)
>> +{
>> +    struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
>> +    unsigned int headroom, data_len;
>> +    struct sk_buff *skb;
>> +    int metasize;
>> +    u8 nr_frags;
>> +
>> +    if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
>> +        pr_debug("Error building skb as missing reserved tailroom 
>> for xdp");
>> +        return NULL;
>> +    }
>> +
>> +    if (unlikely(xdp_buff_has_frags(xdp)))
>> +        nr_frags = sinfo->nr_frags;
>> +
>> +    skb = build_skb(xdp->data_hard_start, xdp->frame_sz);
>> +    if (unlikely(!skb))
>> +        return NULL;
>> +
>> +    headroom = xdp->data - xdp->data_hard_start;
>> +    data_len = xdp->data_end - xdp->data;
>> +    skb_reserve(skb, headroom);
>> +    __skb_put(skb, data_len);
>> +
>> +    metasize = xdp->data - xdp->data_meta;
>> +    metasize = metasize > 0 ? metasize : 0;
>> +    if (metasize)
>> +        skb_metadata_set(skb, metasize);
>> +
>> +    if (unlikely(xdp_buff_has_frags(xdp)))
>> +        xdp_update_skb_shared_info(skb, nr_frags,
>> +                       sinfo->xdp_frags_size,
>> +                       xdp_frags_truesz,
>> +                       xdp_buff_is_frag_pfmemalloc(xdp));
>> +
>> +    return skb;
>> +}
>> +
>>   /* TODO: build xdp in big mode */
>>   static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>>                         struct virtnet_info *vi,


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb()
  2022-12-20 14:14 ` [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
@ 2022-12-27  7:55   ` Jason Wang
  2022-12-27  8:27     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  7:55 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/20 22:14, Heng Qi wrote:
> To keep the construction of xdp_buff clear, we remove the xdp processing
> interleaved with page_to_skb(). Now the logic of xdp and building
> skbs from xdp are separate and independent.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   drivers/net/virtio_net.c | 41 +++++++++-------------------------------
>   1 file changed, 9 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4e12196fcfd4..398ffe2a5084 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,9 +439,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
>   static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   				   struct receive_queue *rq,
>   				   struct page *page, unsigned int offset,
> -				   unsigned int len, unsigned int truesize,
> -				   bool hdr_valid, unsigned int metasize,
> -				   unsigned int headroom)
> +				   unsigned int len, unsigned int truesize)
>   {
>   	struct sk_buff *skb;
>   	struct virtio_net_hdr_mrg_rxbuf *hdr;
> @@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   	else
>   		hdr_padded_len = sizeof(struct padded_vnet_hdr);
>   
> -	/* If headroom is not 0, there is an offset between the beginning of the
> -	 * data and the allocated space, otherwise the data and the allocated
> -	 * space are aligned.
> -	 *
> -	 * Buffers with headroom use PAGE_SIZE as alloc size, see
> -	 * add_recvbuf_mergeable() + get_mergeable_buf_len()
> -	 */
> -	truesize = headroom ? PAGE_SIZE : truesize;
> -	tailroom = truesize - headroom;
> -	buf = p - headroom;
> -
> +	buf = p;
>   	len -= hdr_len;
>   	offset += hdr_padded_len;
>   	p += hdr_padded_len;
> -	tailroom -= hdr_padded_len + len;
> +	tailroom = truesize - hdr_padded_len - len;
>   
>   	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>   
> @@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   	if (len <= skb_tailroom(skb))
>   		copy = len;
>   	else
> -		copy = ETH_HLEN + metasize;
> +		copy = ETH_HLEN;
>   	skb_put_data(skb, p, copy);
>   
>   	len -= copy;
> @@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   		give_pages(rq, page);
>   
>   ok:
> -	/* hdr_valid means no XDP, so we can copy the vnet header */
> -	if (hdr_valid) {
> -		hdr = skb_vnet_hdr(skb);
> -		memcpy(hdr, hdr_p, hdr_len);
> -	}
> +	hdr = skb_vnet_hdr(skb);
> +	memcpy(hdr, hdr_p, hdr_len);
>   	if (page_to_free)
>   		put_page(page_to_free);
>   
> -	if (metasize) {
> -		__skb_pull(skb, metasize);
> -		skb_metadata_set(skb, metasize);
> -	}
> -
>   	return skb;
>   }
>   
> @@ -934,7 +914,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
>   {
>   	struct page *page = buf;
>   	struct sk_buff *skb =
> -		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
> +		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
>   
>   	stats->bytes += len - vi->hdr_len;
>   	if (unlikely(!skb))
> @@ -1222,9 +1202,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   				rcu_read_unlock();
>   				put_page(page);
>   				head_skb = page_to_skb(vi, rq, xdp_page, offset,
> -						       len, PAGE_SIZE, false,
> -						       metasize,
> -						       headroom);
> +						       len, PAGE_SIZE);
>   				return head_skb;
>   			}
>   			break;
> @@ -1289,8 +1267,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	rcu_read_unlock();
>   
>   skip_xdp:
> -	head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
> -			       metasize, headroom);
> +	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
>   	curr_skb = head_skb;
>   
>   	if (unlikely(!curr_skb))


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp
  2022-12-27  7:12   ` Jason Wang
@ 2022-12-27  8:26     ` Heng Qi
  2022-12-28  6:30       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-27  8:26 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 3:12 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> This serves as the basis for XDP_TX and XDP_REDIRECT
>> to send a multi-buffer xdp_frame.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 27 ++++++++++++++++++++++-----
>>   1 file changed, 22 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 40bc58fa57f5..9f31bfa7f9a6 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -563,22 +563,39 @@ static int __virtnet_xdp_xmit_one(struct 
>> virtnet_info *vi,
>>                      struct xdp_frame *xdpf)
>>   {
>>       struct virtio_net_hdr_mrg_rxbuf *hdr;
>> -    int err;
>> +    struct skb_shared_info *shinfo;
>> +    u8 nr_frags = 0;
>> +    int err, i;
>>         if (unlikely(xdpf->headroom < vi->hdr_len))
>>           return -EOVERFLOW;
>>   -    /* Make room for virtqueue hdr (also change xdpf->headroom?) */
>> +    if (unlikely(xdp_frame_has_frags(xdpf))) {
>> +        shinfo = xdp_get_shared_info_from_frame(xdpf);
>> +        nr_frags = shinfo->nr_frags;
>> +    }
>> +
>> +    /* Need to adjust this to calculate the correct position
>> +     * for shinfo of the xdpf.
>> +     */
>> +    xdpf->headroom -= vi->hdr_len;
>
>
> Any reason we need to do this here? (And if so, is it only needed 
> for multi-buffer XDP?)

Going back to its caller virtnet_xdp_xmit(), we need to free up the 
pending old buffers. If the "is_xdp_frame(ptr)" condition is met, we 
calculate the position of skb_shared_info in xdp_get_frame_len() and 
xdp_return_frame(), and that calculation involves xdpf->data and 
xdpf->headroom. Therefore, we need to update the value of headroom 
synchronously here.

Also, it's not strictly necessary for single-buffer xdp, but we keep it 
because it's harmless and keeps the accounting consistent.
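
For reference, the XDP core computes the frame's shinfo position from
data, headroom and frame_sz, roughly like this (paraphrasing
include/net/xdp.h; the exact body may differ across kernel versions):

	static inline struct skb_shared_info *
	xdp_get_shared_info_from_frame(struct xdp_frame *frame)
	{
		/* the xdp_frame struct itself sits at data_hard_start */
		void *data_hard_start = frame->data - frame->headroom -
					sizeof(*frame);

		return (struct skb_shared_info *)(data_hard_start +
			frame->frame_sz -
			SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
	}

So once xdpf->data is moved back by vi->hdr_len, xdpf->headroom must
shrink by the same amount, otherwise data_hard_start (and hence the
computed shinfo position) would be off by vi->hdr_len.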

Thanks.

>
> Otherwise looks good.
>
> Thanks
>
>
>>       xdpf->data -= vi->hdr_len;
>>       /* Zero header and leave csum up to XDP layers */
>>       hdr = xdpf->data;
>>       memset(hdr, 0, vi->hdr_len);
>>       xdpf->len   += vi->hdr_len;
>>   -    sg_init_one(sq->sg, xdpf->data, xdpf->len);
>> +    sg_init_table(sq->sg, nr_frags + 1);
>> +    sg_set_buf(sq->sg, xdpf->data, xdpf->len);
>> +    for (i = 0; i < nr_frags; i++) {
>> +        skb_frag_t *frag = &shinfo->frags[i];
>> +
>> +        sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
>> +                skb_frag_size(frag), skb_frag_off(frag));
>> +    }
>>   -    err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
>> -                   GFP_ATOMIC);
>> +    err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
>> +                   xdp_to_ptr(xdpf), GFP_ATOMIC);
>>       if (unlikely(err))
>>           return -ENOSPC; /* Caller handle free/refcnt */


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb()
  2022-12-27  7:55   ` Jason Wang
@ 2022-12-27  8:27     ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-27  8:27 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 3:55 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> To keep the construction of xdp_buff clear, we remove the xdp processing
>> interleaved with page_to_skb(). Now the logic of xdp and building
>> skbs from xdp are separate and independent.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>
>
> Acked-by: Jason Wang <jasowang@redhat.com>

Thanks for your effort.

>
> Thanks
>
>
>> ---
>>   drivers/net/virtio_net.c | 41 +++++++++-------------------------------
>>   1 file changed, 9 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 4e12196fcfd4..398ffe2a5084 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -439,9 +439,7 @@ static unsigned int 
>> mergeable_ctx_to_truesize(void *mrg_ctx)
>>   static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>                      struct receive_queue *rq,
>>                      struct page *page, unsigned int offset,
>> -                   unsigned int len, unsigned int truesize,
>> -                   bool hdr_valid, unsigned int metasize,
>> -                   unsigned int headroom)
>> +                   unsigned int len, unsigned int truesize)
>>   {
>>       struct sk_buff *skb;
>>       struct virtio_net_hdr_mrg_rxbuf *hdr;
>> @@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct 
>> virtnet_info *vi,
>>       else
>>           hdr_padded_len = sizeof(struct padded_vnet_hdr);
>>   -    /* If headroom is not 0, there is an offset between the 
>> beginning of the
>> -     * data and the allocated space, otherwise the data and the 
>> allocated
>> -     * space are aligned.
>> -     *
>> -     * Buffers with headroom use PAGE_SIZE as alloc size, see
>> -     * add_recvbuf_mergeable() + get_mergeable_buf_len()
>> -     */
>> -    truesize = headroom ? PAGE_SIZE : truesize;
>> -    tailroom = truesize - headroom;
>> -    buf = p - headroom;
>> -
>> +    buf = p;
>>       len -= hdr_len;
>>       offset += hdr_padded_len;
>>       p += hdr_padded_len;
>> -    tailroom -= hdr_padded_len + len;
>> +    tailroom = truesize - hdr_padded_len - len;
>>         shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>   @@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct 
>> virtnet_info *vi,
>>       if (len <= skb_tailroom(skb))
>>           copy = len;
>>       else
>> -        copy = ETH_HLEN + metasize;
>> +        copy = ETH_HLEN;
>>       skb_put_data(skb, p, copy);
>>         len -= copy;
>> @@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct 
>> virtnet_info *vi,
>>           give_pages(rq, page);
>>     ok:
>> -    /* hdr_valid means no XDP, so we can copy the vnet header */
>> -    if (hdr_valid) {
>> -        hdr = skb_vnet_hdr(skb);
>> -        memcpy(hdr, hdr_p, hdr_len);
>> -    }
>> +    hdr = skb_vnet_hdr(skb);
>> +    memcpy(hdr, hdr_p, hdr_len);
>>       if (page_to_free)
>>           put_page(page_to_free);
>>   -    if (metasize) {
>> -        __skb_pull(skb, metasize);
>> -        skb_metadata_set(skb, metasize);
>> -    }
>> -
>>       return skb;
>>   }
>>   @@ -934,7 +914,7 @@ static struct sk_buff *receive_big(struct 
>> net_device *dev,
>>   {
>>       struct page *page = buf;
>>       struct sk_buff *skb =
>> -        page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
>> +        page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
>>         stats->bytes += len - vi->hdr_len;
>>       if (unlikely(!skb))
>> @@ -1222,9 +1202,7 @@ static struct sk_buff *receive_mergeable(struct 
>> net_device *dev,
>>                   rcu_read_unlock();
>>                   put_page(page);
>>                   head_skb = page_to_skb(vi, rq, xdp_page, offset,
>> -                               len, PAGE_SIZE, false,
>> -                               metasize,
>> -                               headroom);
>> +                               len, PAGE_SIZE);
>>                   return head_skb;
>>               }
>>               break;
>> @@ -1289,8 +1267,7 @@ static struct sk_buff *receive_mergeable(struct 
>> net_device *dev,
>>       rcu_read_unlock();
>>     skip_xdp:
>> -    head_skb = page_to_skb(vi, rq, page, offset, len, truesize, 
>> !xdp_prog,
>> -                   metasize, headroom);
>> +    head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
>>       curr_skb = head_skb;
>>         if (unlikely(!curr_skb))


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 9/9] virtio_net: support multi-buffer xdp
  2022-12-20 14:14 ` [PATCH v2 9/9] virtio_net: support multi-buffer xdp Heng Qi
@ 2022-12-27  9:03   ` Jason Wang
  2022-12-27  9:11     ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-27  9:03 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

On Tue, Dec 20, 2022 at 10:15 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> The driver can pass the skb to the stack via build_skb_from_xdp_buff().
>
> The driver forwards multi-buffer packets using the send queue
> for XDP_TX and XDP_REDIRECT, and releases the page references
> for XDP_DROP.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/net/virtio_net.c | 65 ++++++----------------------------------
>  1 file changed, 9 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 398ffe2a5084..daa380b9d1cc 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1074,7 +1074,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         struct bpf_prog *xdp_prog;
>         unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>         unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -       unsigned int metasize = 0;
>         unsigned int frame_sz;
>         int err;
>
> @@ -1165,63 +1164,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>
>                 switch (act) {
>                 case XDP_PASS:
> -                       metasize = xdp.data - xdp.data_meta;
> -
> -                       /* recalculate offset to account for any header
> -                        * adjustments and minus the metasize to copy the
> -                        * metadata in page_to_skb(). Note other cases do not
> -                        * build an skb and avoid using offset
> -                        */
> -                       offset = xdp.data - page_address(xdp_page) -
> -                                vi->hdr_len - metasize;
> -
> -                       /* recalculate len if xdp.data, xdp.data_end or
> -                        * xdp.data_meta were adjusted
> -                        */
> -                       len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
> -
> -                       /* recalculate headroom if xdp.data or xdp_data_meta
> -                        * were adjusted, note that offset should always point
> -                        * to the start of the reserved bytes for virtio_net
> -                        * header which are followed by xdp.data, that means
> -                        * that offset is equal to the headroom (when buf is
> -                        * starting at the beginning of the page, otherwise
> -                        * there is a base offset inside the page) but it's used
> -                        * with a different starting point (buf start) than
> -                        * xdp.data (buf start + vnet hdr size). If xdp.data or
> -                        * data_meta were adjusted by the xdp prog then the
> -                        * headroom size has changed and so has the offset, we
> -                        * can use data_hard_start, which points at buf start +
> -                        * vnet hdr size, to calculate the new headroom and use
> -                        * it later to compute buf start in page_to_skb()
> -                        */
> -                       headroom = xdp.data - xdp.data_hard_start - metasize;
> -
> -                       /* We can only create skb based on xdp_page. */
> -                       if (unlikely(xdp_page != page)) {
> -                               rcu_read_unlock();
> -                               put_page(page);
> -                               head_skb = page_to_skb(vi, rq, xdp_page, offset,
> -                                                      len, PAGE_SIZE);
> -                               return head_skb;
> -                       }
> -                       break;
> +                       head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
> +                       rcu_read_unlock();
> +                       return head_skb;
>                 case XDP_TX:
>                         stats->xdp_tx++;
>                         xdpf = xdp_convert_buff_to_frame(&xdp);
>                         if (unlikely(!xdpf)) {
> -                               if (unlikely(xdp_page != page))
> -                                       put_page(xdp_page);
> -                               goto err_xdp;
> +                               netdev_dbg(dev, "convert buff to frame failed for xdp\n");
> +                               goto err_xdp_frags;
>                         }
>                         err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>                         if (unlikely(!err)) {
>                                 xdp_return_frame_rx_napi(xdpf);
>                         } else if (unlikely(err < 0)) {
>                                 trace_xdp_exception(vi->dev, xdp_prog, act);
> -                               if (unlikely(xdp_page != page))
> -                                       put_page(xdp_page);
> -                               goto err_xdp;
> +                               goto err_xdp_frags;
>                         }
>                         *xdp_xmit |= VIRTIO_XDP_TX;
>                         if (unlikely(xdp_page != page))
> @@ -1231,11 +1189,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                 case XDP_REDIRECT:
>                         stats->xdp_redirects++;
>                         err = xdp_do_redirect(dev, &xdp, xdp_prog);
> -                       if (err) {
> -                               if (unlikely(xdp_page != page))
> -                                       put_page(xdp_page);
> -                               goto err_xdp;
> -                       }
> +                       if (err)
> +                               goto err_xdp_frags;
>                         *xdp_xmit |= VIRTIO_XDP_REDIR;
>                         if (unlikely(xdp_page != page))
>                                 put_page(page);
> @@ -1248,9 +1203,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                         trace_xdp_exception(vi->dev, xdp_prog, act);
>                         fallthrough;
>                 case XDP_DROP:
> -                       if (unlikely(xdp_page != page))
> -                               __free_pages(xdp_page, 0);
> -                       goto err_xdp;
> +                       goto err_xdp_frags;
>                 }
>  err_xdp_frags:
>                 shinfo = xdp_get_shared_info_from_buff(&xdp);
> --
> 2.19.1.6.gb485710b
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers
  2022-12-27  6:46   ` Jason Wang
@ 2022-12-27  9:10     ` Heng Qi
  2022-12-28  6:27       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-27  9:10 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 2:46 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> Support xdp for multi buffer packets in mergeable mode.
>>
>> The first buffer is used as the linear part of the xdp_buff, and
>> the rest of the buffers are added as non-linear fragments to the
>> struct skb_shared_info in the tailroom belonging to the xdp_buff.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 78 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 78 insertions(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 08f209d7b0bf..8fc3b1841d92 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct 
>> net_device *dev,
>>       return NULL;
>>   }
>>   +/* TODO: build xdp in big mode */
>> +static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>> +                      struct virtnet_info *vi,
>> +                      struct receive_queue *rq,
>> +                      struct xdp_buff *xdp,
>> +                      void *buf,
>> +                      unsigned int len,
>> +                      unsigned int frame_sz,
>> +                      u16 *num_buf,
>> +                      unsigned int *xdp_frags_truesize,
>> +                      struct virtnet_rq_stats *stats)
>> +{
>> +    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct 
>> skb_shared_info));
>> +    struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>> +    unsigned int truesize, headroom;
>> +    struct skb_shared_info *shinfo;
>> +    unsigned int xdp_frags_truesz = 0;
>> +    unsigned int cur_frag_size;
>> +    struct page *page;
>> +    skb_frag_t *frag;
>> +    int offset;
>> +    void *ctx;
>> +
>> +    xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
>> +    xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
>> +             VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, 
>> true);
>> +
>> +    if (*num_buf > 1) {
>> +        shinfo = xdp_get_shared_info_from_buff(xdp);
>> +        shinfo->nr_frags = 0;
>> +        shinfo->xdp_frags_size = 0;
>> +    }
>> +
>> +    if ((*num_buf - 1) > MAX_SKB_FRAGS)
>> +        return -EINVAL;
>> +
>> +    while ((--*num_buf) >= 1) {
>> +        buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
>> +        if (unlikely(!buf)) {
>> +            pr_debug("%s: rx error: %d buffers out of %d missing\n",
>> +                 dev->name, *num_buf,
>> +                 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
>> +            dev->stats.rx_length_errors++;
>> +            return -EINVAL;
>> +        }
>> +
>> +        if (!xdp_buff_has_frags(xdp))
>> +            xdp_buff_set_frags_flag(xdp);
>
>
> Any reason to put this inside the loop?

I'll move it outside the loop in the next version.
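
E.g. something like this (a hypothetical reshuffle for illustration,
not the actual v3 patch):

	if (*num_buf > 1) {
		/* initialize shinfo once, before any frag is added */
		shinfo = xdp_get_shared_info_from_buff(xdp);
		shinfo->nr_frags = 0;
		shinfo->xdp_frags_size = 0;
		xdp_buff_set_frags_flag(xdp);
	}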

>
>
>> +
>> +        stats->bytes += len;
>> +        page = virt_to_head_page(buf);
>> +        offset = buf - page_address(page);
>> +        truesize = mergeable_ctx_to_truesize(ctx);
>> +        headroom = mergeable_ctx_to_headroom(ctx);
>> +
>> +        cur_frag_size = truesize + (headroom ? (headroom + tailroom) 
>> : 0);
>> +        xdp_frags_truesz += cur_frag_size;
>
>
> Not related to this patch, but it would easily confuse future 
> readers that we need a different calculation for truesize here. I 
> think at least we need some comments for this, or

Yes, it might; I'll add more comments on this.

>
> I guess the root cause is in get_mergeable_buf_len:
>
> static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
>                                       struct ewma_pkt_len *avg_pkt_len,
>                                           unsigned int room)
> {
>         struct virtnet_info *vi = rq->vq->vdev->priv;
>         const size_t hdr_len = vi->hdr_len;
>         unsigned int len;
>
>         if (room)
>         return PAGE_SIZE - room;
>
> And we do
>
>     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>
>     ...
>
>     ctx = mergeable_len_to_ctx(len, headroom);
>
>
> I wonder if it's better to pack the real truesize (PAGE_SIZE) here. 
> This may ease a lot of things.

I don't know the historical reason for not packing it, but I guess it's 
for the convenience of comparing the actual length len given by the 
device against truesize. Therefore, I think it would be better to keep 
the current practice for now. Perhaps I can add more explanation to the 
above code so future readers don't get confused.
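
For context, the ctx packing being discussed looks roughly like this in
the driver (quoted from memory, so treat it as a sketch):

	/* ctx packs headroom and truesize into one pointer-sized word */
	static void *mergeable_len_to_ctx(unsigned int truesize,
					  unsigned int headroom)
	{
		return (void *)(unsigned long)((headroom << MRG_CTX_HEADER_SHIFT) |
					       truesize);
	}

	static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
	{
		return (unsigned long)mrg_ctx & ((1 << MRG_CTX_HEADER_SHIFT) - 1);
	}

Since the truesize packed at fill time is the buffer length minus the
reserved room, code that wants the real per-buffer footprint has to add
(headroom + tailroom) back, which is exactly the extra math above.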

Thanks.

>
> Thanks
>
>
>> +        if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
>> +            pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>> +                 dev->name, len, (unsigned long)ctx);
>> +            dev->stats.rx_length_errors++;
>> +            return -EINVAL;
>> +        }
>> +
>> +        frag = &shinfo->frags[shinfo->nr_frags++];
>> +        __skb_frag_set_page(frag, page);
>> +        skb_frag_off_set(frag, offset);
>> +        skb_frag_size_set(frag, len);
>> +        if (page_is_pfmemalloc(page))
>> +            xdp_buff_set_frag_pfmemalloc(xdp);
>> +
>> +        shinfo->xdp_frags_size += len;
>> +    }
>> +
>> +    *xdp_frags_truesize = xdp_frags_truesz;
>> +    return 0;
>> +}
>> +
>>   static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                        struct virtnet_info *vi,
>>                        struct receive_queue *rq,


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 9/9] virtio_net: support multi-buffer xdp
  2022-12-27  9:03   ` Jason Wang
@ 2022-12-27  9:11     ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-27  9:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo



On 2022/12/27 5:03 PM, Jason Wang wrote:
> On Tue, Dec 20, 2022 at 10:15 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> The driver can pass the skb to the stack via build_skb_from_xdp_buff().
>>
>> The driver forwards multi-buffer packets using the send queue
>> for XDP_TX and XDP_REDIRECT, and releases the page references
>> for XDP_DROP.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Thanks for your effort.

>
> Thanks
>
>> ---
>>   drivers/net/virtio_net.c | 65 ++++++----------------------------------
>>   1 file changed, 9 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 398ffe2a5084..daa380b9d1cc 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1074,7 +1074,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>          struct bpf_prog *xdp_prog;
>>          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>> -       unsigned int metasize = 0;
>>          unsigned int frame_sz;
>>          int err;
>>
>> @@ -1165,63 +1164,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>
>>                  switch (act) {
>>                  case XDP_PASS:
>> -                       metasize = xdp.data - xdp.data_meta;
>> -
>> -                       /* recalculate offset to account for any header
>> -                        * adjustments and minus the metasize to copy the
>> -                        * metadata in page_to_skb(). Note other cases do not
>> -                        * build an skb and avoid using offset
>> -                        */
>> -                       offset = xdp.data - page_address(xdp_page) -
>> -                                vi->hdr_len - metasize;
>> -
>> -                       /* recalculate len if xdp.data, xdp.data_end or
>> -                        * xdp.data_meta were adjusted
>> -                        */
>> -                       len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
>> -
>> -                       /* recalculate headroom if xdp.data or xdp_data_meta
>> -                        * were adjusted, note that offset should always point
>> -                        * to the start of the reserved bytes for virtio_net
>> -                        * header which are followed by xdp.data, that means
>> -                        * that offset is equal to the headroom (when buf is
>> -                        * starting at the beginning of the page, otherwise
>> -                        * there is a base offset inside the page) but it's used
>> -                        * with a different starting point (buf start) than
>> -                        * xdp.data (buf start + vnet hdr size). If xdp.data or
>> -                        * data_meta were adjusted by the xdp prog then the
>> -                        * headroom size has changed and so has the offset, we
>> -                        * can use data_hard_start, which points at buf start +
>> -                        * vnet hdr size, to calculate the new headroom and use
>> -                        * it later to compute buf start in page_to_skb()
>> -                        */
>> -                       headroom = xdp.data - xdp.data_hard_start - metasize;
>> -
>> -                       /* We can only create skb based on xdp_page. */
>> -                       if (unlikely(xdp_page != page)) {
>> -                               rcu_read_unlock();
>> -                               put_page(page);
>> -                               head_skb = page_to_skb(vi, rq, xdp_page, offset,
>> -                                                      len, PAGE_SIZE);
>> -                               return head_skb;
>> -                       }
>> -                       break;
>> +                       head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
>> +                       rcu_read_unlock();
>> +                       return head_skb;
>>                  case XDP_TX:
>>                          stats->xdp_tx++;
>>                          xdpf = xdp_convert_buff_to_frame(&xdp);
>>                          if (unlikely(!xdpf)) {
>> -                               if (unlikely(xdp_page != page))
>> -                                       put_page(xdp_page);
>> -                               goto err_xdp;
>> +                               netdev_dbg(dev, "convert buff to frame failed for xdp\n");
>> +                               goto err_xdp_frags;
>>                          }
>>                          err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>>                          if (unlikely(!err)) {
>>                                  xdp_return_frame_rx_napi(xdpf);
>>                          } else if (unlikely(err < 0)) {
>>                                  trace_xdp_exception(vi->dev, xdp_prog, act);
>> -                               if (unlikely(xdp_page != page))
>> -                                       put_page(xdp_page);
>> -                               goto err_xdp;
>> +                               goto err_xdp_frags;
>>                          }
>>                          *xdp_xmit |= VIRTIO_XDP_TX;
>>                          if (unlikely(xdp_page != page))
>> @@ -1231,11 +1189,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                  case XDP_REDIRECT:
>>                          stats->xdp_redirects++;
>>                          err = xdp_do_redirect(dev, &xdp, xdp_prog);
>> -                       if (err) {
>> -                               if (unlikely(xdp_page != page))
>> -                                       put_page(xdp_page);
>> -                               goto err_xdp;
>> -                       }
>> +                       if (err)
>> +                               goto err_xdp_frags;
>>                          *xdp_xmit |= VIRTIO_XDP_REDIR;
>>                          if (unlikely(xdp_page != page))
>>                                  put_page(page);
>> @@ -1248,9 +1203,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                          trace_xdp_exception(vi->dev, xdp_prog, act);
>>                          fallthrough;
>>                  case XDP_DROP:
>> -                       if (unlikely(xdp_page != page))
>> -                               __free_pages(xdp_page, 0);
>> -                       goto err_xdp;
>> +                       goto err_xdp_frags;
>>                  }
>>   err_xdp_frags:
>>                  shinfo = xdp_get_shared_info_from_buff(&xdp);
>> --
>> 2.19.1.6.gb485710b
>>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-27  7:01   ` Jason Wang
@ 2022-12-27  9:31     ` Heng Qi
  2022-12-28  6:24       ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-27  9:31 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 3:01 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().
>>
>> For the prefilled buffer before xdp is set, we will probably use
>> vq reset in the future. At the same time, virtio net currently
>> uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
>> the tailroom of the last frag, which will involve the offset of the
>> corresponding page and cause a negative value, so we disable tail
>> increase by not setting xdp_rxq->frag_size.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 60 +++++++++++++++++++++++++++++-----------
>>   1 file changed, 44 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8fc3b1841d92..40bc58fa57f5 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1018,6 +1018,7 @@ static struct sk_buff *receive_mergeable(struct 
>> net_device *dev,
>>                        unsigned int *xdp_xmit,
>>                        struct virtnet_rq_stats *stats)
>>   {
>> +    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct 
>> skb_shared_info));
>>       struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>       u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>       struct page *page = virt_to_head_page(buf);
>> @@ -1048,11 +1049,14 @@ static struct sk_buff 
>> *receive_mergeable(struct net_device *dev,
>>       rcu_read_lock();
>>       xdp_prog = rcu_dereference(rq->xdp_prog);
>>       if (xdp_prog) {
>> +        unsigned int xdp_frags_truesz = 0;
>> +        struct skb_shared_info *shinfo;
>>           struct xdp_frame *xdpf;
>>           struct page *xdp_page;
>>           struct xdp_buff xdp;
>>           void *data;
>>           u32 act;
>> +        int i;
>>             /* Transient failure which in theory could occur if
>>            * in-flight packets from before XDP was enabled reach
>> @@ -1061,19 +1065,23 @@ static struct sk_buff 
>> *receive_mergeable(struct net_device *dev,
>>           if (unlikely(hdr->hdr.gso_type))
>>               goto err_xdp;
>>   -        /* Buffers with headroom use PAGE_SIZE as alloc size,
>> -         * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>> +        /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
>> +         * with headroom may add a hole in truesize, which
>> +         * makes their length exceed PAGE_SIZE. So we disabled the
>> +         * hole mechanism for xdp. See add_recvbuf_mergeable().
>>            */
>>           frame_sz = headroom ? PAGE_SIZE : truesize;
>>   -        /* This happens when rx buffer size is underestimated
>> -         * or headroom is not enough because of the buffer
>> -         * was refilled before XDP is set. This should only
>> -         * happen for the first several packets, so we don't
>> -         * care much about its performance.
>> +        /* This happens when headroom is not enough because the
>> +         * buffer was prefilled before XDP is set.
>> +         * This should only happen for the first several packets.
>> +         * In fact, vq reset can be used here to help us clean up
>> +         * the prefilled buffers, but many existing devices do not
>> +         * support it, and we don't want to bother users who are
>> +         * using xdp normally.
>>            */
>> -        if (unlikely(num_buf > 1 ||
>> -                 headroom < virtnet_get_headroom(vi))) {
>> +        if (!xdp_prog->aux->xdp_has_frags &&
>> +            (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
>>               /* linearize data for XDP */
>>               xdp_page = xdp_linearize_page(rq, &num_buf,
>>                                 page, offset,
>> @@ -1084,17 +1092,26 @@ static struct sk_buff 
>> *receive_mergeable(struct net_device *dev,
>>               if (!xdp_page)
>>                   goto err_xdp;
>>               offset = VIRTIO_XDP_HEADROOM;
>> +        } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
>
>
> I believe we need to check xdp_prog->aux->xdp_has_frags at least, since 
> this may not work if it needs more than one frag?

Sorry Jason, I'm not sure I follow, so let me try to answer. For 
multi-buffer xdp programs, if the first buffer is a prefilled buffer 
(no headroom), we need to copy it out and use the subsequent buffers of 
this packet as its frags (this is done in virtnet_build_xdp_buff_mrg()). 
Therefore, it seems there is no need to check 
'xdp_prog->aux->xdp_has_frags' to mark multi-buffer xdp (of course I 
can add it) in:

+ } else if (unlikely(headroom < virtnet_get_headroom(vi))) {

Because all linearization for single-buffer xdp has already been done 
before this point, this branch can only be reached by multi-buffer xdp:

+ if (!xdp_prog->aux->xdp_has_frags &&
+ (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
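
In other words, the resulting control flow is roughly (a condensed
sketch of the patch below, not its exact text):

	if (!xdp_prog->aux->xdp_has_frags &&
	    (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
		/* single-buffer XDP: linearize the whole packet into
		 * one page via xdp_linearize_page()
		 */
	} else if (unlikely(headroom < virtnet_get_headroom(vi))) {
		/* only reachable with xdp_has_frags set: copy out just
		 * the prefilled head buffer; the remaining buffers
		 * become frags in virtnet_build_xdp_buff_mrg()
		 */
	} else {
		xdp_page = page;	/* headroom already reserved */
	}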

>
> Btw, I don't see a reason why we can't reuse xdp_linearize_page() (we 
> probably don't need to error if the buffer exceeds PAGE_SIZE).

For multi-buffer xdp, we only need to copy out the pre-filled first 
buffer, and use the remaining buffers of this packet as frags in 
virtnet_build_xdp_buff_mrg().

Thanks.

>
> Otherwise looks good.
>
> Thanks
>
>
>> +            if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
>> +                goto err_xdp;
>> +
>> +            xdp_page = alloc_page(GFP_ATOMIC);
>> +            if (!xdp_page)
>> +                goto err_xdp;
>> +
>> +            memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
>> +                   page_address(page) + offset, len);
>> +            frame_sz = PAGE_SIZE;
>> +            offset = VIRTIO_XDP_HEADROOM;
>>           } else {
>>               xdp_page = page;
>>           }
>> -
>> -        /* Allow consuming headroom but reserve enough space to push
>> -         * the descriptor on if we get an XDP_TX return code.
>> -         */
>>           data = page_address(xdp_page) + offset;
>> -        xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
>> -        xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + 
>> vi->hdr_len,
>> -                 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
>> +        err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, 
>> len, frame_sz,
>> +                         &num_buf, &xdp_frags_truesz, stats);
>> +        if (unlikely(err))
>> +            goto err_xdp_frags;
>>             act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>           stats->xdp_packets++;
>> @@ -1190,6 +1207,17 @@ static struct sk_buff 
>> *receive_mergeable(struct net_device *dev,
>>                   __free_pages(xdp_page, 0);
>>               goto err_xdp;
>>           }
>> +err_xdp_frags:
>> +        shinfo = xdp_get_shared_info_from_buff(&xdp);
>> +
>> +        if (unlikely(xdp_page != page))
>> +            __free_pages(xdp_page, 0);
>> +
>> +        for (i = 0; i < shinfo->nr_frags; i++) {
>> +            xdp_page = skb_frag_page(&shinfo->frags[i]);
>> +            put_page(xdp_page);
>> +        }
>> +        goto err_xdp;
>>       }
>>       rcu_read_unlock();


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets
  2022-12-27  6:32   ` Jason Wang
@ 2022-12-27 12:20     ` Heng Qi
  2022-12-28  3:50       ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-27 12:20 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 2:32 PM, Jason Wang wrote:
>
> On 2022/12/20 22:14, Heng Qi wrote:
>> When the xdp program sets xdp.frags, it means it can process
>> multi-buffer packets over a larger MTU, so we continue to support xdp.
>> But for single-buffer xdp, we should keep checking the MTU.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 443aa7b8f0ad..c5c4e9db4ed3 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -3095,8 +3095,8 @@ static int virtnet_xdp_set(struct net_device 
>> *dev, struct bpf_prog *prog,
>>           return -EINVAL;
>>       }
>>   -    if (dev->mtu > max_sz) {
>> -        NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
>> +    if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
>
>
> Not related to this patch, but I see:
>
>         unsigned long int max_sz = PAGE_SIZE - sizeof(struct 
> padded_vnet_hdr);
>
> Which is suspicious; do we need to count reserved headroom/tailroom as 
> well?

This does seem suspicious. After loading xdp, the size of each filled 
avail buffer is (PAGE_SIZE - headroom - tailroom), so the size of a 
received used buffer, i.e. the MTU bound, should also be 
(PAGE_SIZE - headroom - tailroom).

Thanks.

>
> Thanks
>
>
>> +        NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP 
>> without frags");
>>           netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>>           return -EINVAL;
>>       }


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets
  2022-12-27 12:20     ` Heng Qi
@ 2022-12-28  3:50       ` Heng Qi
  2022-12-28  6:27         ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-28  3:50 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/27 8:20 PM, Heng Qi wrote:
>
>
> On 2022/12/27 2:32 PM, Jason Wang wrote:
>>
>> On 2022/12/20 22:14, Heng Qi wrote:
>>> When the xdp program sets xdp.frags, it means it can process
>>> multi-buffer packets over a larger MTU, so we continue to support xdp.
>>> But for single-buffer xdp, we should keep checking the MTU.
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>   drivers/net/virtio_net.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 443aa7b8f0ad..c5c4e9db4ed3 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -3095,8 +3095,8 @@ static int virtnet_xdp_set(struct net_device 
>>> *dev, struct bpf_prog *prog,
>>>           return -EINVAL;
>>>       }
>>>   -    if (dev->mtu > max_sz) {
>>> -        NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
>>> +    if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
>>
>>
>> Not related to this patch, but I see:
>>
>>         unsigned long int max_sz = PAGE_SIZE - sizeof(struct 
>> padded_vnet_hdr);
>>
>> Which is suspicious, do we need to count reserved headroom/tailroom 
>> as well?
>
> This does seem suspicious. After loading xdp, the size of each filled 
> avail buffer is (PAGE_SIZE - headroom - tailroom), so the size of a 
> received used buffer, i.e. the MTU bound, should also be 
> (PAGE_SIZE - headroom - tailroom).

Hi Jason, this is indeed a problem. After verification, packet drops do 
indeed occur. To avoid this, the MTU should be at most (PAGE_SIZE - 
headroom - tailroom - ethhdr = 4096 - 256 - 320 - 14 = 3506). When xdp 
is loaded, each buffer is filled with 3520 bytes (PAGE_SIZE - room). If 
(MTU + 14) is greater than 3520 (the MTU does not include the ethernet 
header), a packet longer than 3520 bytes can arrive; num_buf will then 
still be greater than or equal to 2, xdp_linearize_page() will run, and 
the packet will be dropped because its total length exceeds PAGE_SIZE.
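
Spelled out (assuming the usual x86-64 numbers, where
VIRTIO_XDP_HEADROOM is 256 and
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) is 320):

	PAGE_SIZE                            4096
	- headroom (VIRTIO_XDP_HEADROOM)    -  256
	- tailroom (shinfo)                 -  320
	= per-buffer fill size with xdp      3520
	- ETH_HLEN                          -   14
	= maximum safe MTU                   3506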

I will make a separate bugfix patch to fix this later.

Thanks.

>
> Thanks.
>
>>
>> Thanks
>>
>>
>>> +        NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP 
>>> without frags");
>>>           netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>>>           return -EINVAL;
>>>       }


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-27  9:31     ` Heng Qi
@ 2022-12-28  6:24       ` Jason Wang
  2022-12-28  8:23         ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-28  6:24 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/27 17:31, Heng Qi wrote:
>
>
> On 2022/12/27 3:01 PM, Jason Wang wrote:
>>
>> On 2022/12/20 22:14, Heng Qi wrote:
>>> Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().
>>>
>>> For the prefilled buffer before xdp is set, we will probably use
>>> vq reset in the future. At the same time, virtio net currently
>>> uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
>>> the tailroom of the last frag, which will involve the offset of the
>>> corresponding page and cause a negative value, so we disable tail
>>> increase by not setting xdp_rxq->frag_size.
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>   drivers/net/virtio_net.c | 60 
>>> +++++++++++++++++++++++++++++-----------
>>>   1 file changed, 44 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 8fc3b1841d92..40bc58fa57f5 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1018,6 +1018,7 @@ static struct sk_buff 
>>> *receive_mergeable(struct net_device *dev,
>>>                        unsigned int *xdp_xmit,
>>>                        struct virtnet_rq_stats *stats)
>>>   {
>>> +    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct 
>>> skb_shared_info));
>>>       struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>>       u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>>       struct page *page = virt_to_head_page(buf);
>>> @@ -1048,11 +1049,14 @@ static struct sk_buff 
>>> *receive_mergeable(struct net_device *dev,
>>>       rcu_read_lock();
>>>       xdp_prog = rcu_dereference(rq->xdp_prog);
>>>       if (xdp_prog) {
>>> +        unsigned int xdp_frags_truesz = 0;
>>> +        struct skb_shared_info *shinfo;
>>>           struct xdp_frame *xdpf;
>>>           struct page *xdp_page;
>>>           struct xdp_buff xdp;
>>>           void *data;
>>>           u32 act;
>>> +        int i;
>>>             /* Transient failure which in theory could occur if
>>>            * in-flight packets from before XDP was enabled reach
>>> @@ -1061,19 +1065,23 @@ static struct sk_buff 
>>> *receive_mergeable(struct net_device *dev,
>>>           if (unlikely(hdr->hdr.gso_type))
>>>               goto err_xdp;
>>>   -        /* Buffers with headroom use PAGE_SIZE as alloc size,
>>> -         * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>> +        /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
>>> +         * with headroom may add hole in truesize, which
>>> +         * make their length exceed PAGE_SIZE. So we disabled the
>>> +         * hole mechanism for xdp. See add_recvbuf_mergeable().
>>>            */
>>>           frame_sz = headroom ? PAGE_SIZE : truesize;
>>>   -        /* This happens when rx buffer size is underestimated
>>> -         * or headroom is not enough because of the buffer
>>> -         * was refilled before XDP is set. This should only
>>> -         * happen for the first several packets, so we don't
>>> -         * care much about its performance.
>>> +        /* This happens when headroom is not enough because
>>> +         * of the buffer was prefilled before XDP is set.
>>> +         * This should only happen for the first several packets.
>>> +         * In fact, vq reset can be used here to help us clean up
>>> +         * the prefilled buffers, but many existing devices do not
>>> +         * support it, and we don't want to bother users who are
>>> +         * using xdp normally.
>>>            */
>>> -        if (unlikely(num_buf > 1 ||
>>> -                 headroom < virtnet_get_headroom(vi))) {
>>> +        if (!xdp_prog->aux->xdp_has_frags &&
>>> +            (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
>>>               /* linearize data for XDP */
>>>               xdp_page = xdp_linearize_page(rq, &num_buf,
>>>                                 page, offset,
>>> @@ -1084,17 +1092,26 @@ static struct sk_buff 
>>> *receive_mergeable(struct net_device *dev,
>>>               if (!xdp_page)
>>>                   goto err_xdp;
>>>               offset = VIRTIO_XDP_HEADROOM;
>>> +        } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
>>
>>
>> I believe we need to check xdp_prog->aux->xdp_has_frags at least 
>> since this may not work if it needs more than one frags?
>
> Sorry Jason, I didn't understand you, I'll try to answer. For 
> multi-buffer xdp programs, if the first buffer is a pre-filled buffer 
> (no headroom),
> we need to copy it out and use the subsequent buffers of this packet 
> as its frags (this is done in virtnet_build_xdp_buff_mrg()), therefore,
> it seems that there is no need to check 'xdp_prog->aux->xdp_has_frags' 
> to mark multi-buffer xdp (of course I can add it),
>
> + } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
>
> Because the linearization of single-buffer xdp has all been done 
> before, the subsequent situation can only be applied to multi-buffer xdp:
> + if (!xdp_prog->aux->xdp_has_frags &&
> + (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {


I basically meant what happens if

!xdp_prog->aux->xdp_has_frags && num_buf > 2 && headroom < 
virtnet_get_headroom(vi)

In this case the current code seems to leave the second buffer in the 
frags. This is the case of the buffer size underestimation that is 
mentioned in the comment before (I'd like to keep that).

(And that's why I'm asking to use xdp_linearize_page())

Thanks


>
>>
>> Btw, I don't see a reason why we can't reuse xdp_linearize_page(), 
>> (we probably don't need error is the buffer exceeds PAGE_SIZE).
>
> For multi-buffer xdp, we only need to copy out the pre-filled first 
> buffer, and use the remaining buffers of this packet as frags in 
> virtnet_build_xdp_buff_mrg().
>
> Thanks.
>
>>
>> Other looks good.
>>
>> Thanks
>>
>>
>>> +            if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
>>> +                goto err_xdp;
>>> +
>>> +            xdp_page = alloc_page(GFP_ATOMIC);
>>> +            if (!xdp_page)
>>> +                goto err_xdp;
>>> +
>>> +            memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
>>> +                   page_address(page) + offset, len);
>>> +            frame_sz = PAGE_SIZE;
>>> +            offset = VIRTIO_XDP_HEADROOM;
>>>           } else {
>>>               xdp_page = page;
>>>           }
>>> -
>>> -        /* Allow consuming headroom but reserve enough space to push
>>> -         * the descriptor on if we get an XDP_TX return code.
>>> -         */
>>>           data = page_address(xdp_page) + offset;
>>> -        xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
>>> -        xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + 
>>> vi->hdr_len,
>>> -                 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
>>> +        err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp, data, 
>>> len, frame_sz,
>>> +                         &num_buf, &xdp_frags_truesz, stats);
>>> +        if (unlikely(err))
>>> +            goto err_xdp_frags;
>>>             act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>           stats->xdp_packets++;
>>> @@ -1190,6 +1207,17 @@ static struct sk_buff 
>>> *receive_mergeable(struct net_device *dev,
>>>                   __free_pages(xdp_page, 0);
>>>               goto err_xdp;
>>>           }
>>> +err_xdp_frags:
>>> +        shinfo = xdp_get_shared_info_from_buff(&xdp);
>>> +
>>> +        if (unlikely(xdp_page != page))
>>> +            __free_pages(xdp_page, 0);
>>> +
>>> +        for (i = 0; i < shinfo->nr_frags; i++) {
>>> +            xdp_page = skb_frag_page(&shinfo->frags[i]);
>>> +            put_page(xdp_page);
>>> +        }
>>> +        goto err_xdp;
>>>       }
>>>       rcu_read_unlock();
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers
  2022-12-27  9:10     ` Heng Qi
@ 2022-12-28  6:27       ` Jason Wang
  2022-12-28  8:17         ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-28  6:27 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/27 17:10, Heng Qi wrote:
>
>
> On 2022/12/27 14:46, Jason Wang wrote:
>>
>> On 2022/12/20 22:14, Heng Qi wrote:
>>> Support xdp for multi buffer packets in mergeable mode.
>>>
>>> Putting the first buffer as the linear part for xdp_buff,
>>> and the rest of the buffers as non-linear fragments to struct
>>> skb_shared_info in the tailroom belonging to xdp_buff.
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>   drivers/net/virtio_net.c | 78 
>>> ++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 78 insertions(+)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 08f209d7b0bf..8fc3b1841d92 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct 
>>> net_device *dev,
>>>       return NULL;
>>>   }
>>>   +/* TODO: build xdp in big mode */
>>> +static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>>> +                      struct virtnet_info *vi,
>>> +                      struct receive_queue *rq,
>>> +                      struct xdp_buff *xdp,
>>> +                      void *buf,
>>> +                      unsigned int len,
>>> +                      unsigned int frame_sz,
>>> +                      u16 *num_buf,
>>> +                      unsigned int *xdp_frags_truesize,
>>> +                      struct virtnet_rq_stats *stats)
>>> +{
>>> +    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct 
>>> skb_shared_info));
>>> +    struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>> +    unsigned int truesize, headroom;
>>> +    struct skb_shared_info *shinfo;
>>> +    unsigned int xdp_frags_truesz = 0;
>>> +    unsigned int cur_frag_size;
>>> +    struct page *page;
>>> +    skb_frag_t *frag;
>>> +    int offset;
>>> +    void *ctx;
>>> +
>>> +    xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
>>> +    xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
>>> +             VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, 
>>> true);
>>> +
>>> +    if (*num_buf > 1) {
>>> +        shinfo = xdp_get_shared_info_from_buff(xdp);
>>> +        shinfo->nr_frags = 0;
>>> +        shinfo->xdp_frags_size = 0;
>>> +    }
>>> +
>>> +    if ((*num_buf - 1) > MAX_SKB_FRAGS)
>>> +        return -EINVAL;
>>> +
>>> +    while ((--*num_buf) >= 1) {
>>> +        buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
>>> +        if (unlikely(!buf)) {
>>> +            pr_debug("%s: rx error: %d buffers out of %d missing\n",
>>> +                 dev->name, *num_buf,
>>> +                 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
>>> +            dev->stats.rx_length_errors++;
>>> +            return -EINVAL;
>>> +        }
>>> +
>>> +        if (!xdp_buff_has_frags(xdp))
>>> +            xdp_buff_set_frags_flag(xdp);
>>
>>
>> Any reason to put this inside the loop?
>
> I'll move it outside the loop in the next version.
>
>>
>>
>>> +
>>> +        stats->bytes += len;
>>> +        page = virt_to_head_page(buf);
>>> +        offset = buf - page_address(page);
>>> +        truesize = mergeable_ctx_to_truesize(ctx);
>>> +        headroom = mergeable_ctx_to_headroom(ctx);
>>> +
>>> +        cur_frag_size = truesize + (headroom ? (headroom + 
>>> tailroom) : 0);
>>> +        xdp_frags_truesz += cur_frag_size;
>>
>>
>> Not related to this patch, but it would easily confuse the future 
>> readers that the we need another math for truesize. I think at least 
>> we need some comments for this or
>
> Yes it might, I'll add more comments on this.
>
>>
>> I guess the root cause is in get_mergeable_buf_len:
>>
>> static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
>>                                       struct ewma_pkt_len *avg_pkt_len,
>>                                           unsigned int room)
>> {
>>         struct virtnet_info *vi = rq->vq->vdev->priv;
>>         const size_t hdr_len = vi->hdr_len;
>>         unsigned int len;
>>
>>         if (room)
>>         return PAGE_SIZE - room;
>>
>> And we do
>>
>>     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>
>>     ...
>>
>>     ctx = mergeable_len_to_ctx(len, headroom);
>>
>>
>> I wonder if it's better to pack the real truesize (PAGE_SIZE) here. 
>> This may ease a lot of things.
>
> I don't know the historical reason for not packing, but I guess this 
> is for the convenience of
> comparing the actual length len given by the device with truesize. 
> Therefore, I think it would
> be better to keep the above practice for now?


The problem is:

We had

static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)

It means the truesize should be calculated before being packed into the
context. So needing extra math to recover the truesize is very
confusing; I think it's better to fix that. Otherwise we may end up
with code that is hard to review even with the help of comments.
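
For example, the producer side could pack the real truesize directly
(just a sketch reusing the existing helpers; the exact fix is up to
you):

    len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
    ...
    /* Sketch: pack len + room so mergeable_ctx_to_truesize() returns
     * the real truesize and the RX path needs no extra math per frag.
     */
    ctx = mergeable_len_to_ctx(len + room, headroom);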

Thanks


> Perhaps, I can add more explanation to the
> above code to help future readers try not to get confused.
>
> Thanks.
>
>>
>> Thanks
>>
>>
>>> +        if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
>>> +            pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>>> +                 dev->name, len, (unsigned long)ctx);
>>> +            dev->stats.rx_length_errors++;
>>> +            return -EINVAL;
>>> +        }
>>> +
>>> +        frag = &shinfo->frags[shinfo->nr_frags++];
>>> +        __skb_frag_set_page(frag, page);
>>> +        skb_frag_off_set(frag, offset);
>>> +        skb_frag_size_set(frag, len);
>>> +        if (page_is_pfmemalloc(page))
>>> +            xdp_buff_set_frag_pfmemalloc(xdp);
>>> +
>>> +        shinfo->xdp_frags_size += len;
>>> +    }
>>> +
>>> +    *xdp_frags_truesize = xdp_frags_truesz;
>>> +    return 0;
>>> +}
>>> +
>>>   static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>                        struct virtnet_info *vi,
>>>                        struct receive_queue *rq,
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets
  2022-12-28  3:50       ` Heng Qi
@ 2022-12-28  6:27         ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2022-12-28  6:27 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/28 11:50, Heng Qi wrote:
>
>
> On 2022/12/27 20:20, Heng Qi wrote:
>>
>>
>> On 2022/12/27 14:32, Jason Wang wrote:
>>>
>>> On 2022/12/20 22:14, Heng Qi wrote:
>>>> When the xdp program sets xdp.frags, which means it can process
>>>> multi-buffer packets over larger MTU, so we continue to support xdp.
>>>> But for single-buffer xdp, we should keep checking for MTU.
>>>>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>> ---
>>>>   drivers/net/virtio_net.c | 4 ++--
>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 443aa7b8f0ad..c5c4e9db4ed3 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -3095,8 +3095,8 @@ static int virtnet_xdp_set(struct net_device 
>>>> *dev, struct bpf_prog *prog,
>>>>           return -EINVAL;
>>>>       }
>>>>   -    if (dev->mtu > max_sz) {
>>>> -        NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
>>>> +    if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
>>>
>>>
>>> Not related to this patch, but I see:
>>>
>>>         unsigned long int max_sz = PAGE_SIZE - sizeof(struct 
>>> padded_vnet_hdr);
>>>
>>> Which is suspicious, do we need to count reserved headroom/tailroom 
>>> as well?
>>
>> This seems to be suspicious. After loading xdp, the size of the 
>> filled avail buffer
>> is (PAGE_SIZE - headroom - tailroom), so the size of the received 
>> used buffer, ie MTU,
>> should also be (PAGE_SIZE - headroom - tailroom).
>
> Hi Jason, this is indeed a problem. After verification, packet drops
> do occur. To avoid this, the MTU limit should be (PAGE_SIZE - headroom
> - tailroom - ethhdr = 4096 - 256 - 320 - 14 = 3506). When xdp is
> loaded, each buffer is filled with 3520 bytes (PAGE_SIZE - room), so
> if the value of (MTU + 14) is greater than 3520 (the MTU does not
> include the ethernet header), packets longer than 3520 bytes can
> arrive. num_buf will then still be greater than or equal to 2,
> xdp_linearize_page() will be performed, and the packet will be dropped
> because the total length is greater than PAGE_SIZE.
>
> I will make a separate bugfix patch to fix this later.


Great.

Thanks


>
> Thanks.
>
>>
>> Thanks.
>>
>>>
>>> Thanks
>>>
>>>
>>>> +        NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP 
>>>> without frags");
>>>>           netdev_warn(dev, "XDP requires MTU less than %lu\n", 
>>>> max_sz);
>>>>           return -EINVAL;
>>>>       }
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp
  2022-12-27  7:32     ` Heng Qi
@ 2022-12-28  6:28       ` Jason Wang
  2022-12-28  8:24         ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-28  6:28 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/27 15:32, Heng Qi wrote:
>
>
> On 2022/12/27 14:30, Jason Wang wrote:
>>
>> On 2022/12/20 22:14, Heng Qi wrote:
>>> XDP core assumes that the frame_size of xdp_buff and the length of
>>> the frag are PAGE_SIZE. The hole may cause the processing of xdp to
>>> fail, so we disable the hole mechanism when xdp is set.
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>   drivers/net/virtio_net.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 9cce7dec7366..443aa7b8f0ad 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct 
>>> virtnet_info *vi,
>>>           /* To avoid internal fragmentation, if there is very 
>>> likely not
>>>            * enough space for another buffer, add the remaining 
>>> space to
>>>            * the current buffer.
>>> +         * XDP core assumes that frame_size of xdp_buff and the length
>>> +         * of the frag are PAGE_SIZE, so we disable the hole 
>>> mechanism.
>>>            */
>>> -        len += hole;
>>> +        if (!headroom)
>>> +            len += hole;
>>
>>
>> Is this only a requirement of multi-buffer XDP? If not, it need to be 
>> backported to stable.
>
> It applies to single buffer xdp and multi-buffer xdp, but even if 
> single buffer xdp has a hole
> mechanism, there will be no problem (limiting mtu and turning off 
> GUEST GSO), so there is
> no need to backport it.


Let's add this in the changelog.

With that,

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


>
> Thanks.
>
>>
>> Thanks
>>
>>
>>>           alloc_frag->offset += hole;
>>>       }
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp
  2022-12-27  8:26     ` Heng Qi
@ 2022-12-28  6:30       ` Jason Wang
  2022-12-28  8:25         ` Heng Qi
  0 siblings, 1 reply; 41+ messages in thread
From: Jason Wang @ 2022-12-28  6:30 UTC (permalink / raw)
  To: Heng Qi, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo


On 2022/12/27 16:26, Heng Qi wrote:
>
>
> On 2022/12/27 15:12, Jason Wang wrote:
>>
>> On 2022/12/20 22:14, Heng Qi wrote:
>>> This serves as the basis for XDP_TX and XDP_REDIRECT
>>> to send a multi-buffer xdp_frame.
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>   drivers/net/virtio_net.c | 27 ++++++++++++++++++++++-----
>>>   1 file changed, 22 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 40bc58fa57f5..9f31bfa7f9a6 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -563,22 +563,39 @@ static int __virtnet_xdp_xmit_one(struct 
>>> virtnet_info *vi,
>>>                      struct xdp_frame *xdpf)
>>>   {
>>>       struct virtio_net_hdr_mrg_rxbuf *hdr;
>>> -    int err;
>>> +    struct skb_shared_info *shinfo;
>>> +    u8 nr_frags = 0;
>>> +    int err, i;
>>>         if (unlikely(xdpf->headroom < vi->hdr_len))
>>>           return -EOVERFLOW;
>>>   -    /* Make room for virtqueue hdr (also change xdpf->headroom?) */
>>> +    if (unlikely(xdp_frame_has_frags(xdpf))) {
>>> +        shinfo = xdp_get_shared_info_from_frame(xdpf);
>>> +        nr_frags = shinfo->nr_frags;
>>> +    }
>>> +
>>> +    /* Need to adjust this to calculate the correct postion
>>> +     * for shinfo of the xdpf.
>>> +     */
>>> +    xdpf->headroom -= vi->hdr_len;
>>
>>
>> Any reason we need to do this here? (Or if it is, is it only needed 
>> for multibuffer XDP?)
>
> Going back to its wrapping function virtnet_xdp_xmit(), we need to 
> free up the pending old buffers.
> If the "is_xdp_frame(ptr)" condition is met, then we need to calculate 
> the position of skb_shared_info
> in xdp_get_frame_len() and xdp_return_frame(), which will involve to 
> xdpf->data and xdpf->headroom.
> Therefore, we need to update the value of headroom synchronously here.


Let's tweak the comment above to something like this.
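
For instance (just a sketch based on your explanation, not a final
wording):

    /* virtnet_xdp_xmit() frees pending old buffers via
     * xdp_get_frame_len()/xdp_return_frame(), which locate the shinfo
     * from xdpf->data and xdpf->headroom, so keep headroom in sync
     * after making room for the virtqueue hdr.
     */
    xdpf->headroom -= vi->hdr_len;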

With that fixed,

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


>
> Also, it's not necessary for single-buffer xdp, but we need to keep it 
> because it's harmless and as it should be.
>
> Thanks.
>
>>
>> Other looks good.
>>
>> Thanks
>>
>>
>>>       xdpf->data -= vi->hdr_len;
>>>       /* Zero header and leave csum up to XDP layers */
>>>       hdr = xdpf->data;
>>>       memset(hdr, 0, vi->hdr_len);
>>>       xdpf->len   += vi->hdr_len;
>>>   -    sg_init_one(sq->sg, xdpf->data, xdpf->len);
>>> +    sg_init_table(sq->sg, nr_frags + 1);
>>> +    sg_set_buf(sq->sg, xdpf->data, xdpf->len);
>>> +    for (i = 0; i < nr_frags; i++) {
>>> +        skb_frag_t *frag = &shinfo->frags[i];
>>> +
>>> +        sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
>>> +                skb_frag_size(frag), skb_frag_off(frag));
>>> +    }
>>>   -    err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
>>> -                   GFP_ATOMIC);
>>> +    err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
>>> +                   xdp_to_ptr(xdpf), GFP_ATOMIC);
>>>       if (unlikely(err))
>>>           return -ENOSPC; /* Caller handle free/refcnt */
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers
  2022-12-28  6:27       ` Jason Wang
@ 2022-12-28  8:17         ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-28  8:17 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/28 14:27, Jason Wang wrote:
>
> On 2022/12/27 17:10, Heng Qi wrote:
>>
>>
>> On 2022/12/27 14:46, Jason Wang wrote:
>>>
>>> On 2022/12/20 22:14, Heng Qi wrote:
>>>> Support xdp for multi buffer packets in mergeable mode.
>>>>
>>>> Putting the first buffer as the linear part for xdp_buff,
>>>> and the rest of the buffers as non-linear fragments to struct
>>>> skb_shared_info in the tailroom belonging to xdp_buff.
>>>>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>> ---
>>>>   drivers/net/virtio_net.c | 78 
>>>> ++++++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 78 insertions(+)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 08f209d7b0bf..8fc3b1841d92 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -931,6 +931,84 @@ static struct sk_buff *receive_big(struct 
>>>> net_device *dev,
>>>>       return NULL;
>>>>   }
>>>>   +/* TODO: build xdp in big mode */
>>>> +static int virtnet_build_xdp_buff_mrg(struct net_device *dev,
>>>> +                      struct virtnet_info *vi,
>>>> +                      struct receive_queue *rq,
>>>> +                      struct xdp_buff *xdp,
>>>> +                      void *buf,
>>>> +                      unsigned int len,
>>>> +                      unsigned int frame_sz,
>>>> +                      u16 *num_buf,
>>>> +                      unsigned int *xdp_frags_truesize,
>>>> +                      struct virtnet_rq_stats *stats)
>>>> +{
>>>> +    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct 
>>>> skb_shared_info));
>>>> +    struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>>> +    unsigned int truesize, headroom;
>>>> +    struct skb_shared_info *shinfo;
>>>> +    unsigned int xdp_frags_truesz = 0;
>>>> +    unsigned int cur_frag_size;
>>>> +    struct page *page;
>>>> +    skb_frag_t *frag;
>>>> +    int offset;
>>>> +    void *ctx;
>>>> +
>>>> +    xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
>>>> +    xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
>>>> +             VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, 
>>>> true);
>>>> +
>>>> +    if (*num_buf > 1) {
>>>> +        shinfo = xdp_get_shared_info_from_buff(xdp);
>>>> +        shinfo->nr_frags = 0;
>>>> +        shinfo->xdp_frags_size = 0;
>>>> +    }
>>>> +
>>>> +    if ((*num_buf - 1) > MAX_SKB_FRAGS)
>>>> +        return -EINVAL;
>>>> +
>>>> +    while ((--*num_buf) >= 1) {
>>>> +        buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
>>>> +        if (unlikely(!buf)) {
>>>> +            pr_debug("%s: rx error: %d buffers out of %d missing\n",
>>>> +                 dev->name, *num_buf,
>>>> +                 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
>>>> +            dev->stats.rx_length_errors++;
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        if (!xdp_buff_has_frags(xdp))
>>>> +            xdp_buff_set_frags_flag(xdp);
>>>
>>>
>>> Any reason to put this inside the loop?
>>
>> I'll move it outside the loop in the next version.
>>
>>>
>>>
>>>> +
>>>> +        stats->bytes += len;
>>>> +        page = virt_to_head_page(buf);
>>>> +        offset = buf - page_address(page);
>>>> +        truesize = mergeable_ctx_to_truesize(ctx);
>>>> +        headroom = mergeable_ctx_to_headroom(ctx);
>>>> +
>>>> +        cur_frag_size = truesize + (headroom ? (headroom + 
>>>> tailroom) : 0);
>>>> +        xdp_frags_truesz += cur_frag_size;
>>>
>>>
>>> Not related to this patch, but it would easily confuse the future 
>>> readers that the we need another math for truesize. I think at least 
>>> we need some comments for this or
>>
>> Yes it might, I'll add more comments on this.
>>
>>>
>>> I guess the root cause is in get_mergeable_buf_len:
>>>
>>> static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
>>>                                       struct ewma_pkt_len *avg_pkt_len,
>>>                                           unsigned int room)
>>> {
>>>         struct virtnet_info *vi = rq->vq->vdev->priv;
>>>         const size_t hdr_len = vi->hdr_len;
>>>         unsigned int len;
>>>
>>>         if (room)
>>>         return PAGE_SIZE - room;
>>>
>>> And we do
>>>
>>>     len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
>>>
>>>     ...
>>>
>>>     ctx = mergeable_len_to_ctx(len, headroom);
>>>
>>>
>>> I wonder if it's better to pack the real truesize (PAGE_SIZE) here. 
>>> This may ease a lot of things.
>>
>> I don't know the historical reason for not packing, but I guess this 
>> is for the convenience of
>> comparing the actual length len given by the device with truesize. 
>> Therefore, I think it would
>> be better to keep the above practice for now?
>
>
> The problem is:
>
> We had
>
> static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
>
> It means the truesize should be calculated before packing into the 
> context. So having another math to get truesize is very confusing, I 
> think it's better to fix that. Otherwise we may end up code that is 
> hard to be reviewed even with the help of comments.

It is reasonable to let truesize return to its literal meaning, i.e.
including the headroom and tailroom lengths. I will modify this in the
next version.
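
With that, the per-frag accounting in virtnet_build_xdp_buff_mrg()
should reduce to roughly (sketch, assuming the repacked truesize):

    /* Sketch: once the packed truesize already carries headroom and
     * tailroom, the conditional (headroom + tailroom) term goes away.
     */
    cur_frag_size = mergeable_ctx_to_truesize(ctx);
    xdp_frags_truesz += cur_frag_size;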

Thanks.

>
> Thanks
>
>
>> Perhaps, I can add more explanation to the
>> above code to help future readers try not to get confused.
>>
>> Thanks.
>>
>>>
>>> Thanks
>>>
>>>
>>>> +        if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
>>>> +            pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>>>> +                 dev->name, len, (unsigned long)ctx);
>>>> +            dev->stats.rx_length_errors++;
>>>> +            return -EINVAL;
>>>> +        }
>>>> +
>>>> +        frag = &shinfo->frags[shinfo->nr_frags++];
>>>> +        __skb_frag_set_page(frag, page);
>>>> +        skb_frag_off_set(frag, offset);
>>>> +        skb_frag_size_set(frag, len);
>>>> +        if (page_is_pfmemalloc(page))
>>>> +            xdp_buff_set_frag_pfmemalloc(xdp);
>>>> +
>>>> +        shinfo->xdp_frags_size += len;
>>>> +    }
>>>> +
>>>> +    *xdp_frags_truesize = xdp_frags_truesz;
>>>> +    return 0;
>>>> +}
>>>> +
>>>>   static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>                        struct virtnet_info *vi,
>>>>                        struct receive_queue *rq,
>>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-28  6:24       ` Jason Wang
@ 2022-12-28  8:23         ` Heng Qi
  2022-12-28 11:54           ` Jason Wang
  0 siblings, 1 reply; 41+ messages in thread
From: Heng Qi @ 2022-12-28  8:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

On Wed, Dec 28, 2022 at 02:24:22PM +0800, Jason Wang wrote:
> 
> On 2022/12/27 17:31, Heng Qi wrote:
> >
> >
> >On 2022/12/27 15:01, Jason Wang wrote:
> >>
> >>On 2022/12/20 22:14, Heng Qi wrote:
> >>>Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().
> >>>
> >>>For the prefilled buffer before xdp is set, we will probably use
> >>>vq reset in the future. At the same time, virtio net currently
> >>>uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
> >>>the tailroom of the last frag, which will involve the offset of the
> >>>corresponding page and cause a negative value, so we disable tail
> >>>increase by not setting xdp_rxq->frag_size.
> >>>
> >>>Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >>>Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >>>---
> >>>  drivers/net/virtio_net.c | 60
> >>>+++++++++++++++++++++++++++++-----------
> >>>  1 file changed, 44 insertions(+), 16 deletions(-)
> >>>
> >>>diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>index 8fc3b1841d92..40bc58fa57f5 100644
> >>>--- a/drivers/net/virtio_net.c
> >>>+++ b/drivers/net/virtio_net.c
> >>>@@ -1018,6 +1018,7 @@ static struct sk_buff
> >>>*receive_mergeable(struct net_device *dev,
> >>>                       unsigned int *xdp_xmit,
> >>>                       struct virtnet_rq_stats *stats)
> >>>  {
> >>>+    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct
> >>>skb_shared_info));
> >>>      struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> >>>      u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >>>      struct page *page = virt_to_head_page(buf);
> >>>@@ -1048,11 +1049,14 @@ static struct sk_buff
> >>>*receive_mergeable(struct net_device *dev,
> >>>      rcu_read_lock();
> >>>      xdp_prog = rcu_dereference(rq->xdp_prog);
> >>>      if (xdp_prog) {
> >>>+        unsigned int xdp_frags_truesz = 0;
> >>>+        struct skb_shared_info *shinfo;
> >>>          struct xdp_frame *xdpf;
> >>>          struct page *xdp_page;
> >>>          struct xdp_buff xdp;
> >>>          void *data;
> >>>          u32 act;
> >>>+        int i;
> >>>            /* Transient failure which in theory could occur if
> >>>           * in-flight packets from before XDP was enabled reach
> >>>@@ -1061,19 +1065,23 @@ static struct sk_buff
> >>>*receive_mergeable(struct net_device *dev,
> >>>          if (unlikely(hdr->hdr.gso_type))
> >>>              goto err_xdp;
> >>>  -        /* Buffers with headroom use PAGE_SIZE as alloc size,
> >>>-         * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> >>>+        /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> >>>+         * with headroom may add hole in truesize, which
> >>>+         * make their length exceed PAGE_SIZE. So we disabled the
> >>>+         * hole mechanism for xdp. See add_recvbuf_mergeable().
> >>>           */
> >>>          frame_sz = headroom ? PAGE_SIZE : truesize;
> >>>  -        /* This happens when rx buffer size is underestimated
> >>>-         * or headroom is not enough because of the buffer
> >>>-         * was refilled before XDP is set. This should only
> >>>-         * happen for the first several packets, so we don't
> >>>-         * care much about its performance.
> >>>+        /* This happens when headroom is not enough because
> >>>+         * of the buffer was prefilled before XDP is set.
> >>>+         * This should only happen for the first several packets.
> >>>+         * In fact, vq reset can be used here to help us clean up
> >>>+         * the prefilled buffers, but many existing devices do not
> >>>+         * support it, and we don't want to bother users who are
> >>>+         * using xdp normally.
> >>>           */
> >>>-        if (unlikely(num_buf > 1 ||
> >>>-                 headroom < virtnet_get_headroom(vi))) {
> >>>+        if (!xdp_prog->aux->xdp_has_frags &&
> >>>+            (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
> >>>              /* linearize data for XDP */
> >>>              xdp_page = xdp_linearize_page(rq, &num_buf,
> >>>                                page, offset,
> >>>@@ -1084,17 +1092,26 @@ static struct sk_buff
> >>>*receive_mergeable(struct net_device *dev,
> >>>              if (!xdp_page)
> >>>                  goto err_xdp;
> >>>              offset = VIRTIO_XDP_HEADROOM;
> >>>+        } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
> >>
> >>
> >>I believe we need to check xdp_prog->aux->xdp_has_frags at least
> >>since this may not work if it needs more than one frags?
> >
> >Sorry Jason, I didn't understand you, I'll try to answer. For
> >multi-buffer xdp programs, if the first buffer is a pre-filled
> >buffer (no headroom),
> >we need to copy it out and use the subsequent buffers of this
> >packet as its frags (this is done in
> >virtnet_build_xdp_buff_mrg()), therefore,
> >it seems that there is no need to check
> >'xdp_prog->aux->xdp_has_frags' to mark multi-buffer xdp (of course
> >I can add it),
> >
> >+ } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
> >
> >Because the linearization of single-buffer xdp has all been done
> >before, the subsequent situation can only be applied to
> >multi-buffer xdp:
> >+ if (!xdp_prog->aux->xdp_has_frags &&
> >+ (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
> 
> 
> I basically meant what happens if
> 
> !xdp_prog->aux->xdp_has_frags && num_buf > 2 && headroom <
> virtnet_get_headroom(vi)
> 
> In this case the current code seems to leave the second buffer in
> the frags. This is the case of the buffer size underestimation that
> is mentioned in the comment before (I'd like to keep that).

If I'm not wrong, this case still goes directly into the first 'if'
branch, since num_buf > 2 already satisfies num_buf > 1:

-               if (unlikely(num_buf > 1 ||
-                            headroom < virtnet_get_headroom(vi))) {
+               if (!xdp_prog->aux->xdp_has_frags &&
+                   (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {

Thanks.

> 
> > (And that's why I'm asking to use xdp_linearize_page())
> 
> Thanks
> 
> 
> >
> >>
> >>Btw, I don't see a reason why we can't reuse
> >>xdp_linearize_page(), (we probably don't need error is the
> >>buffer exceeds PAGE_SIZE).
> >
> >For multi-buffer xdp, we only need to copy out the pre-filled
> >first buffer, and use the remaining buffers of this packet as
> >frags in virtnet_build_xdp_buff_mrg().
> >
> >Thanks.
> >
> >>
> >>Other looks good.
> >>
> >>Thanks
> >>
> >>
> >>>+            if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> >>>+                goto err_xdp;
> >>>+
> >>>+            xdp_page = alloc_page(GFP_ATOMIC);
> >>>+            if (!xdp_page)
> >>>+                goto err_xdp;
> >>>+
> >>>+            memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> >>>+                   page_address(page) + offset, len);
> >>>+            frame_sz = PAGE_SIZE;
> >>>+            offset = VIRTIO_XDP_HEADROOM;
> >>>          } else {
> >>>              xdp_page = page;
> >>>          }
> >>>-
> >>>-        /* Allow consuming headroom but reserve enough space to push
> >>>-         * the descriptor on if we get an XDP_TX return code.
> >>>-         */
> >>>          data = page_address(xdp_page) + offset;
> >>>-        xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> >>>-        xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM +
> >>>vi->hdr_len,
> >>>-                 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> >>>+        err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp,
> >>>data, len, frame_sz,
> >>>+                         &num_buf, &xdp_frags_truesz, stats);
> >>>+        if (unlikely(err))
> >>>+            goto err_xdp_frags;
> >>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >>>          stats->xdp_packets++;
> >>>@@ -1190,6 +1207,17 @@ static struct sk_buff
> >>>*receive_mergeable(struct net_device *dev,
> >>>                  __free_pages(xdp_page, 0);
> >>>              goto err_xdp;
> >>>          }
> >>>+err_xdp_frags:
> >>>+        shinfo = xdp_get_shared_info_from_buff(&xdp);
> >>>+
> >>>+        if (unlikely(xdp_page != page))
> >>>+            __free_pages(xdp_page, 0);
> >>>+
> >>>+        for (i = 0; i < shinfo->nr_frags; i++) {
> >>>+            xdp_page = skb_frag_page(&shinfo->frags[i]);
> >>>+            put_page(xdp_page);
> >>>+        }
> >>>+        goto err_xdp;
> >>>      }
> >>>      rcu_read_unlock();
> >

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp
  2022-12-28  6:28       ` Jason Wang
@ 2022-12-28  8:24         ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-28  8:24 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/28 14:28, Jason Wang wrote:
>
> On 2022/12/27 15:32, Heng Qi wrote:
>>
>>
>> On 2022/12/27 14:30, Jason Wang wrote:
>>>
>>> On 2022/12/20 22:14, Heng Qi wrote:
>>>> XDP core assumes that the frame_size of xdp_buff and the length of
>>>> the frag are PAGE_SIZE. The hole may cause the processing of xdp to
>>>> fail, so we disable the hole mechanism when xdp is set.
>>>>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>> ---
>>>>   drivers/net/virtio_net.c | 5 ++++-
>>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 9cce7dec7366..443aa7b8f0ad 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct 
>>>> virtnet_info *vi,
>>>>           /* To avoid internal fragmentation, if there is very 
>>>> likely not
>>>>            * enough space for another buffer, add the remaining 
>>>> space to
>>>>            * the current buffer.
>>>> +         * XDP core assumes that frame_size of xdp_buff and the 
>>>> length
>>>> +         * of the frag are PAGE_SIZE, so we disable the hole 
>>>> mechanism.
>>>>            */
>>>> -        len += hole;
>>>> +        if (!headroom)
>>>> +            len += hole;
>>>
>>>
>>> Is this only a requirement of multi-buffer XDP? If not, it need to 
>>> be backported to stable.
>>
>> It applies to single buffer xdp and multi-buffer xdp, but even if 
>> single buffer xdp has a hole
>> mechanism, there will be no problem (limiting mtu and turning off 
>> GUEST GSO), so there is
>> no need to backport it.
>
>
> Let's add this in the changelog.

Ok, thanks for your efforts.

>
> With that,
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> Thanks
>
>
>>
>> Thanks.
>>
>>>
>>> Thanks
>>>
>>>
>>>>           alloc_frag->offset += hole;
>>>>       }
>>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp
  2022-12-28  6:30       ` Jason Wang
@ 2022-12-28  8:25         ` Heng Qi
  0 siblings, 0 replies; 41+ messages in thread
From: Heng Qi @ 2022-12-28  8:25 UTC (permalink / raw)
  To: Jason Wang, netdev, bpf
  Cc: Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S . Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, Xuan Zhuo



On 2022/12/28 14:30, Jason Wang wrote:
>
> On 2022/12/27 16:26, Heng Qi wrote:
>>
>>
>> On 2022/12/27 15:12, Jason Wang wrote:
>>>
>>> On 2022/12/20 22:14, Heng Qi wrote:
>>>> This serves as the basis for XDP_TX and XDP_REDIRECT
>>>> to send a multi-buffer xdp_frame.
>>>>
>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>> ---
>>>>   drivers/net/virtio_net.c | 27 ++++++++++++++++++++++-----
>>>>   1 file changed, 22 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 40bc58fa57f5..9f31bfa7f9a6 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -563,22 +563,39 @@ static int __virtnet_xdp_xmit_one(struct 
>>>> virtnet_info *vi,
>>>>                      struct xdp_frame *xdpf)
>>>>   {
>>>>       struct virtio_net_hdr_mrg_rxbuf *hdr;
>>>> -    int err;
>>>> +    struct skb_shared_info *shinfo;
>>>> +    u8 nr_frags = 0;
>>>> +    int err, i;
>>>>         if (unlikely(xdpf->headroom < vi->hdr_len))
>>>>           return -EOVERFLOW;
>>>>   -    /* Make room for virtqueue hdr (also change xdpf->headroom?) */
>>>> +    if (unlikely(xdp_frame_has_frags(xdpf))) {
>>>> +        shinfo = xdp_get_shared_info_from_frame(xdpf);
>>>> +        nr_frags = shinfo->nr_frags;
>>>> +    }
>>>> +
>>>> +    /* Need to adjust this to calculate the correct postion
>>>> +     * for shinfo of the xdpf.
>>>> +     */
>>>> +    xdpf->headroom -= vi->hdr_len;
>>>
>>>
>>> Any reason we need to do this here? (Or if it is, is it only needed 
>>> for multibuffer XDP?)
>>
>> Going back to its wrapping function virtnet_xdp_xmit(), we need to 
>> free up the pending old buffers.
>> If the "is_xdp_frame(ptr)" condition is met, then we need to 
>> calculate the position of skb_shared_info
>> in xdp_get_frame_len() and xdp_return_frame(), which will involve to 
>> xdpf->data and xdpf->headroom.
>> Therefore, we need to update the value of headroom synchronously here.
>
>
> Let's tweak the comment above to something like this.

Ok, thanks for your efforts.

>
> With that fixed,
>
> Acked-by: Jason Wang <jasowang@redhat.com>
>
> Thanks
>
>
>>
>> Also, it's not necessary for single-buffer xdp, but we need to keep 
>> it because it's harmless and as it should be.
>>
>> Thanks.
>>
>>>
>>> Other looks good.
>>>
>>> Thanks
>>>
>>>
>>>>       xdpf->data -= vi->hdr_len;
>>>>       /* Zero header and leave csum up to XDP layers */
>>>>       hdr = xdpf->data;
>>>>       memset(hdr, 0, vi->hdr_len);
>>>>       xdpf->len   += vi->hdr_len;
>>>>   -    sg_init_one(sq->sg, xdpf->data, xdpf->len);
>>>> +    sg_init_table(sq->sg, nr_frags + 1);
>>>> +    sg_set_buf(sq->sg, xdpf->data, xdpf->len);
>>>> +    for (i = 0; i < nr_frags; i++) {
>>>> +        skb_frag_t *frag = &shinfo->frags[i];
>>>> +
>>>> +        sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
>>>> +                skb_frag_size(frag), skb_frag_off(frag));
>>>> +    }
>>>>   -    err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
>>>> -                   GFP_ATOMIC);
>>>> +    err = virtqueue_add_outbuf(sq->vq, sq->sg, nr_frags + 1,
>>>> +                   xdp_to_ptr(xdpf), GFP_ATOMIC);
>>>>       if (unlikely(err))
>>>>           return -ENOSPC; /* Caller handle free/refcnt */
>>


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-28  8:23         ` Heng Qi
@ 2022-12-28 11:54           ` Jason Wang
  0 siblings, 0 replies; 41+ messages in thread
From: Jason Wang @ 2022-12-28 11:54 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S . Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S . Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet, Xuan Zhuo

On Wed, Dec 28, 2022 at 4:23 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> On Wed, Dec 28, 2022 at 02:24:22PM +0800, Jason Wang wrote:
> >
> > On 2022/12/27 17:31, Heng Qi wrote:
> > >
> > >
> > >On 2022/12/27 15:01, Jason Wang wrote:
> > >>
> > >>On 2022/12/20 22:14, Heng Qi wrote:
> > >>>Build multi-buffer xdp using virtnet_build_xdp_buff_mrg().
> > >>>
> > >>>For the prefilled buffer before xdp is set, we will probably use
> > >>>vq reset in the future. At the same time, virtio net currently
> > >>>uses comp pages, and bpf_xdp_frags_increase_tail() needs to calculate
> > >>>the tailroom of the last frag, which will involve the offset of the
> > >>>corresponding page and cause a negative value, so we disable tail
> > >>>increase by not setting xdp_rxq->frag_size.
> > >>>
> > >>>Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > >>>Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >>>---
> > >>>  drivers/net/virtio_net.c | 60
> > >>>+++++++++++++++++++++++++++++-----------
> > >>>  1 file changed, 44 insertions(+), 16 deletions(-)
> > >>>
> > >>>diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > >>>index 8fc3b1841d92..40bc58fa57f5 100644
> > >>>--- a/drivers/net/virtio_net.c
> > >>>+++ b/drivers/net/virtio_net.c
> > >>>@@ -1018,6 +1018,7 @@ static struct sk_buff
> > >>>*receive_mergeable(struct net_device *dev,
> > >>>                       unsigned int *xdp_xmit,
> > >>>                       struct virtnet_rq_stats *stats)
> > >>>  {
> > >>>+    unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct
> > >>>skb_shared_info));
> > >>>      struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > >>>      u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > >>>      struct page *page = virt_to_head_page(buf);
> > >>>@@ -1048,11 +1049,14 @@ static struct sk_buff
> > >>>*receive_mergeable(struct net_device *dev,
> > >>>      rcu_read_lock();
> > >>>      xdp_prog = rcu_dereference(rq->xdp_prog);
> > >>>      if (xdp_prog) {
> > >>>+        unsigned int xdp_frags_truesz = 0;
> > >>>+        struct skb_shared_info *shinfo;
> > >>>          struct xdp_frame *xdpf;
> > >>>          struct page *xdp_page;
> > >>>          struct xdp_buff xdp;
> > >>>          void *data;
> > >>>          u32 act;
> > >>>+        int i;
> > >>>            /* Transient failure which in theory could occur if
> > >>>           * in-flight packets from before XDP was enabled reach
> > >>>@@ -1061,19 +1065,23 @@ static struct sk_buff
> > >>>*receive_mergeable(struct net_device *dev,
> > >>>          if (unlikely(hdr->hdr.gso_type))
> > >>>              goto err_xdp;
> > >>>  -        /* Buffers with headroom use PAGE_SIZE as alloc size,
> > >>>-         * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> > >>>+        /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> > >>>+         * with headroom may add hole in truesize, which
> > >>>+         * make their length exceed PAGE_SIZE. So we disabled the
> > >>>+         * hole mechanism for xdp. See add_recvbuf_mergeable().
> > >>>           */
> > >>>          frame_sz = headroom ? PAGE_SIZE : truesize;
> > >>>  -        /* This happens when rx buffer size is underestimated
> > >>>-         * or headroom is not enough because of the buffer
> > >>>-         * was refilled before XDP is set. This should only
> > >>>-         * happen for the first several packets, so we don't
> > >>>-         * care much about its performance.
> > >>>+        /* This happens when headroom is not enough because
> > >>>+         * of the buffer was prefilled before XDP is set.
> > >>>+         * This should only happen for the first several packets.
> > >>>+         * In fact, vq reset can be used here to help us clean up
> > >>>+         * the prefilled buffers, but many existing devices do not
> > >>>+         * support it, and we don't want to bother users who are
> > >>>+         * using xdp normally.
> > >>>           */
> > >>>-        if (unlikely(num_buf > 1 ||
> > >>>-                 headroom < virtnet_get_headroom(vi))) {
> > >>>+        if (!xdp_prog->aux->xdp_has_frags &&
> > >>>+            (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
> > >>>              /* linearize data for XDP */
> > >>>              xdp_page = xdp_linearize_page(rq, &num_buf,
> > >>>                                page, offset,
> > >>>@@ -1084,17 +1092,26 @@ static struct sk_buff
> > >>>*receive_mergeable(struct net_device *dev,
> > >>>              if (!xdp_page)
> > >>>                  goto err_xdp;
> > >>>              offset = VIRTIO_XDP_HEADROOM;
> > >>>+        } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
> > >>
> > >>
> > >>I believe we need to check xdp_prog->aux->xdp_has_frags at least
> > >>since this may not work if it needs more than one frags?
> > >
> > >Sorry Jason, I didn't understand you, I'll try to answer. For
> > >multi-buffer xdp programs, if the first buffer is a pre-filled
> > >buffer (no headroom),
> > >we need to copy it out and use the subsequent buffers of this
> > >packet as its frags (this is done in
> > >virtnet_build_xdp_buff_mrg()), therefore,
> > >it seems that there is no need to check
> > >'xdp_prog->aux->xdp_has_frags' to mark multi-buffer xdp (of course
> > >I can add it),
> > >
> > >+ } else if (unlikely(headroom < virtnet_get_headroom(vi))) {
> > >
> > >Because the linearization of single-buffer xdp has all been done
> > >before, the subsequent situation can only be applied to
> > >multi-buffer xdp:
> > >+ if (!xdp_prog->aux->xdp_has_frags &&
> > >+ (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
> >
> >
> > I basically meant what happens if
> >
> > !xdp_prog->aux->xdp_has_frags && num_buf > 2 && headroom <
> > virtnet_get_headroom(vi)
> >
> > In this case the current code seems to leave the second buffer in
> > the frags. This is the case of the buffer size underestimation that
> > is mentioned in the comment before (I'd like to keep that).
>
> If I'm not wrong, this case still goes directly into the first 'if'
> branch.

My fault, you're right.

Thanks

>
> -               if (unlikely(num_buf > 1 ||
> -                            headroom < virtnet_get_headroom(vi))) {
> +               if (!xdp_prog->aux->xdp_has_frags &&
> +                   (num_buf > 1 || headroom < virtnet_get_headroom(vi))) {
>
> Thanks.
>
> >
> > (And that's why I'm asking to use xdp_linearize_page())
> >
> > Thanks
> >
> >
> > >
> > >>
> > >>Btw, I don't see a reason why we can't reuse
> > >>xdp_linearize_page(), (we probably don't need error is the
> > >>buffer exceeds PAGE_SIZE).
> > >
> > >For multi-buffer xdp, we only need to copy out the pre-filled
> > >first buffer, and use the remaining buffers of this packet as
> > >frags in virtnet_build_xdp_buff_mrg().
> > >
> > >Thanks.
> > >
> > >>
> > >>Other looks good.
> > >>
> > >>Thanks
> > >>
> > >>
> > >>>+            if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> > >>>+                goto err_xdp;
> > >>>+
> > >>>+            xdp_page = alloc_page(GFP_ATOMIC);
> > >>>+            if (!xdp_page)
> > >>>+                goto err_xdp;
> > >>>+
> > >>>+            memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> > >>>+                   page_address(page) + offset, len);
> > >>>+            frame_sz = PAGE_SIZE;
> > >>>+            offset = VIRTIO_XDP_HEADROOM;
> > >>>          } else {
> > >>>              xdp_page = page;
> > >>>          }
> > >>>-
> > >>>-        /* Allow consuming headroom but reserve enough space to push
> > >>>-         * the descriptor on if we get an XDP_TX return code.
> > >>>-         */
> > >>>          data = page_address(xdp_page) + offset;
> > >>>-        xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> > >>>-        xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM +
> > >>>vi->hdr_len,
> > >>>-                 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> > >>>+        err = virtnet_build_xdp_buff_mrg(dev, vi, rq, &xdp,
> > >>>data, len, frame_sz,
> > >>>+                         &num_buf, &xdp_frags_truesz, stats);
> > >>>+        if (unlikely(err))
> > >>>+            goto err_xdp_frags;
> > >>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > >>>          stats->xdp_packets++;
> > >>>@@ -1190,6 +1207,17 @@ static struct sk_buff
> > >>>*receive_mergeable(struct net_device *dev,
> > >>>                  __free_pages(xdp_page, 0);
> > >>>              goto err_xdp;
> > >>>          }
> > >>>+err_xdp_frags:
> > >>>+        shinfo = xdp_get_shared_info_from_buff(&xdp);
> > >>>+
> > >>>+        if (unlikely(xdp_page != page))
> > >>>+            __free_pages(xdp_page, 0);
> > >>>+
> > >>>+        for (i = 0; i < shinfo->nr_frags; i++) {
> > >>>+            xdp_page = skb_frag_page(&shinfo->frags[i]);
> > >>>+            put_page(xdp_page);
> > >>>+        }
> > >>>+        goto err_xdp;
> > >>>      }
> > >>>      rcu_read_unlock();
> > >
>


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread

Thread overview: 41+ messages
2022-12-20 14:14 [PATCH v2 0/9] virtio_net: support multi buffer xdp Heng Qi
2022-12-20 14:14 ` [PATCH v2 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
2022-12-27  6:30   ` Jason Wang
2022-12-27  7:32     ` Heng Qi
2022-12-28  6:28       ` Jason Wang
2022-12-28  8:24         ` Heng Qi
2022-12-20 14:14 ` [PATCH v2 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
2022-12-27  6:32   ` Jason Wang
2022-12-27 12:20     ` Heng Qi
2022-12-28  3:50       ` Heng Qi
2022-12-28  6:27         ` Jason Wang
2022-12-20 14:14 ` [PATCH v2 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
2022-12-20 14:14 ` [PATCH v2 4/9] virtio_net: build xdp_buff with multi buffers Heng Qi
2022-12-27  6:46   ` Jason Wang
2022-12-27  9:10     ` Heng Qi
2022-12-28  6:27       ` Jason Wang
2022-12-28  8:17         ` Heng Qi
2022-12-20 14:14 ` [PATCH v2 5/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
2022-12-27  7:01   ` Jason Wang
2022-12-27  9:31     ` Heng Qi
2022-12-28  6:24       ` Jason Wang
2022-12-28  8:23         ` Heng Qi
2022-12-28 11:54           ` Jason Wang
2022-12-20 14:14 ` [PATCH v2 6/9] virtio_net: transmit the multi-buffer xdp Heng Qi
2022-12-27  7:12   ` Jason Wang
2022-12-27  8:26     ` Heng Qi
2022-12-28  6:30       ` Jason Wang
2022-12-28  8:25         ` Heng Qi
2022-12-20 14:14 ` [PATCH v2 7/9] virtio_net: build skb from " Heng Qi
2022-12-27  7:31   ` Jason Wang
2022-12-27  7:51     ` Heng Qi
2022-12-20 14:14 ` [PATCH v2 8/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
2022-12-27  7:55   ` Jason Wang
2022-12-27  8:27     ` Heng Qi
2022-12-20 14:14 ` [PATCH v2 9/9] virtio_net: support multi-buffer xdp Heng Qi
2022-12-27  9:03   ` Jason Wang
2022-12-27  9:11     ` Heng Qi
2022-12-22  1:30 ` [PATCH v2 0/9] virtio_net: support multi buffer xdp Jakub Kicinski
2022-12-22  2:04   ` Heng Qi
2022-12-26  2:32 ` Heng Qi
2022-12-26  4:14   ` Jason Wang
