* [RFC PATCH 0/9] virtio_net: support multi buffer xdp
@ 2022-11-22  7:43 Heng Qi
  2022-11-22  7:43 ` [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
                   ` (9 more replies)
  0 siblings, 10 replies; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

Currently, virtio net only supports xdp for single-buffer packets
or for linearized multi-buffer packets. This patchset adds xdp
support for multi-buffer packets, so that GRO_HW related features
can be negotiated, without affecting the processing of
single-buffer xdp.

In order to build multi-buffer xdp neatly, the code is integrated
into virtnet_build_xdp_buff(). The first buffer is used as the
linear part of the prepared xdp_buff, and the rest of the buffers
are added as frags to its skb_shared_info structure. On XDP_PASS
this structure can also be conveniently converted to get the
corresponding skb.

Since virtio net uses compound pages rather than a page pool, while
bpf_xdp_frags_increase_tail() is written on the assumption of a
page pool, (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
is negative in most cases. So we do not set xdp_rxq->frag_size in
virtnet_open(), which disables the tail increase.

Heng Qi (9):
  virtio_net: disable the hole mechanism for xdp
  virtio_net: set up xdp for multi buffer packets
  virtio_net: update bytes calculation for xdp_frame
  virtio_net: remove xdp related info from page_to_skb()
  virtio_net: build xdp_buff with multi buffers
  virtio_net: construct multi-buffer xdp in mergeable
  virtio_net: build skb from multi-buffer xdp
  virtio_net: transmit the multi-buffer xdp
  virtio_net: support multi-buffer xdp

 drivers/net/virtio_net.c | 356 ++++++++++++++++++++++++---------------
 1 file changed, 219 insertions(+), 137 deletions(-)

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  5:20   ` Jason Wang
  2022-11-22  7:43 ` [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

The XDP core assumes that the frame_size of the xdp_buff and the
length of each frag are PAGE_SIZE. But before xdp is set, the length
of a prefilled buffer may exceed PAGE_SIZE, which can make xdp
processing fail, so disable the hole mechanism while xdp is loaded.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9cce7dec7366..c5046d21b281 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer.
+		 * XDP core assumes that frame_size of xdp_buff and the length
+		 * of the frag are PAGE_SIZE, so we disable the hole mechanism.
 		 */
-		len += hole;
+		if (!vi->xdp_enabled)
+			len += hole;
 		alloc_frag->offset += hole;
 	}
 
-- 
2.19.1.6.gb485710b



* [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
  2022-11-22  7:43 ` [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  5:29   ` Jason Wang
  2022-11-22  7:43 ` [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

When the xdp program sets xdp.frags, it declares that it can process
multi-buffer packets, so keep xdp support enabled even when features
such as GRO_HW are negotiated.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c5046d21b281..8f7d207d58d6 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3080,14 +3080,21 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	u16 xdp_qp = 0, curr_qp;
 	int i, err;
 
-	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
-	    && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
-	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
-	        virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
-		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
-		virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
-		NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
-		return -EOPNOTSUPP;
+	if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
+		if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) {
+			NL_SET_ERR_MSG_MOD(extack, "Can't set XDP without frags while guest is implementing GUEST_CSUM");
+			return -EOPNOTSUPP;
+		}
+
+		if (prog && !prog->aux->xdp_has_frags) {
+			if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
+			    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
+			    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
+			    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
+				NL_SET_ERR_MSG_MOD(extack, "Can't set XDP without frags while guest is implementing GUEST_GRO_HW");
+				return -EOPNOTSUPP;
+			}
+		}
 	}
 
 	if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
@@ -3095,8 +3102,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		return -EINVAL;
 	}
 
-	if (dev->mtu > max_sz) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
+	if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
+		NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP without frags");
 		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
 		return -EINVAL;
 	}
@@ -3218,9 +3225,6 @@ static int virtnet_set_features(struct net_device *dev,
 	int err;
 
 	if ((dev->features ^ features) & NETIF_F_GRO_HW) {
-		if (vi->xdp_enabled)
-			return -EBUSY;
-
 		if (features & NETIF_F_GRO_HW)
 			offloads = vi->guest_offloads_capable;
 		else
-- 
2.19.1.6.gb485710b



* [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
  2022-11-22  7:43 ` [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
  2022-11-22  7:43 ` [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  5:31   ` Jason Wang
  2022-11-22  7:43 ` [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

Update the transmitted-bytes accounting for xdp_frame, using
xdp_get_frame_len() instead of frame->len, as the basis
for multi-buffer xdp transmission.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8f7d207d58d6..d3e8c63b9c4b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -658,7 +658,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
 		if (likely(is_xdp_frame(ptr))) {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
-			bytes += frame->len;
+			bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 		} else {
 			struct sk_buff *skb = ptr;
@@ -1604,7 +1604,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
 		} else {
 			struct xdp_frame *frame = ptr_to_xdp(ptr);
 
-			bytes += frame->len;
+			bytes += xdp_get_frame_len(frame);
 			xdp_return_frame(frame);
 		}
 		packets++;
-- 
2.19.1.6.gb485710b



* [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb()
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (2 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  5:36   ` Jason Wang
  2022-11-22  7:43 ` [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers Heng Qi
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

For a clean construction of the multi-buffer xdp_buff, remove the xdp
processing that was previously interleaved with page_to_skb(); the xdp
logic and the building of an skb from xdp become separate and independent.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 41 +++++++++-------------------------------
 1 file changed, 9 insertions(+), 32 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d3e8c63b9c4b..cd65f85d5075 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,9 +439,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
 static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 				   struct receive_queue *rq,
 				   struct page *page, unsigned int offset,
-				   unsigned int len, unsigned int truesize,
-				   bool hdr_valid, unsigned int metasize,
-				   unsigned int headroom)
+				   unsigned int len, unsigned int truesize)
 {
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
@@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	else
 		hdr_padded_len = sizeof(struct padded_vnet_hdr);
 
-	/* If headroom is not 0, there is an offset between the beginning of the
-	 * data and the allocated space, otherwise the data and the allocated
-	 * space are aligned.
-	 *
-	 * Buffers with headroom use PAGE_SIZE as alloc size, see
-	 * add_recvbuf_mergeable() + get_mergeable_buf_len()
-	 */
-	truesize = headroom ? PAGE_SIZE : truesize;
-	tailroom = truesize - headroom;
-	buf = p - headroom;
-
+	buf = p;
 	len -= hdr_len;
 	offset += hdr_padded_len;
 	p += hdr_padded_len;
-	tailroom -= hdr_padded_len + len;
+	tailroom = truesize - hdr_padded_len - len;
 
 	shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
@@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	if (len <= skb_tailroom(skb))
 		copy = len;
 	else
-		copy = ETH_HLEN + metasize;
+		copy = ETH_HLEN;
 	skb_put_data(skb, p, copy);
 
 	len -= copy;
@@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 		give_pages(rq, page);
 
 ok:
-	/* hdr_valid means no XDP, so we can copy the vnet header */
-	if (hdr_valid) {
-		hdr = skb_vnet_hdr(skb);
-		memcpy(hdr, hdr_p, hdr_len);
-	}
+	hdr = skb_vnet_hdr(skb);
+	memcpy(hdr, hdr_p, hdr_len);
 	if (page_to_free)
 		put_page(page_to_free);
 
-	if (metasize) {
-		__skb_pull(skb, metasize);
-		skb_metadata_set(skb, metasize);
-	}
-
 	return skb;
 }
 
@@ -917,7 +897,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
 {
 	struct page *page = buf;
 	struct sk_buff *skb =
-		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
+		page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
 
 	stats->bytes += len - vi->hdr_len;
 	if (unlikely(!skb))
@@ -1060,9 +1040,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 				rcu_read_unlock();
 				put_page(page);
 				head_skb = page_to_skb(vi, rq, xdp_page, offset,
-						       len, PAGE_SIZE, false,
-						       metasize,
-						       headroom);
+						       len, PAGE_SIZE);
 				return head_skb;
 			}
 			break;
@@ -1116,8 +1094,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_unlock();
 
 skip_xdp:
-	head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
-			       metasize, headroom);
+	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
 	curr_skb = head_skb;
 
 	if (unlikely(!curr_skb))
-- 
2.19.1.6.gb485710b



* [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (3 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  6:14   ` Jason Wang
  2022-11-22  7:43 ` [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

Support xdp for multi-buffer packets.

Put the first buffer as the linear part of the xdp_buff, and the
rest of the buffers as non-linear fragments into the struct
skb_shared_info in the tailroom belonging to the xdp_buff.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 74 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 74 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index cd65f85d5075..20784b1d8236 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -911,6 +911,80 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+static int virtnet_build_xdp_buff(struct net_device *dev,
+				  struct virtnet_info *vi,
+				  struct receive_queue *rq,
+				  struct xdp_buff *xdp,
+				  void *buf,
+				  unsigned int len,
+				  unsigned int frame_sz,
+				  u16 *num_buf,
+				  unsigned int *xdp_frags_truesize,
+				  struct virtnet_rq_stats *stats)
+{
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+	unsigned int truesize, headroom;
+	struct skb_shared_info *shinfo;
+	unsigned int xdp_frags_truesz = 0;
+	unsigned int cur_frag_size;
+	struct page *page;
+	skb_frag_t *frag;
+	int offset;
+	void *ctx;
+
+	xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
+	xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
+			 VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
+	shinfo = xdp_get_shared_info_from_buff(xdp);
+	shinfo->nr_frags = 0;
+	shinfo->xdp_frags_size = 0;
+
+	if ((*num_buf - 1) > MAX_SKB_FRAGS)
+		return -EINVAL;
+
+	while ((--*num_buf) >= 1) {
+		buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
+		if (unlikely(!buf)) {
+			pr_debug("%s: rx error: %d buffers out of %d missing\n",
+				 dev->name, *num_buf,
+				 virtio16_to_cpu(vi->vdev, hdr->num_buffers));
+			dev->stats.rx_length_errors++;
+			return -EINVAL;
+		}
+
+		if (!xdp_buff_has_frags(xdp))
+			xdp_buff_set_frags_flag(xdp);
+
+		stats->bytes += len;
+		page = virt_to_head_page(buf);
+		offset = buf - page_address(page);
+		truesize = mergeable_ctx_to_truesize(ctx);
+		headroom = mergeable_ctx_to_headroom(ctx);
+
+		cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);
+		xdp_frags_truesz += cur_frag_size;
+		if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
+			pr_debug("%s: rx error: len %u exceeds truesize %u\n",
+				 dev->name, len, truesize);
+			dev->stats.rx_length_errors++;
+			return -EINVAL;
+		}
+
+		frag = &shinfo->frags[shinfo->nr_frags++];
+		__skb_frag_set_page(frag, page);
+		skb_frag_off_set(frag, offset);
+		skb_frag_size_set(frag, len);
+		if (page_is_pfmemalloc(page))
+			xdp_buff_set_frag_pfmemalloc(xdp);
+
+		shinfo->xdp_frags_size += len;
+	}
+
+	*xdp_frags_truesize = xdp_frags_truesz;
+	return 0;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
-- 
2.19.1.6.gb485710b



* [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (4 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  6:33   ` Jason Wang
  2022-11-22  7:43 ` [RFC PATCH 7/9] virtio_net: build skb from multi-buffer xdp Heng Qi
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

Build multi-buffer xdp using virtnet_build_xdp_buff() in the
mergeable path.

For the buffers prefilled before xdp is set, vq reset could be
used to clear them, but most devices do not support it at present.
In order not to bother users who are using xdp normally, we do
not use vq reset for the time being. At the same time, virtio
net currently uses compound pages, and bpf_xdp_frags_increase_tail()
needs to calculate the tailroom of the last frag, which involves
the offset within the corresponding page and yields a negative
value, so we disable the tail increase by not setting
xdp_rxq->frag_size.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
 1 file changed, 38 insertions(+), 29 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 20784b1d8236..83e6933ae62b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 unsigned int *xdp_xmit,
 					 struct virtnet_rq_stats *stats)
 {
+	unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 	struct page *page = virt_to_head_page(buf);
@@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		unsigned int xdp_frags_truesz = 0;
+		struct skb_shared_info *shinfo;
 		struct xdp_frame *xdpf;
 		struct page *xdp_page;
 		struct xdp_buff xdp;
 		void *data;
 		u32 act;
+		int i;
 
-		/* Transient failure which in theory could occur if
-		 * in-flight packets from before XDP was enabled reach
-		 * the receive path after XDP is loaded.
-		 */
-		if (unlikely(hdr->hdr.gso_type))
-			goto err_xdp;
-
-		/* Buffers with headroom use PAGE_SIZE as alloc size,
-		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		/* Now XDP core assumes frag size is PAGE_SIZE, but buffers
+		 * with headroom may add hole in truesize, which
+		 * make their length exceed PAGE_SIZE. So we disabled the
+		 * hole mechanism for xdp. See add_recvbuf_mergeable().
 		 */
 		frame_sz = headroom ? PAGE_SIZE : truesize;
 
-		/* This happens when rx buffer size is underestimated
-		 * or headroom is not enough because of the buffer
-		 * was refilled before XDP is set. This should only
-		 * happen for the first several packets, so we don't
-		 * care much about its performance.
+		/* This happens when headroom is not enough because
+		 * of the buffer was prefilled before XDP is set.
+		 * This should only happen for the first several packets.
+		 * In fact, vq reset can be used here to help us clean up
+		 * the prefilled buffers, but many existing devices do not
+		 * support it, and we don't want to bother users who are
+		 * using xdp normally.
 		 */
-		if (unlikely(num_buf > 1 ||
-			     headroom < virtnet_get_headroom(vi))) {
-			/* linearize data for XDP */
-			xdp_page = xdp_linearize_page(rq, &num_buf,
-						      page, offset,
-						      VIRTIO_XDP_HEADROOM,
-						      &len);
-			frame_sz = PAGE_SIZE;
+		if (unlikely(headroom < virtnet_get_headroom(vi))) {
+			if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
+				goto err_xdp;
 
+			xdp_page = alloc_page(GFP_ATOMIC);
 			if (!xdp_page)
 				goto err_xdp;
+
+			memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
+			       page_address(page) + offset, len);
+			frame_sz = PAGE_SIZE;
 			offset = VIRTIO_XDP_HEADROOM;
 		} else {
 			xdp_page = page;
 		}
-
-		/* Allow consuming headroom but reserve enough space to push
-		 * the descriptor on if we get an XDP_TX return code.
-		 */
 		data = page_address(xdp_page) + offset;
-		xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
-		xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
-				 VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
+		err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
+					     &num_buf, &xdp_frags_truesz, stats);
+		if (unlikely(err))
+			goto err_xdp_frags;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
+err_xdp_frags:
+		shinfo = xdp_get_shared_info_from_buff(&xdp);
+
+		if (unlikely(xdp_page != page))
+			__free_pages(xdp_page, 0);
+
+		for (i = 0; i < shinfo->nr_frags; i++) {
+			xdp_page = skb_frag_page(&shinfo->frags[i]);
+			put_page(xdp_page);
+		}
+		goto err_xdp;
 	}
 	rcu_read_unlock();
 
-- 
2.19.1.6.gb485710b



* [RFC PATCH 7/9] virtio_net: build skb from multi-buffer xdp
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (5 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-11-22  7:43 ` [RFC PATCH 8/9] virtio_net: transmit the " Heng Qi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

This converts a multi-buffer xdp_buff directly to an skb.

The construction of an skb from a multi-buffer xdp is now
isolated from page_to_skb().

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 50 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 83e6933ae62b..260ce722c687 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -911,6 +911,56 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* Why not use xdp_build_skb_from_frame() ?
+ * XDP core assumes that xdp frags are PAGE_SIZE in length, while in
+ * virtio-net there are 2 points that do not match its requirements:
+ *  1. The size of the prefilled buffer is not fixed before xdp is set.
+ *  2. When xdp is loaded, virtio-net has a hole mechanism (refer to
+ *     add_recvbuf_mergeable()), which will make the size of a buffer
+ *     exceed PAGE_SIZE.
+ */
+static struct sk_buff *build_skb_from_xdp_buff(struct net_device *dev,
+					       struct virtnet_info *vi,
+					       struct xdp_buff *xdp,
+					       unsigned int xdp_frags_truesz)
+{
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	unsigned int headroom, data_len;
+	struct sk_buff *skb;
+	int metasize;
+	u8 nr_frags;
+
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		pr_debug("Error building skb as missing reserved tailroom for xdp");
+		return NULL;
+	}
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		nr_frags = sinfo->nr_frags;
+
+	skb = build_skb(xdp->data_hard_start, xdp->frame_sz);
+	if (unlikely(!skb))
+		return NULL;
+
+	headroom = xdp->data - xdp->data_hard_start;
+	data_len = xdp->data_end - xdp->data;
+	skb_reserve(skb, headroom);
+	__skb_put(skb, data_len);
+
+	metasize = xdp->data - xdp->data_meta;
+	metasize = metasize > 0 ? metasize : 0;
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	if (unlikely(xdp_buff_has_frags(xdp)))
+		xdp_update_skb_shared_info(skb, nr_frags,
+					   sinfo->xdp_frags_size,
+					   xdp_frags_truesz,
+					   xdp_buff_is_frag_pfmemalloc(xdp));
+
+	return skb;
+}
+
 static int virtnet_build_xdp_buff(struct net_device *dev,
 				  struct virtnet_info *vi,
 				  struct receive_queue *rq,
-- 
2.19.1.6.gb485710b



* [RFC PATCH 8/9] virtio_net: transmit the multi-buffer xdp
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (6 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 7/9] virtio_net: build skb from multi-buffer xdp Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-11-22  7:43 ` [RFC PATCH 9/9] virtio_net: support " Heng Qi
  2022-12-02  4:50 ` [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
  9 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

This serves as the basis for XDP_TX and XDP_REDIRECT
to send a multi-buffer xdp_frame.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 260ce722c687..431f2126a2b5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -543,22 +543,34 @@ static int __virtnet_xdp_xmit_one(struct virtnet_info *vi,
 				   struct xdp_frame *xdpf)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
-	int err;
+	struct skb_shared_info *shinfo;
+	int err, i;
 
 	if (unlikely(xdpf->headroom < vi->hdr_len))
 		return -EOVERFLOW;
 
-	/* Make room for virtqueue hdr (also change xdpf->headroom?) */
+	shinfo = xdp_get_shared_info_from_frame(xdpf);
+	/* Need to adjust this to calculate the correct position
+	 * for the shinfo of the xdpf.
+	 */
+	xdpf->headroom -= vi->hdr_len;
 	xdpf->data -= vi->hdr_len;
 	/* Zero header and leave csum up to XDP layers */
 	hdr = xdpf->data;
 	memset(hdr, 0, vi->hdr_len);
 	xdpf->len   += vi->hdr_len;
 
-	sg_init_one(sq->sg, xdpf->data, xdpf->len);
+	sg_init_table(sq->sg, shinfo->nr_frags + 1);
+	sg_set_buf(sq->sg, xdpf->data, xdpf->len);
+	for (i = 0; i < shinfo->nr_frags; i++) {
+		skb_frag_t *frag = &shinfo->frags[i];
+
+		sg_set_page(&sq->sg[i + 1], skb_frag_page(frag),
+			    skb_frag_size(frag), skb_frag_off(frag));
+	}
 
-	err = virtqueue_add_outbuf(sq->vq, sq->sg, 1, xdp_to_ptr(xdpf),
-				   GFP_ATOMIC);
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, shinfo->nr_frags + 1,
+				   xdp_to_ptr(xdpf), GFP_ATOMIC);
 	if (unlikely(err))
 		return -ENOSPC; /* Caller handle free/refcnt */
 
-- 
2.19.1.6.gb485710b



* [RFC PATCH 9/9] virtio_net: support multi-buffer xdp
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (7 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 8/9] virtio_net: transmit the " Heng Qi
@ 2022-11-22  7:43 ` Heng Qi
  2022-12-06  6:42   ` Jason Wang
  2022-12-02  4:50 ` [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-11-22  7:43 UTC (permalink / raw)
  To: netdev, bpf
  Cc: Jason Wang, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

The driver can pass the skb to the stack via build_skb_from_xdp_buff().

It forwards multi-buffer packets through the send queue on XDP_TX
and XDP_REDIRECT, and drops the references to the extra pages on
XDP_DROP.

Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 drivers/net/virtio_net.c | 65 ++++++----------------------------------
 1 file changed, 9 insertions(+), 56 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 431f2126a2b5..bbd5cd9bfd47 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1065,7 +1065,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	unsigned int metasize = 0;
 	unsigned int frame_sz;
 	int err;
 
@@ -1137,63 +1136,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 		switch (act) {
 		case XDP_PASS:
-			metasize = xdp.data - xdp.data_meta;
-
-			/* recalculate offset to account for any header
-			 * adjustments and minus the metasize to copy the
-			 * metadata in page_to_skb(). Note other cases do not
-			 * build an skb and avoid using offset
-			 */
-			offset = xdp.data - page_address(xdp_page) -
-				 vi->hdr_len - metasize;
-
-			/* recalculate len if xdp.data, xdp.data_end or
-			 * xdp.data_meta were adjusted
-			 */
-			len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
-
-			/* recalculate headroom if xdp.data or xdp_data_meta
-			 * were adjusted, note that offset should always point
-			 * to the start of the reserved bytes for virtio_net
-			 * header which are followed by xdp.data, that means
-			 * that offset is equal to the headroom (when buf is
-			 * starting at the beginning of the page, otherwise
-			 * there is a base offset inside the page) but it's used
-			 * with a different starting point (buf start) than
-			 * xdp.data (buf start + vnet hdr size). If xdp.data or
-			 * data_meta were adjusted by the xdp prog then the
-			 * headroom size has changed and so has the offset, we
-			 * can use data_hard_start, which points at buf start +
-			 * vnet hdr size, to calculate the new headroom and use
-			 * it later to compute buf start in page_to_skb()
-			 */
-			headroom = xdp.data - xdp.data_hard_start - metasize;
-
-			/* We can only create skb based on xdp_page. */
-			if (unlikely(xdp_page != page)) {
-				rcu_read_unlock();
-				put_page(page);
-				head_skb = page_to_skb(vi, rq, xdp_page, offset,
-						       len, PAGE_SIZE);
-				return head_skb;
-			}
-			break;
+			head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
+			rcu_read_unlock();
+			return head_skb;
 		case XDP_TX:
 			stats->xdp_tx++;
 			xdpf = xdp_convert_buff_to_frame(&xdp);
 			if (unlikely(!xdpf)) {
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
+				pr_debug("%s: convert buff to frame failed for xdp\n", dev->name);
+				goto err_xdp_frags;
 			}
 			err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
 			if (unlikely(!err)) {
 				xdp_return_frame_rx_napi(xdpf);
 			} else if (unlikely(err < 0)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
+				goto err_xdp_frags;
 			}
 			*xdp_xmit |= VIRTIO_XDP_TX;
 			if (unlikely(xdp_page != page))
@@ -1203,11 +1161,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		case XDP_REDIRECT:
 			stats->xdp_redirects++;
 			err = xdp_do_redirect(dev, &xdp, xdp_prog);
-			if (err) {
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
-				goto err_xdp;
-			}
+			if (err)
+				goto err_xdp_frags;
 			*xdp_xmit |= VIRTIO_XDP_REDIR;
 			if (unlikely(xdp_page != page))
 				put_page(page);
@@ -1220,9 +1175,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			trace_xdp_exception(vi->dev, xdp_prog, act);
 			fallthrough;
 		case XDP_DROP:
-			if (unlikely(xdp_page != page))
-				__free_pages(xdp_page, 0);
-			goto err_xdp;
+			goto err_xdp_frags;
 		}
 err_xdp_frags:
 		shinfo = xdp_get_shared_info_from_buff(&xdp);
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 0/9] virtio_net: support multi buffer xdp
  2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
                   ` (8 preceding siblings ...)
  2022-11-22  7:43 ` [RFC PATCH 9/9] virtio_net: support " Heng Qi
@ 2022-12-02  4:50 ` Heng Qi
  2022-12-02  5:41   ` Jason Wang
  9 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-12-02  4:50 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S. Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, netdev, bpf, Xuan Zhuo

Hi, Jason.

Do you have any comments on this series?

Thanks.

On 2022/11/22 3:43 PM, Heng Qi wrote:
> Currently, virtio net only supports xdp for single-buffer packets
> or linearized multi-buffer packets. This patchset supports xdp for
> multi-buffer packets, then GRO_HW related features can be
> negotiated, and do not affect the processing of single-buffer xdp.
>
> In order to build multi-buffer xdp neatly, we integrated the code
> into virtnet_build_xdp_buff() for xdp. The first buffer is used
> for prepared xdp buff, and the rest of the buffers are added to
> its skb_shared_info structure. This structure can also be
> conveniently converted during XDP_PASS to get the corresponding skb.
>
> Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail()
> is based on the assumption of the page pool,
> (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
> is negative in most cases. So we didn't set xdp_rxq->frag_size in
> virtnet_open() to disable the tail increase.
>
> Heng Qi (9):
>    virtio_net: disable the hole mechanism for xdp
>    virtio_net: set up xdp for multi buffer packets
>    virtio_net: update bytes calculation for xdp_frame
>    virtio_net: remove xdp related info from page_to_skb()
>    virtio_net: build xdp_buff with multi buffers
>    virtio_net: construct multi-buffer xdp in mergeable
>    virtio_net: build skb from multi-buffer xdp
>    virtio_net: transmit the multi-buffer xdp
>    virtio_net: support multi-buffer xdp
>
>   drivers/net/virtio_net.c | 356 ++++++++++++++++++++++++---------------
>   1 file changed, 219 insertions(+), 137 deletions(-)
>



* Re: [RFC PATCH 0/9] virtio_net: support multi buffer xdp
  2022-12-02  4:50 ` [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
@ 2022-12-02  5:41   ` Jason Wang
  0 siblings, 0 replies; 30+ messages in thread
From: Jason Wang @ 2022-12-02  5:41 UTC (permalink / raw)
  To: Heng Qi
  Cc: Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski, John Fastabend,
	David S. Miller, Daniel Borkmann, Alexei Starovoitov,
	Eric Dumazet, netdev, bpf, Xuan Zhuo

On Fri, Dec 2, 2022 at 12:50 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Hi, Jason.
>
> Do you have any comments on this series?

-EBUSY this week, but I will review it next week for sure.

Thanks

>
> Thanks.
>
> On 2022/11/22 3:43 PM, Heng Qi wrote:
> > Currently, virtio net only supports xdp for single-buffer packets
> > or linearized multi-buffer packets. This patchset supports xdp for
> > multi-buffer packets, then GRO_HW related features can be
> > negotiated, and do not affect the processing of single-buffer xdp.
> >
> > In order to build multi-buffer xdp neatly, we integrated the code
> > into virtnet_build_xdp_buff() for xdp. The first buffer is used
> > for prepared xdp buff, and the rest of the buffers are added to
> > its skb_shared_info structure. This structure can also be
> > conveniently converted during XDP_PASS to get the corresponding skb.
> >
> > Since virtio net uses comp pages, and bpf_xdp_frags_increase_tail()
> > is based on the assumption of the page pool,
> > (rxq->frag_size - skb_frag_size(frag) - skb_frag_off(frag))
> > is negative in most cases. So we didn't set xdp_rxq->frag_size in
> > virtnet_open() to disable the tail increase.
> >
> > Heng Qi (9):
> >    virtio_net: disable the hole mechanism for xdp
> >    virtio_net: set up xdp for multi buffer packets
> >    virtio_net: update bytes calculation for xdp_frame
> >    virtio_net: remove xdp related info from page_to_skb()
> >    virtio_net: build xdp_buff with multi buffers
> >    virtio_net: construct multi-buffer xdp in mergeable
> >    virtio_net: build skb from multi-buffer xdp
> >    virtio_net: transmit the multi-buffer xdp
> >    virtio_net: support multi-buffer xdp
> >
> >   drivers/net/virtio_net.c | 356 ++++++++++++++++++++++++---------------
> >   1 file changed, 219 insertions(+), 137 deletions(-)
> >
>



* Re: [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp
  2022-11-22  7:43 ` [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
@ 2022-12-06  5:20   ` Jason Wang
  2022-12-08  8:20     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  5:20 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> XDP core assumes that the frame_size of xdp_buff and the length of
> the frag are PAGE_SIZE. But before xdp is set, the length of the prefilled
> buffer may exceed PAGE_SIZE, which may cause the processing of xdp to fail,
> so we disable the hole mechanism when xdp is loaded.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9cce7dec7366..c5046d21b281 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>                 /* To avoid internal fragmentation, if there is very likely not
>                  * enough space for another buffer, add the remaining space to
>                  * the current buffer.
> +                * XDP core assumes that frame_size of xdp_buff and the length
> +                * of the frag are PAGE_SIZE, so we disable the hole mechanism.
>                  */
> -               len += hole;
> +               if (!vi->xdp_enabled)

How is this synchronized with virtnet_xdp_set()?

I think we need to use headroom here, since virtnet_get_headroom() does:

static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
{
        return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
}

Otherwise xdp_enabled could be re-read, which may lead to bugs.

Thanks

> +                       len += hole;
>                 alloc_frag->offset += hole;
>         }
>
> --
> 2.19.1.6.gb485710b
>



* Re: [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets
  2022-11-22  7:43 ` [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
@ 2022-12-06  5:29   ` Jason Wang
  2022-12-08  8:21     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  5:29 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> When the xdp program sets xdp.frags, it means it can process
> multi-buffer packets, so we continue to allow xdp support when
> features such as GRO_HW are negotiated.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 30 +++++++++++++++++-------------
>  1 file changed, 17 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index c5046d21b281..8f7d207d58d6 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -3080,14 +3080,21 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>         u16 xdp_qp = 0, curr_qp;
>         int i, err;
>
> -       if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
> -           && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
> -               NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
> -               return -EOPNOTSUPP;
> +       if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> +               if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) {
> +                       NL_SET_ERR_MSG_MOD(extack, "Can't set XDP without frags while guest is implementing GUEST_CSUM");
> +                       return -EOPNOTSUPP;
> +               }
> +
> +               if (prog && !prog->aux->xdp_has_frags) {
> +                       if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
> +                           virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
> +                           virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> +                           virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
> +                               NL_SET_ERR_MSG_MOD(extack, "Can't set XDP without frags while guest is implementing GUEST_GRO_HW");
> +                               return -EOPNOTSUPP;
> +                       }
> +               }
>         }
>
>         if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
> @@ -3095,8 +3102,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>                 return -EINVAL;
>         }
>
> -       if (dev->mtu > max_sz) {
> -               NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
> +       if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
> +               NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP without frags");
>                 netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>                 return -EINVAL;
>         }
> @@ -3218,9 +3225,6 @@ static int virtnet_set_features(struct net_device *dev,
>         int err;
>
>         if ((dev->features ^ features) & NETIF_F_GRO_HW) {
> -               if (vi->xdp_enabled)
> -                       return -EBUSY;

This seems suspicious, GRO_HW could be re-enabled accidentally even if
it was disabled when attaching an XDP program that is not capable of
doing multi-buffer XDP?

Thanks

> -
>                 if (features & NETIF_F_GRO_HW)
>                         offloads = vi->guest_offloads_capable;
>                 else
> --
> 2.19.1.6.gb485710b
>



* Re: [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame
  2022-11-22  7:43 ` [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
@ 2022-12-06  5:31   ` Jason Wang
  2022-12-08  8:35     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  5:31 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Update relative record value for xdp_frame as basis
> for multi-buffer xdp transmission.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

> ---
>  drivers/net/virtio_net.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f7d207d58d6..d3e8c63b9c4b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -658,7 +658,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>                 if (likely(is_xdp_frame(ptr))) {
>                         struct xdp_frame *frame = ptr_to_xdp(ptr);
>
> -                       bytes += frame->len;
> +                       bytes += xdp_get_frame_len(frame);
>                         xdp_return_frame(frame);
>                 } else {
>                         struct sk_buff *skb = ptr;
> @@ -1604,7 +1604,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>                 } else {
>                         struct xdp_frame *frame = ptr_to_xdp(ptr);
>
> -                       bytes += frame->len;
> +                       bytes += xdp_get_frame_len(frame);
>                         xdp_return_frame(frame);
>                 }
>                 packets++;
> --
> 2.19.1.6.gb485710b
>



* Re: [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb()
  2022-11-22  7:43 ` [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
@ 2022-12-06  5:36   ` Jason Wang
  2022-12-08  8:23     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  5:36 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> To construct the multi-buffer xdp_buff cleanly, we remove the xdp
> processing previously interleaved with page_to_skb(), so that the xdp
> logic and the skb-from-xdp construction are separate and independent.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>

So I think the organization of this series needs some tweak.

If I'm not wrong, doing things like this means XDP support is
actually broken mid-series, and that breaks bisection and a lot of
other things.

We need to make sure each patch does not break anything; that probably requires

1) squash the following patches or
2) having a new helper to do XDP stuffs after/before page_to_skb()

Thanks

> ---
>  drivers/net/virtio_net.c | 41 +++++++++-------------------------------
>  1 file changed, 9 insertions(+), 32 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index d3e8c63b9c4b..cd65f85d5075 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,9 +439,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
>  static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>                                    struct receive_queue *rq,
>                                    struct page *page, unsigned int offset,
> -                                  unsigned int len, unsigned int truesize,
> -                                  bool hdr_valid, unsigned int metasize,
> -                                  unsigned int headroom)
> +                                  unsigned int len, unsigned int truesize)
>  {
>         struct sk_buff *skb;
>         struct virtio_net_hdr_mrg_rxbuf *hdr;
> @@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         else
>                 hdr_padded_len = sizeof(struct padded_vnet_hdr);
>
> -       /* If headroom is not 0, there is an offset between the beginning of the
> -        * data and the allocated space, otherwise the data and the allocated
> -        * space are aligned.
> -        *
> -        * Buffers with headroom use PAGE_SIZE as alloc size, see
> -        * add_recvbuf_mergeable() + get_mergeable_buf_len()
> -        */
> -       truesize = headroom ? PAGE_SIZE : truesize;
> -       tailroom = truesize - headroom;
> -       buf = p - headroom;
> -
> +       buf = p;
>         len -= hdr_len;
>         offset += hdr_padded_len;
>         p += hdr_padded_len;
> -       tailroom -= hdr_padded_len + len;
> +       tailroom = truesize - hdr_padded_len - len;
>
>         shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>
> @@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>         if (len <= skb_tailroom(skb))
>                 copy = len;
>         else
> -               copy = ETH_HLEN + metasize;
> +               copy = ETH_HLEN;
>         skb_put_data(skb, p, copy);
>
>         len -= copy;
> @@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>                 give_pages(rq, page);
>
>  ok:
> -       /* hdr_valid means no XDP, so we can copy the vnet header */
> -       if (hdr_valid) {
> -               hdr = skb_vnet_hdr(skb);
> -               memcpy(hdr, hdr_p, hdr_len);
> -       }
> +       hdr = skb_vnet_hdr(skb);
> +       memcpy(hdr, hdr_p, hdr_len);
>         if (page_to_free)
>                 put_page(page_to_free);
>
> -       if (metasize) {
> -               __skb_pull(skb, metasize);
> -               skb_metadata_set(skb, metasize);
> -       }
> -
>         return skb;
>  }
>
> @@ -917,7 +897,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  {
>         struct page *page = buf;
>         struct sk_buff *skb =
> -               page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
> +               page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
>
>         stats->bytes += len - vi->hdr_len;
>         if (unlikely(!skb))
> @@ -1060,9 +1040,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                 rcu_read_unlock();
>                                 put_page(page);
>                                 head_skb = page_to_skb(vi, rq, xdp_page, offset,
> -                                                      len, PAGE_SIZE, false,
> -                                                      metasize,
> -                                                      headroom);
> +                                                      len, PAGE_SIZE);
>                                 return head_skb;
>                         }
>                         break;
> @@ -1116,8 +1094,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         rcu_read_unlock();
>
>  skip_xdp:
> -       head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
> -                              metasize, headroom);
> +       head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
>         curr_skb = head_skb;
>
>         if (unlikely(!curr_skb))
> --
> 2.19.1.6.gb485710b
>



* Re: [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers
  2022-11-22  7:43 ` [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers Heng Qi
@ 2022-12-06  6:14   ` Jason Wang
  2022-12-08  8:25     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  6:14 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Support xdp for multi buffer packets.
>
> The first buffer is used as the linear part of the xdp_buff, and the
> rest of the buffers are added as non-linear fragments to the struct
> skb_shared_info in the tailroom of the xdp_buff.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 74 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index cd65f85d5075..20784b1d8236 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -911,6 +911,80 @@ static struct sk_buff *receive_big(struct net_device *dev,
>         return NULL;
>  }
>
> +static int virtnet_build_xdp_buff(struct net_device *dev,
> +                                 struct virtnet_info *vi,
> +                                 struct receive_queue *rq,
> +                                 struct xdp_buff *xdp,
> +                                 void *buf,
> +                                 unsigned int len,
> +                                 unsigned int frame_sz,
> +                                 u16 *num_buf,
> +                                 unsigned int *xdp_frags_truesize,
> +                                 struct virtnet_rq_stats *stats)
> +{
> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +       struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> +       unsigned int truesize, headroom;
> +       struct skb_shared_info *shinfo;
> +       unsigned int xdp_frags_truesz = 0;
> +       unsigned int cur_frag_size;
> +       struct page *page;
> +       skb_frag_t *frag;
> +       int offset;
> +       void *ctx;
> +
> +       xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
> +       xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
> +                        VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
> +       shinfo = xdp_get_shared_info_from_buff(xdp);
> +       shinfo->nr_frags = 0;
> +       shinfo->xdp_frags_size = 0;
> +
> +       if ((*num_buf - 1) > MAX_SKB_FRAGS)
> +               return -EINVAL;
> +
> +       while ((--*num_buf) >= 1) {
> +               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);

So this works only for a mergeable buffer, I wonder if it's worth it
to make it work for big mode as well. Or at least we can mention it as
a TODO somewhere and rename this function (with mergeable suffix).

Others look good.

Thanks

> +               if (unlikely(!buf)) {
> +                       pr_debug("%s: rx error: %d buffers out of %d missing\n",
> +                                dev->name, *num_buf,
> +                                virtio16_to_cpu(vi->vdev, hdr->num_buffers));
> +                       dev->stats.rx_length_errors++;
> +                       return -EINVAL;
> +               }
> +
> +               if (!xdp_buff_has_frags(xdp))
> +                       xdp_buff_set_frags_flag(xdp);
> +
> +               stats->bytes += len;
> +               page = virt_to_head_page(buf);
> +               offset = buf - page_address(page);
> +               truesize = mergeable_ctx_to_truesize(ctx);
> +               headroom = mergeable_ctx_to_headroom(ctx);
> +
> +               cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);
> +               xdp_frags_truesz += cur_frag_size;
> +               if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
> +                       pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> +                                dev->name, len, (unsigned long)ctx);
> +                       dev->stats.rx_length_errors++;
> +                       return -EINVAL;
> +               }
> +
> +               frag = &shinfo->frags[shinfo->nr_frags++];
> +               __skb_frag_set_page(frag, page);
> +               skb_frag_off_set(frag, offset);
> +               skb_frag_size_set(frag, len);
> +               if (page_is_pfmemalloc(page))
> +                       xdp_buff_set_frag_pfmemalloc(xdp);
> +
> +               shinfo->xdp_frags_size += len;
> +       }
> +
> +       *xdp_frags_truesize = xdp_frags_truesz;
> +       return 0;
> +}
> +
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                          struct virtnet_info *vi,
>                                          struct receive_queue *rq,
> --
> 2.19.1.6.gb485710b
>



* Re: [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-11-22  7:43 ` [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
@ 2022-12-06  6:33   ` Jason Wang
  2022-12-08  8:30     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  6:33 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
>
> For the prefilled buffer before xdp is set, vq reset can be
> used to clear it, but most devices do not support it at present.
> In order not to bother users who are using xdp normally, we do
> not use vq reset for the time being.

I guess to tweak the part to say we will probably use vq reset in the future.

> At the same time, virtio
> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> needs to calculate the tailroom of the last frag, which will
> involve the offset of the corresponding page and cause a negative
> value, so we disable tail increase by not setting xdp_rxq->frag_size.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
>  1 file changed, 38 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 20784b1d8236..83e6933ae62b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                          unsigned int *xdp_xmit,
>                                          struct virtnet_rq_stats *stats)
>  {
> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>         struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>         u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>         struct page *page = virt_to_head_page(buf);
> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         rcu_read_lock();
>         xdp_prog = rcu_dereference(rq->xdp_prog);
>         if (xdp_prog) {
> +               unsigned int xdp_frags_truesz = 0;
> +               struct skb_shared_info *shinfo;
>                 struct xdp_frame *xdpf;
>                 struct page *xdp_page;
>                 struct xdp_buff xdp;
>                 void *data;
>                 u32 act;
> +               int i;
>
> -               /* Transient failure which in theory could occur if
> -                * in-flight packets from before XDP was enabled reach
> -                * the receive path after XDP is loaded.
> -                */
> -               if (unlikely(hdr->hdr.gso_type))
> -                       goto err_xdp;

Two questions:

1) should we keep this check for the XDP program that can't deal with XDP frags?
2) how could we guarantee that the vnet header (gso_type/csum_start
etc) is still valid after XDP (where XDP program can choose to
override the header)?

> -
> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> +                * with headroom may add hole in truesize, which
> +                * make their length exceed PAGE_SIZE. So we disabled the
> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
>                  */
>                 frame_sz = headroom ? PAGE_SIZE : truesize;
>
> -               /* This happens when rx buffer size is underestimated
> -                * or headroom is not enough because of the buffer
> -                * was refilled before XDP is set. This should only
> -                * happen for the first several packets, so we don't
> -                * care much about its performance.
> +               /* This happens when headroom is not enough because
> +                * of the buffer was prefilled before XDP is set.
> +                * This should only happen for the first several packets.
> +                * In fact, vq reset can be used here to help us clean up
> +                * the prefilled buffers, but many existing devices do not
> +                * support it, and we don't want to bother users who are
> +                * using xdp normally.
>                  */
> -               if (unlikely(num_buf > 1 ||
> -                            headroom < virtnet_get_headroom(vi))) {
> -                       /* linearize data for XDP */
> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> -                                                     page, offset,
> -                                                     VIRTIO_XDP_HEADROOM,
> -                                                     &len);
> -                       frame_sz = PAGE_SIZE;
> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> +                               goto err_xdp;
>
> +                       xdp_page = alloc_page(GFP_ATOMIC);
>                         if (!xdp_page)
>                                 goto err_xdp;
> +
> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> +                              page_address(page) + offset, len);
> +                       frame_sz = PAGE_SIZE;

How can we know a single page is sufficient here? (before XDP is set,
we reserve neither headroom nor tailroom).

>                         offset = VIRTIO_XDP_HEADROOM;

I think we should still try to do linearization for the XDP program
that doesn't support XDP frags.

Thanks

>                 } else {
>                         xdp_page = page;
>                 }
> -
> -               /* Allow consuming headroom but reserve enough space to push
> -                * the descriptor on if we get an XDP_TX return code.
> -                */
>                 data = page_address(xdp_page) + offset;
> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> +                                            &num_buf, &xdp_frags_truesz, stats);
> +               if (unlikely(err))
> +                       goto err_xdp_frags;
>
>                 act = bpf_prog_run_xdp(xdp_prog, &xdp);
>                 stats->xdp_packets++;
> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                                 __free_pages(xdp_page, 0);
>                         goto err_xdp;
>                 }
> +err_xdp_frags:
> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> +
> +               if (unlikely(xdp_page != page))
> +                       __free_pages(xdp_page, 0);
> +
> +               for (i = 0; i < shinfo->nr_frags; i++) {
> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> +                       put_page(xdp_page);
> +               }
> +               goto err_xdp;
>         }
>         rcu_read_unlock();
>
> --
> 2.19.1.6.gb485710b
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 9/9] virtio_net: support multi-buffer xdp
  2022-11-22  7:43 ` [RFC PATCH 9/9] virtio_net: support " Heng Qi
@ 2022-12-06  6:42   ` Jason Wang
  2022-12-08  8:31     ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-06  6:42 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> Driver can pass the skb to stack by build_skb_from_xdp_buff().
>
> Driver forwards multi-buffer packets using the send queue
> when XDP_TX and XDP_REDIRECT, and clears the reference of multi
> pages when XDP_DROP.
>
> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> ---
>  drivers/net/virtio_net.c | 65 ++++++----------------------------------
>  1 file changed, 9 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 431f2126a2b5..bbd5cd9bfd47 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1065,7 +1065,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>         struct bpf_prog *xdp_prog;
>         unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>         unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -       unsigned int metasize = 0;
>         unsigned int frame_sz;
>         int err;
>
> @@ -1137,63 +1136,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>
>                 switch (act) {
>                 case XDP_PASS:
> -                       metasize = xdp.data - xdp.data_meta;
> -
> -                       /* recalculate offset to account for any header
> -                        * adjustments and minus the metasize to copy the
> -                        * metadata in page_to_skb(). Note other cases do not
> -                        * build an skb and avoid using offset
> -                        */
> -                       offset = xdp.data - page_address(xdp_page) -
> -                                vi->hdr_len - metasize;
> -
> -                       /* recalculate len if xdp.data, xdp.data_end or
> -                        * xdp.data_meta were adjusted
> -                        */
> -                       len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
> -
> -                       /* recalculate headroom if xdp.data or xdp_data_meta
> -                        * were adjusted, note that offset should always point
> -                        * to the start of the reserved bytes for virtio_net
> -                        * header which are followed by xdp.data, that means
> -                        * that offset is equal to the headroom (when buf is
> -                        * starting at the beginning of the page, otherwise
> -                        * there is a base offset inside the page) but it's used
> -                        * with a different starting point (buf start) than
> -                        * xdp.data (buf start + vnet hdr size). If xdp.data or
> -                        * data_meta were adjusted by the xdp prog then the
> -                        * headroom size has changed and so has the offset, we
> -                        * can use data_hard_start, which points at buf start +
> -                        * vnet hdr size, to calculate the new headroom and use
> -                        * it later to compute buf start in page_to_skb()
> -                        */
> -                       headroom = xdp.data - xdp.data_hard_start - metasize;
> -
> -                       /* We can only create skb based on xdp_page. */
> -                       if (unlikely(xdp_page != page)) {
> -                               rcu_read_unlock();
> -                               put_page(page);
> -                               head_skb = page_to_skb(vi, rq, xdp_page, offset,
> -                                                      len, PAGE_SIZE);
> -                               return head_skb;
> -                       }
> -                       break;
> +                       head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
> +                       rcu_read_unlock();
> +                       return head_skb;
>                 case XDP_TX:
>                         stats->xdp_tx++;
>                         xdpf = xdp_convert_buff_to_frame(&xdp);
>                         if (unlikely(!xdpf)) {
> -                               if (unlikely(xdp_page != page))
> -                                       put_page(xdp_page);
> -                               goto err_xdp;
> +                               pr_debug("%s: convert buff to frame failed for xdp\n", dev->name);

netdev_dbg()?

Thanks

> +                               goto err_xdp_frags;
>                         }
>                         err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>                         if (unlikely(!err)) {
>                                 xdp_return_frame_rx_napi(xdpf);
>                         } else if (unlikely(err < 0)) {
>                                 trace_xdp_exception(vi->dev, xdp_prog, act);
> -                               if (unlikely(xdp_page != page))
> -                                       put_page(xdp_page);
> -                               goto err_xdp;
> +                               goto err_xdp_frags;
>                         }
>                         *xdp_xmit |= VIRTIO_XDP_TX;
>                         if (unlikely(xdp_page != page))
> @@ -1203,11 +1161,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                 case XDP_REDIRECT:
>                         stats->xdp_redirects++;
>                         err = xdp_do_redirect(dev, &xdp, xdp_prog);
> -                       if (err) {
> -                               if (unlikely(xdp_page != page))
> -                                       put_page(xdp_page);
> -                               goto err_xdp;
> -                       }
> +                       if (err)
> +                               goto err_xdp_frags;
>                         *xdp_xmit |= VIRTIO_XDP_REDIR;
>                         if (unlikely(xdp_page != page))
>                                 put_page(page);
> @@ -1220,9 +1175,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>                         trace_xdp_exception(vi->dev, xdp_prog, act);
>                         fallthrough;
>                 case XDP_DROP:
> -                       if (unlikely(xdp_page != page))
> -                               __free_pages(xdp_page, 0);
> -                       goto err_xdp;
> +                       goto err_xdp_frags;
>                 }
>  err_xdp_frags:
>                 shinfo = xdp_get_shared_info_from_buff(&xdp);
> --
> 2.19.1.6.gb485710b
>



* Re: [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp
  2022-12-06  5:20   ` Jason Wang
@ 2022-12-08  8:20     ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:20 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 13:20, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> XDP core assumes that the frame_size of xdp_buff and the length of
>> the frag are PAGE_SIZE. But before xdp is set, the length of the prefilled
>> buffer may exceed PAGE_SIZE, which may cause the processing of xdp to fail,
>> so we disable the hole mechanism when xdp is loaded.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9cce7dec7366..c5046d21b281 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1419,8 +1419,11 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>                  /* To avoid internal fragmentation, if there is very likely not
>>                   * enough space for another buffer, add the remaining space to
>>                   * the current buffer.
>> +                * XDP core assumes that frame_size of xdp_buff and the length
>> +                * of the frag are PAGE_SIZE, so we disable the hole mechanism.
>>                   */
>> -               len += hole;
>> +               if (!vi->xdp_enabled)
> How is this synchronized with virtnet_xdp_set()?
>
> I think we need to use headroom here since it did:
>
> static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
> {
>          return vi->xdp_enabled ? VIRTIO_XDP_HEADROOM : 0;
> }
>
> Otherwise xdp_enabled could be re-read which may lead bugs.

Yes, we should use headroom instead of reading vi->xdp_enabled a second
time in the same path, to avoid the re-read.

Thanks for the reminder.

>
> Thanks
>
>> +                       len += hole;
>>                  alloc_frag->offset += hole;
>>          }
>>
>> --
>> 2.19.1.6.gb485710b
>>



* Re: [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets
  2022-12-06  5:29   ` Jason Wang
@ 2022-12-08  8:21     ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 13:29, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> When the xdp program sets xdp.frags, which means it can process
>> multi-buffer packets, so we continue to open xdp support when
>> features such as GRO_HW are negotiated.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 30 +++++++++++++++++-------------
>>   1 file changed, 17 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index c5046d21b281..8f7d207d58d6 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -3080,14 +3080,21 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>          u16 xdp_qp = 0, curr_qp;
>>          int i, err;
>>
>> -       if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)
>> -           && (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO) ||
>> -               virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM))) {
>> -               NL_SET_ERR_MSG_MOD(extack, "Can't set XDP while host is implementing GRO_HW/CSUM, disable GRO_HW/CSUM first");
>> -               return -EOPNOTSUPP;
>> +       if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
>> +               if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_CSUM)) {
>> +                       NL_SET_ERR_MSG_MOD(extack, "Can't set XDP without frags while guest is implementing GUEST_CSUM");
>> +                       return -EOPNOTSUPP;
>> +               }
>> +
>> +               if (prog && !prog->aux->xdp_has_frags) {
>> +                       if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>> +                           virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>> +                           virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
>> +                           virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_UFO)) {
>> +                               NL_SET_ERR_MSG_MOD(extack, "Can't set XDP without frags while guest is implementing GUEST_GRO_HW");
>> +                               return -EOPNOTSUPP;
>> +                       }
>> +               }
>>          }
>>
>>          if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
>> @@ -3095,8 +3102,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>>                  return -EINVAL;
>>          }
>>
>> -       if (dev->mtu > max_sz) {
>> -               NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP");
>> +       if (prog && !prog->aux->xdp_has_frags && dev->mtu > max_sz) {
>> +               NL_SET_ERR_MSG_MOD(extack, "MTU too large to enable XDP without frags");
>>                  netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
>>                  return -EINVAL;
>>          }
>> @@ -3218,9 +3225,6 @@ static int virtnet_set_features(struct net_device *dev,
>>          int err;
>>
>>          if ((dev->features ^ features) & NETIF_F_GRO_HW) {
>> -               if (vi->xdp_enabled)
>> -                       return -EBUSY;
> This seems suspicious, GRO_HW could be re-enabled accidentally even if
> it was disabled when attaching an XDP program that is not capable of
> doing multi-buffer XDP?

Yes, we shouldn't drop this check, because GRO_HW is unfriendly to XDP
programs that don't set xdp_has_frags.

Thanks.

>
> Thanks
>
>> -
>>                  if (features & NETIF_F_GRO_HW)
>>                          offloads = vi->guest_offloads_capable;
>>                  else
>> --
>> 2.19.1.6.gb485710b
>>



* Re: [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb()
  2022-12-06  5:36   ` Jason Wang
@ 2022-12-08  8:23     ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 13:36, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> For the clear construction of multi-buffer xdp_buff, we now remove the xdp
>> processing interleaved with page_to_skb() before, and the logic of xdp and
>> building skb from xdp will be separate and independent.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> So I think the organization of this series needs some tweak.
>
> If I was not not and if we do things like this, XDP support is
> actually broken and it breaks bisection and a lot of other things.
>
> We need make sure each patch does not break anything, it probably requires
>
> 1) squash the following patches or
> 2) having a new helper to do XDP stuffs after/before page_to_skb()

Your comments are informative; I'm going to make this patch more
independent of "[PATCH 9/9] virtio_net: support multi-buffer xdp", so
that each patch doesn't break anything.

Thanks.

>
> Thanks
>
>> ---
>>   drivers/net/virtio_net.c | 41 +++++++++-------------------------------
>>   1 file changed, 9 insertions(+), 32 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index d3e8c63b9c4b..cd65f85d5075 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -439,9 +439,7 @@ static unsigned int mergeable_ctx_to_truesize(void *mrg_ctx)
>>   static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>                                     struct receive_queue *rq,
>>                                     struct page *page, unsigned int offset,
>> -                                  unsigned int len, unsigned int truesize,
>> -                                  bool hdr_valid, unsigned int metasize,
>> -                                  unsigned int headroom)
>> +                                  unsigned int len, unsigned int truesize)
>>   {
>>          struct sk_buff *skb;
>>          struct virtio_net_hdr_mrg_rxbuf *hdr;
>> @@ -459,21 +457,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>          else
>>                  hdr_padded_len = sizeof(struct padded_vnet_hdr);
>>
>> -       /* If headroom is not 0, there is an offset between the beginning of the
>> -        * data and the allocated space, otherwise the data and the allocated
>> -        * space are aligned.
>> -        *
>> -        * Buffers with headroom use PAGE_SIZE as alloc size, see
>> -        * add_recvbuf_mergeable() + get_mergeable_buf_len()
>> -        */
>> -       truesize = headroom ? PAGE_SIZE : truesize;
>> -       tailroom = truesize - headroom;
>> -       buf = p - headroom;
>> -
>> +       buf = p;
>>          len -= hdr_len;
>>          offset += hdr_padded_len;
>>          p += hdr_padded_len;
>> -       tailroom -= hdr_padded_len + len;
>> +       tailroom = truesize - hdr_padded_len - len;
>>
>>          shinfo_size = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>
>> @@ -503,7 +491,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>          if (len <= skb_tailroom(skb))
>>                  copy = len;
>>          else
>> -               copy = ETH_HLEN + metasize;
>> +               copy = ETH_HLEN;
>>          skb_put_data(skb, p, copy);
>>
>>          len -= copy;
>> @@ -542,19 +530,11 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>                  give_pages(rq, page);
>>
>>   ok:
>> -       /* hdr_valid means no XDP, so we can copy the vnet header */
>> -       if (hdr_valid) {
>> -               hdr = skb_vnet_hdr(skb);
>> -               memcpy(hdr, hdr_p, hdr_len);
>> -       }
>> +       hdr = skb_vnet_hdr(skb);
>> +       memcpy(hdr, hdr_p, hdr_len);
>>          if (page_to_free)
>>                  put_page(page_to_free);
>>
>> -       if (metasize) {
>> -               __skb_pull(skb, metasize);
>> -               skb_metadata_set(skb, metasize);
>> -       }
>> -
>>          return skb;
>>   }
>>
>> @@ -917,7 +897,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
>>   {
>>          struct page *page = buf;
>>          struct sk_buff *skb =
>> -               page_to_skb(vi, rq, page, 0, len, PAGE_SIZE, true, 0, 0);
>> +               page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
>>
>>          stats->bytes += len - vi->hdr_len;
>>          if (unlikely(!skb))
>> @@ -1060,9 +1040,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                  rcu_read_unlock();
>>                                  put_page(page);
>>                                  head_skb = page_to_skb(vi, rq, xdp_page, offset,
>> -                                                      len, PAGE_SIZE, false,
>> -                                                      metasize,
>> -                                                      headroom);
>> +                                                      len, PAGE_SIZE);
>>                                  return head_skb;
>>                          }
>>                          break;
>> @@ -1116,8 +1094,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>          rcu_read_unlock();
>>
>>   skip_xdp:
>> -       head_skb = page_to_skb(vi, rq, page, offset, len, truesize, !xdp_prog,
>> -                              metasize, headroom);
>> +       head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
>>          curr_skb = head_skb;
>>
>>          if (unlikely(!curr_skb))
>> --
>> 2.19.1.6.gb485710b
>>



* Re: [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers
  2022-12-06  6:14   ` Jason Wang
@ 2022-12-08  8:25     ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:25 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 14:14, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> Support xdp for multi buffer packets.
>>
>> Putting the first buffer as the linear part for xdp_buff,
>> and the rest of the buffers as non-linear fragments to struct
>> skb_shared_info in the tailroom belonging to xdp_buff.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 74 ++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index cd65f85d5075..20784b1d8236 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -911,6 +911,80 @@ static struct sk_buff *receive_big(struct net_device *dev,
>>          return NULL;
>>   }
>>
>> +static int virtnet_build_xdp_buff(struct net_device *dev,
>> +                                 struct virtnet_info *vi,
>> +                                 struct receive_queue *rq,
>> +                                 struct xdp_buff *xdp,
>> +                                 void *buf,
>> +                                 unsigned int len,
>> +                                 unsigned int frame_sz,
>> +                                 u16 *num_buf,
>> +                                 unsigned int *xdp_frags_truesize,
>> +                                 struct virtnet_rq_stats *stats)
>> +{
>> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +       struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>> +       unsigned int truesize, headroom;
>> +       struct skb_shared_info *shinfo;
>> +       unsigned int xdp_frags_truesz = 0;
>> +       unsigned int cur_frag_size;
>> +       struct page *page;
>> +       skb_frag_t *frag;
>> +       int offset;
>> +       void *ctx;
>> +
>> +       xdp_init_buff(xdp, frame_sz, &rq->xdp_rxq);
>> +       xdp_prepare_buff(xdp, buf - VIRTIO_XDP_HEADROOM,
>> +                        VIRTIO_XDP_HEADROOM + vi->hdr_len, len - vi->hdr_len, true);
>> +       shinfo = xdp_get_shared_info_from_buff(xdp);
>> +       shinfo->nr_frags = 0;
>> +       shinfo->xdp_frags_size = 0;
>> +
>> +       if ((*num_buf - 1) > MAX_SKB_FRAGS)
>> +               return -EINVAL;
>> +
>> +       while ((--*num_buf) >= 1) {
>> +               buf = virtqueue_get_buf_ctx(rq->vq, &len, &ctx);
> So this works only for a mergeable buffer, I wonder if it's worth it
> to make it work for big mode as well. Or at least we can mention it as
> a TODO somewhere and rename this function (with mergeable suffix).

Yes, I'm leaning towards the latter; I'll rename it with a mergeable
suffix in the next version.

Thanks.

>
> Others look good.
>
> Thanks
>
>> +               if (unlikely(!buf)) {
>> +                       pr_debug("%s: rx error: %d buffers out of %d missing\n",
>> +                                dev->name, *num_buf,
>> +                                virtio16_to_cpu(vi->vdev, hdr->num_buffers));
>> +                       dev->stats.rx_length_errors++;
>> +                       return -EINVAL;
>> +               }
>> +
>> +               if (!xdp_buff_has_frags(xdp))
>> +                       xdp_buff_set_frags_flag(xdp);
>> +
>> +               stats->bytes += len;
>> +               page = virt_to_head_page(buf);
>> +               offset = buf - page_address(page);
>> +               truesize = mergeable_ctx_to_truesize(ctx);
>> +               headroom = mergeable_ctx_to_headroom(ctx);
>> +
>> +               cur_frag_size = truesize + (headroom ? (headroom + tailroom) : 0);
>> +               xdp_frags_truesz += cur_frag_size;
>> +               if (unlikely(len > truesize || cur_frag_size > PAGE_SIZE)) {
>> +                       pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>> +                                dev->name, len, (unsigned long)ctx);
>> +                       dev->stats.rx_length_errors++;
>> +                       return -EINVAL;
>> +               }
>> +
>> +               frag = &shinfo->frags[shinfo->nr_frags++];
>> +               __skb_frag_set_page(frag, page);
>> +               skb_frag_off_set(frag, offset);
>> +               skb_frag_size_set(frag, len);
>> +               if (page_is_pfmemalloc(page))
>> +                       xdp_buff_set_frag_pfmemalloc(xdp);
>> +
>> +               shinfo->xdp_frags_size += len;
>> +       }
>> +
>> +       *xdp_frags_truesize = xdp_frags_truesz;
>> +       return 0;
>> +}
>> +
>>   static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                           struct virtnet_info *vi,
>>                                           struct receive_queue *rq,
>> --
>> 2.19.1.6.gb485710b
>>



* Re: [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-06  6:33   ` Jason Wang
@ 2022-12-08  8:30     ` Heng Qi
  2022-12-13  7:08       ` Jason Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:30 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 14:33, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
>>
>> For the prefilled buffer before xdp is set, vq reset can be
>> used to clear it, but most devices do not support it at present.
>> In order not to bother users who are using xdp normally, we do
>> not use vq reset for the time being.
> I guess to tweak the part to say we will probably use vq reset in the future.

OK, it works.

>
>> At the same time, virtio
>> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
>> needs to calculate the tailroom of the last frag, which will
>> involve the offset of the corresponding page and cause a negative
>> value, so we disable tail increase by not setting xdp_rxq->frag_size.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
>>   1 file changed, 38 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 20784b1d8236..83e6933ae62b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                           unsigned int *xdp_xmit,
>>                                           struct virtnet_rq_stats *stats)
>>   {
>> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>          struct page *page = virt_to_head_page(buf);
>> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>          rcu_read_lock();
>>          xdp_prog = rcu_dereference(rq->xdp_prog);
>>          if (xdp_prog) {
>> +               unsigned int xdp_frags_truesz = 0;
>> +               struct skb_shared_info *shinfo;
>>                  struct xdp_frame *xdpf;
>>                  struct page *xdp_page;
>>                  struct xdp_buff xdp;
>>                  void *data;
>>                  u32 act;
>> +               int i;
>>
>> -               /* Transient failure which in theory could occur if
>> -                * in-flight packets from before XDP was enabled reach
>> -                * the receive path after XDP is loaded.
>> -                */
>> -               if (unlikely(hdr->hdr.gso_type))
>> -                       goto err_xdp;
> Two questions:
>
> 1) should we keep this check for the XDP program that can't deal with XDP frags?

Yes, the problem is the same as for an XDP program without xdp_has_frags
when GRO_HW is negotiated; I will correct it.

> 2) how could we guarantee that the vnet header (gso_type/csum_start
> etc) is still valid after XDP (where XDP program can choose to
> override the header)?

We can save the vnet header before building the xdp_buff and running the
XDP program, and then use the saved value in the subsequent processing.

>> -
>> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
>> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
>> +                * with headroom may add hole in truesize, which
>> +                * make their length exceed PAGE_SIZE. So we disabled the
>> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
>>                   */
>>                  frame_sz = headroom ? PAGE_SIZE : truesize;
>>
>> -               /* This happens when rx buffer size is underestimated
>> -                * or headroom is not enough because of the buffer
>> -                * was refilled before XDP is set. This should only
>> -                * happen for the first several packets, so we don't
>> -                * care much about its performance.
>> +               /* This happens when headroom is not enough because
>> +                * of the buffer was prefilled before XDP is set.
>> +                * This should only happen for the first several packets.
>> +                * In fact, vq reset can be used here to help us clean up
>> +                * the prefilled buffers, but many existing devices do not
>> +                * support it, and we don't want to bother users who are
>> +                * using xdp normally.
>>                   */
>> -               if (unlikely(num_buf > 1 ||
>> -                            headroom < virtnet_get_headroom(vi))) {
>> -                       /* linearize data for XDP */
>> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
>> -                                                     page, offset,
>> -                                                     VIRTIO_XDP_HEADROOM,
>> -                                                     &len);
>> -                       frame_sz = PAGE_SIZE;
>> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
>> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
>> +                               goto err_xdp;
>>
>> +                       xdp_page = alloc_page(GFP_ATOMIC);
>>                          if (!xdp_page)
>>                                  goto err_xdp;
>> +
>> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
>> +                              page_address(page) + offset, len);
>> +                       frame_sz = PAGE_SIZE;
> How can we know a single page is sufficient here? (before XDP is set,
> we reserve neither headroom nor tailroom).

This is only for the first buffer; refer to add_recvbuf_mergeable() and
get_mergeable_buf_len(). A buffer is always no larger than a page.

>
>>                          offset = VIRTIO_XDP_HEADROOM;
> I think we should still try to do linearization for the XDP program
> that doesn't support XDP frags.

Yes, you are right.

Thanks.

>
> Thanks
>
>>                  } else {
>>                          xdp_page = page;
>>                  }
>> -
>> -               /* Allow consuming headroom but reserve enough space to push
>> -                * the descriptor on if we get an XDP_TX return code.
>> -                */
>>                  data = page_address(xdp_page) + offset;
>> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
>> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
>> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
>> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
>> +                                            &num_buf, &xdp_frags_truesz, stats);
>> +               if (unlikely(err))
>> +                       goto err_xdp_frags;
>>
>>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>                  stats->xdp_packets++;
>> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                                  __free_pages(xdp_page, 0);
>>                          goto err_xdp;
>>                  }
>> +err_xdp_frags:
>> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
>> +
>> +               if (unlikely(xdp_page != page))
>> +                       __free_pages(xdp_page, 0);
>> +
>> +               for (i = 0; i < shinfo->nr_frags; i++) {
>> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
>> +                       put_page(xdp_page);
>> +               }
>> +               goto err_xdp;
>>          }
>>          rcu_read_unlock();
>>
>> --
>> 2.19.1.6.gb485710b
>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 9/9] virtio_net: support multi-buffer xdp
  2022-12-06  6:42   ` Jason Wang
@ 2022-12-08  8:31     ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:31 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 2:42 PM, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> The driver can pass the skb to the stack via build_skb_from_xdp_buff().
>>
>> The driver forwards multi-buffer packets using the send queue
>> for XDP_TX and XDP_REDIRECT, and drops the references on the
>> frag pages for XDP_DROP.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>> ---
>>   drivers/net/virtio_net.c | 65 ++++++----------------------------------
>>   1 file changed, 9 insertions(+), 56 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 431f2126a2b5..bbd5cd9bfd47 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1065,7 +1065,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>          struct bpf_prog *xdp_prog;
>>          unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>          unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>> -       unsigned int metasize = 0;
>>          unsigned int frame_sz;
>>          int err;
>>
>> @@ -1137,63 +1136,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>
>>                  switch (act) {
>>                  case XDP_PASS:
>> -                       metasize = xdp.data - xdp.data_meta;
>> -
>> -                       /* recalculate offset to account for any header
>> -                        * adjustments and minus the metasize to copy the
>> -                        * metadata in page_to_skb(). Note other cases do not
>> -                        * build an skb and avoid using offset
>> -                        */
>> -                       offset = xdp.data - page_address(xdp_page) -
>> -                                vi->hdr_len - metasize;
>> -
>> -                       /* recalculate len if xdp.data, xdp.data_end or
>> -                        * xdp.data_meta were adjusted
>> -                        */
>> -                       len = xdp.data_end - xdp.data + vi->hdr_len + metasize;
>> -
>> -                       /* recalculate headroom if xdp.data or xdp_data_meta
>> -                        * were adjusted, note that offset should always point
>> -                        * to the start of the reserved bytes for virtio_net
>> -                        * header which are followed by xdp.data, that means
>> -                        * that offset is equal to the headroom (when buf is
>> -                        * starting at the beginning of the page, otherwise
>> -                        * there is a base offset inside the page) but it's used
>> -                        * with a different starting point (buf start) than
>> -                        * xdp.data (buf start + vnet hdr size). If xdp.data or
>> -                        * data_meta were adjusted by the xdp prog then the
>> -                        * headroom size has changed and so has the offset, we
>> -                        * can use data_hard_start, which points at buf start +
>> -                        * vnet hdr size, to calculate the new headroom and use
>> -                        * it later to compute buf start in page_to_skb()
>> -                        */
>> -                       headroom = xdp.data - xdp.data_hard_start - metasize;
>> -
>> -                       /* We can only create skb based on xdp_page. */
>> -                       if (unlikely(xdp_page != page)) {
>> -                               rcu_read_unlock();
>> -                               put_page(page);
>> -                               head_skb = page_to_skb(vi, rq, xdp_page, offset,
>> -                                                      len, PAGE_SIZE);
>> -                               return head_skb;
>> -                       }
>> -                       break;
>> +                       head_skb = build_skb_from_xdp_buff(dev, vi, &xdp, xdp_frags_truesz);
>> +                       rcu_read_unlock();
>> +                       return head_skb;
>>                  case XDP_TX:
>>                          stats->xdp_tx++;
>>                          xdpf = xdp_convert_buff_to_frame(&xdp);
>>                          if (unlikely(!xdpf)) {
>> -                               if (unlikely(xdp_page != page))
>> -                                       put_page(xdp_page);
>> -                               goto err_xdp;
>> +                               pr_debug("%s: convert buff to frame failed for xdp\n", dev->name);
> netdev_dbg()?

Yes, I will fix it in the next version.

Thanks.

>
> Thanks
>
>> +                               goto err_xdp_frags;
>>                          }
>>                          err = virtnet_xdp_xmit(dev, 1, &xdpf, 0);
>>                          if (unlikely(!err)) {
>>                                  xdp_return_frame_rx_napi(xdpf);
>>                          } else if (unlikely(err < 0)) {
>>                                  trace_xdp_exception(vi->dev, xdp_prog, act);
>> -                               if (unlikely(xdp_page != page))
>> -                                       put_page(xdp_page);
>> -                               goto err_xdp;
>> +                               goto err_xdp_frags;
>>                          }
>>                          *xdp_xmit |= VIRTIO_XDP_TX;
>>                          if (unlikely(xdp_page != page))
>> @@ -1203,11 +1161,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                  case XDP_REDIRECT:
>>                          stats->xdp_redirects++;
>>                          err = xdp_do_redirect(dev, &xdp, xdp_prog);
>> -                       if (err) {
>> -                               if (unlikely(xdp_page != page))
>> -                                       put_page(xdp_page);
>> -                               goto err_xdp;
>> -                       }
>> +                       if (err)
>> +                               goto err_xdp_frags;
>>                          *xdp_xmit |= VIRTIO_XDP_REDIR;
>>                          if (unlikely(xdp_page != page))
>>                                  put_page(page);
>> @@ -1220,9 +1175,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>                          trace_xdp_exception(vi->dev, xdp_prog, act);
>>                          fallthrough;
>>                  case XDP_DROP:
>> -                       if (unlikely(xdp_page != page))
>> -                               __free_pages(xdp_page, 0);
>> -                       goto err_xdp;
>> +                       goto err_xdp_frags;
>>                  }
>>   err_xdp_frags:
>>                  shinfo = xdp_get_shared_info_from_buff(&xdp);
>> --
>> 2.19.1.6.gb485710b
>>



* Re: [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame
  2022-12-06  5:31   ` Jason Wang
@ 2022-12-08  8:35     ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-08  8:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/6 1:31 PM, Jason Wang wrote:
> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> Update the relevant byte-count records for xdp_frame as a basis
>> for multi-buffer xdp transmission.
>>
>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Thanks for your energy; you are awesome.

>
> Thanks
>
>> ---
>>   drivers/net/virtio_net.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 8f7d207d58d6..d3e8c63b9c4b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -658,7 +658,7 @@ static int virtnet_xdp_xmit(struct net_device *dev,
>>                  if (likely(is_xdp_frame(ptr))) {
>>                          struct xdp_frame *frame = ptr_to_xdp(ptr);
>>
>> -                       bytes += frame->len;
>> +                       bytes += xdp_get_frame_len(frame);
>>                          xdp_return_frame(frame);
>>                  } else {
>>                          struct sk_buff *skb = ptr;
>> @@ -1604,7 +1604,7 @@ static void free_old_xmit_skbs(struct send_queue *sq, bool in_napi)
>>                  } else {
>>                          struct xdp_frame *frame = ptr_to_xdp(ptr);
>>
>> -                       bytes += frame->len;
>> +                       bytes += xdp_get_frame_len(frame);
>>                          xdp_return_frame(frame);
>>                  }
>>                  packets++;
>> --
>> 2.19.1.6.gb485710b
>>



* Re: [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-08  8:30     ` Heng Qi
@ 2022-12-13  7:08       ` Jason Wang
  2022-12-14  8:37         ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-13  7:08 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
>
> > On 2022/12/6 2:33 PM, Jason Wang wrote:
> > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
> >>
> >> For the prefilled buffer before xdp is set, vq reset can be
> >> used to clear it, but most devices do not support it at present.
> >> In order not to bother users who are using xdp normally, we do
> >> not use vq reset for the time being.
> > I guess to tweak the part to say we will probably use vq reset in the future.
>
> OK, it works.
>
> >
> >> At the same time, virtio
> >> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> >> needs to calculate the tailroom of the last frag, which will
> >> involve the offset of the corresponding page and cause a negative
> >> value, so we disable tail increase by not setting xdp_rxq->frag_size.
> >>
> >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> >> ---
> >>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
> >>   1 file changed, 38 insertions(+), 29 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 20784b1d8236..83e6933ae62b 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>                                           unsigned int *xdp_xmit,
> >>                                           struct virtnet_rq_stats *stats)
> >>   {
> >> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> >>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >>          struct page *page = virt_to_head_page(buf);
> >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>          rcu_read_lock();
> >>          xdp_prog = rcu_dereference(rq->xdp_prog);
> >>          if (xdp_prog) {
> >> +               unsigned int xdp_frags_truesz = 0;
> >> +               struct skb_shared_info *shinfo;
> >>                  struct xdp_frame *xdpf;
> >>                  struct page *xdp_page;
> >>                  struct xdp_buff xdp;
> >>                  void *data;
> >>                  u32 act;
> >> +               int i;
> >>
> >> -               /* Transient failure which in theory could occur if
> >> -                * in-flight packets from before XDP was enabled reach
> >> -                * the receive path after XDP is loaded.
> >> -                */
> >> -               if (unlikely(hdr->hdr.gso_type))
> >> -                       goto err_xdp;
> > Two questions:
> >
> > 1) should we keep this check for the XDP program that can't deal with XDP frags?
>
> Yes, the problem is the same as for an XDP program without xdp.frags
> when GRO_HW is enabled; I will correct it.
>
> > 2) how could we guarantee that the vnet header (gso_type/csum_start
> > etc) is still valid after XDP (where XDP program can choose to
> > override the header)?
>
> We can save the vnet header before the driver receives the packet
> and builds the xdp_buff, and then use the pre-saved value in the
> subsequent processing.

The problem is that XDP may modify the packet (header), so some fields
are no longer valid (e.g. csum_start/offset?).

If I'm not wrong, there's no way for the XDP program to access those
fields, or does it support that right now?

>
> >> -
> >> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> >> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> >> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> >> +                * with headroom may add hole in truesize, which
> >> +                * makes their length exceed PAGE_SIZE. So we disable the
> >> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
> >>                   */
> >>                  frame_sz = headroom ? PAGE_SIZE : truesize;
> >>
> >> -               /* This happens when rx buffer size is underestimated
> >> -                * or headroom is not enough because of the buffer
> >> -                * was refilled before XDP is set. This should only
> >> -                * happen for the first several packets, so we don't
> >> -                * care much about its performance.
> >> +               /* This happens when headroom is not enough because
> >> +                * the buffer was prefilled before XDP is set.
> >> +                * This should only happen for the first several packets.
> >> +                * In fact, vq reset can be used here to help us clean up
> >> +                * the prefilled buffers, but many existing devices do not
> >> +                * support it, and we don't want to bother users who are
> >> +                * using xdp normally.
> >>                   */
> >> -               if (unlikely(num_buf > 1 ||
> >> -                            headroom < virtnet_get_headroom(vi))) {
> >> -                       /* linearize data for XDP */
> >> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> >> -                                                     page, offset,
> >> -                                                     VIRTIO_XDP_HEADROOM,
> >> -                                                     &len);
> >> -                       frame_sz = PAGE_SIZE;
> >> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> >> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> >> +                               goto err_xdp;
> >>
> >> +                       xdp_page = alloc_page(GFP_ATOMIC);
> >>                          if (!xdp_page)
> >>                                  goto err_xdp;
> >> +
> >> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> >> +                              page_address(page) + offset, len);
> >> +                       frame_sz = PAGE_SIZE;
> > How can we know a single page is sufficient here? (before XDP is set,
> > we reserve neither headroom nor tailroom).
>
> This is only for the first buffer; refer to add_recvbuf_mergeable() and
> get_mergeable_buf_len(). A buffer is always no larger than a page.

Ok.

Thanks

>
> >
> >>                          offset = VIRTIO_XDP_HEADROOM;
> > I think we should still try to do linearization for the XDP program
> > that doesn't support XDP frags.
>
> Yes, you are right.
>
> Thanks.
>
> >
> > Thanks
> >
> >>                  } else {
> >>                          xdp_page = page;
> >>                  }
> >> -
> >> -               /* Allow consuming headroom but reserve enough space to push
> >> -                * the descriptor on if we get an XDP_TX return code.
> >> -                */
> >>                  data = page_address(xdp_page) + offset;
> >> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> >> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> >> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> >> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> >> +                                            &num_buf, &xdp_frags_truesz, stats);
> >> +               if (unlikely(err))
> >> +                       goto err_xdp_frags;
> >>
> >>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >>                  stats->xdp_packets++;
> >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >>                                  __free_pages(xdp_page, 0);
> >>                          goto err_xdp;
> >>                  }
> >> +err_xdp_frags:
> >> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> >> +
> >> +               if (unlikely(xdp_page != page))
> >> +                       __free_pages(xdp_page, 0);
> >> +
> >> +               for (i = 0; i < shinfo->nr_frags; i++) {
> >> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> >> +                       put_page(xdp_page);
> >> +               }
> >> +               goto err_xdp;
> >>          }
> >>          rcu_read_unlock();
> >>
> >> --
> >> 2.19.1.6.gb485710b
> >>
>



* Re: [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-13  7:08       ` Jason Wang
@ 2022-12-14  8:37         ` Heng Qi
  2022-12-16  3:46           ` Jason Wang
  0 siblings, 1 reply; 30+ messages in thread
From: Heng Qi @ 2022-12-14  8:37 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote:
> On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> >
> >
> >
> > > On 2022/12/6 2:33 PM, Jason Wang wrote:
> > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
> > >>
> > >> For the prefilled buffer before xdp is set, vq reset can be
> > >> used to clear it, but most devices do not support it at present.
> > >> In order not to bother users who are using xdp normally, we do
> > >> not use vq reset for the time being.
> > > I guess to tweak the part to say we will probably use vq reset in the future.
> >
> > OK, it works.
> >
> > >
> > >> At the same time, virtio
> > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> > >> needs to calculate the tailroom of the last frag, which will
> > >> involve the offset of the corresponding page and cause a negative
> > >> value, so we disable tail increase by not setting xdp_rxq->frag_size.
> > >>
> > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > >> ---
> > >>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
> > >>   1 file changed, 38 insertions(+), 29 deletions(-)
> > >>
> > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > >> index 20784b1d8236..83e6933ae62b 100644
> > >> --- a/drivers/net/virtio_net.c
> > >> +++ b/drivers/net/virtio_net.c
> > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >>                                           unsigned int *xdp_xmit,
> > >>                                           struct virtnet_rq_stats *stats)
> > >>   {
> > >> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > >>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > >>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > >>          struct page *page = virt_to_head_page(buf);
> > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >>          rcu_read_lock();
> > >>          xdp_prog = rcu_dereference(rq->xdp_prog);
> > >>          if (xdp_prog) {
> > >> +               unsigned int xdp_frags_truesz = 0;
> > >> +               struct skb_shared_info *shinfo;
> > >>                  struct xdp_frame *xdpf;
> > >>                  struct page *xdp_page;
> > >>                  struct xdp_buff xdp;
> > >>                  void *data;
> > >>                  u32 act;
> > >> +               int i;
> > >>
> > >> -               /* Transient failure which in theory could occur if
> > >> -                * in-flight packets from before XDP was enabled reach
> > >> -                * the receive path after XDP is loaded.
> > >> -                */
> > >> -               if (unlikely(hdr->hdr.gso_type))
> > >> -                       goto err_xdp;
> > > Two questions:
> > >
> > > 1) should we keep this check for the XDP program that can't deal with XDP frags?
> >
> > Yes, the problem is the same as the xdp program without xdp.frags when
> > GRO_HW, I will correct it.
> >
> > > 2) how could we guarantee that the vnet header (gso_type/csum_start
> > > etc) is still valid after XDP (where XDP program can choose to
> > > override the header)?
> >
> > We can save the vnet header before the driver receives the packet
> > and builds the xdp_buff, and then use the pre-saved value in the
> > subsequent processing.
> 
> The problem is that XDP may modify the packet (header) so some fields
> are not valid any more (e.g csum_start/offset ?).
> 
> If I was not wrong, there's no way for the XDP program to access those
> fields or does it support it right now?
> 

When the guest_csum feature is negotiated, xdp cannot be set, because the
metadata of the xdp_{buff, frame} may be adjusted by the bpf program, which
makes csum_{start, offset} invalid. At the same time, multi-buffer xdp
programs should only receive packets with a larger MTU, so we no longer need
the gso-related information and need to disable GRO_HW.

Thanks.

> >
> > >> -
> > >> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> > >> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> > >> +               /* Now XDP core assumes frag size is PAGE_SIZE, but buffers
> > >> +                * with headroom may add hole in truesize, which
> > >> +                * makes their length exceed PAGE_SIZE. So we disable the
> > >> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
> > >>                   */
> > >>                  frame_sz = headroom ? PAGE_SIZE : truesize;
> > >>
> > >> -               /* This happens when rx buffer size is underestimated
> > >> -                * or headroom is not enough because of the buffer
> > >> -                * was refilled before XDP is set. This should only
> > >> -                * happen for the first several packets, so we don't
> > >> -                * care much about its performance.
> > >> +               /* This happens when headroom is not enough because
> > >> +                * the buffer was prefilled before XDP is set.
> > >> +                * This should only happen for the first several packets.
> > >> +                * In fact, vq reset can be used here to help us clean up
> > >> +                * the prefilled buffers, but many existing devices do not
> > >> +                * support it, and we don't want to bother users who are
> > >> +                * using xdp normally.
> > >>                   */
> > >> -               if (unlikely(num_buf > 1 ||
> > >> -                            headroom < virtnet_get_headroom(vi))) {
> > >> -                       /* linearize data for XDP */
> > >> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> > >> -                                                     page, offset,
> > >> -                                                     VIRTIO_XDP_HEADROOM,
> > >> -                                                     &len);
> > >> -                       frame_sz = PAGE_SIZE;
> > >> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> > >> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> > >> +                               goto err_xdp;
> > >>
> > >> +                       xdp_page = alloc_page(GFP_ATOMIC);
> > >>                          if (!xdp_page)
> > >>                                  goto err_xdp;
> > >> +
> > >> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> > >> +                              page_address(page) + offset, len);
> > >> +                       frame_sz = PAGE_SIZE;
> > > How can we know a single page is sufficient here? (before XDP is set,
> > > we reserve neither headroom nor tailroom).
> >
> > This is only for the first buffer; refer to add_recvbuf_mergeable() and
> > get_mergeable_buf_len(). A buffer is always no larger than a page.
> 
> Ok.
> 
> Thanks
> 
> >
> > >
> > >>                          offset = VIRTIO_XDP_HEADROOM;
> > > I think we should still try to do linearization for the XDP program
> > > that doesn't support XDP frags.
> >
> > Yes, you are right.
> >
> > Thanks.
> >
> > >
> > > Thanks
> > >
> > >>                  } else {
> > >>                          xdp_page = page;
> > >>                  }
> > >> -
> > >> -               /* Allow consuming headroom but reserve enough space to push
> > >> -                * the descriptor on if we get an XDP_TX return code.
> > >> -                */
> > >>                  data = page_address(xdp_page) + offset;
> > >> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> > >> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> > >> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> > >> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> > >> +                                            &num_buf, &xdp_frags_truesz, stats);
> > >> +               if (unlikely(err))
> > >> +                       goto err_xdp_frags;
> > >>
> > >>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > >>                  stats->xdp_packets++;
> > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > >>                                  __free_pages(xdp_page, 0);
> > >>                          goto err_xdp;
> > >>                  }
> > >> +err_xdp_frags:
> > >> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> > >> +
> > >> +               if (unlikely(xdp_page != page))
> > >> +                       __free_pages(xdp_page, 0);
> > >> +
> > >> +               for (i = 0; i < shinfo->nr_frags; i++) {
> > >> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> > >> +                       put_page(xdp_page);
> > >> +               }
> > >> +               goto err_xdp;
> > >>          }
> > >>          rcu_read_unlock();
> > >>
> > >> --
> > >> 2.19.1.6.gb485710b
> > >>
> >


* Re: [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-14  8:37         ` Heng Qi
@ 2022-12-16  3:46           ` Jason Wang
  2022-12-16  9:42             ` Heng Qi
  0 siblings, 1 reply; 30+ messages in thread
From: Jason Wang @ 2022-12-16  3:46 UTC (permalink / raw)
  To: Heng Qi
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet

On Wed, Dec 14, 2022 at 4:38 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
> On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote:
> > On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > >
> > >
> > >
> > > On 2022/12/6 2:33 PM, Jason Wang wrote:
> > > > On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
> > > >> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
> > > >>
> > > >> For the prefilled buffer before xdp is set, vq reset can be
> > > >> used to clear it, but most devices do not support it at present.
> > > >> In order not to bother users who are using xdp normally, we do
> > > >> not use vq reset for the time being.
> > > > I guess to tweak the part to say we will probably use vq reset in the future.
> > >
> > > OK, it works.
> > >
> > > >
> > > >> At the same time, virtio
> > > >> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
> > > >> needs to calculate the tailroom of the last frag, which will
> > > >> involve the offset of the corresponding page and cause a negative
> > > >> value, so we disable tail increase by not setting xdp_rxq->frag_size.
> > > >>
> > > >> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
> > > >> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
> > > >> ---
> > > >>   drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
> > > >>   1 file changed, 38 insertions(+), 29 deletions(-)
> > > >>
> > > >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > >> index 20784b1d8236..83e6933ae62b 100644
> > > >> --- a/drivers/net/virtio_net.c
> > > >> +++ b/drivers/net/virtio_net.c
> > > >> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >>                                           unsigned int *xdp_xmit,
> > > >>                                           struct virtnet_rq_stats *stats)
> > > >>   {
> > > >> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > > >>          struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > > >>          u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > > >>          struct page *page = virt_to_head_page(buf);
> > > >> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >>          rcu_read_lock();
> > > >>          xdp_prog = rcu_dereference(rq->xdp_prog);
> > > >>          if (xdp_prog) {
> > > >> +               unsigned int xdp_frags_truesz = 0;
> > > >> +               struct skb_shared_info *shinfo;
> > > >>                  struct xdp_frame *xdpf;
> > > >>                  struct page *xdp_page;
> > > >>                  struct xdp_buff xdp;
> > > >>                  void *data;
> > > >>                  u32 act;
> > > >> +               int i;
> > > >>
> > > >> -               /* Transient failure which in theory could occur if
> > > >> -                * in-flight packets from before XDP was enabled reach
> > > >> -                * the receive path after XDP is loaded.
> > > >> -                */
> > > >> -               if (unlikely(hdr->hdr.gso_type))
> > > >> -                       goto err_xdp;
> > > > Two questions:
> > > >
> > > > 1) should we keep this check for the XDP program that can't deal with XDP frags?
> > >
> > > Yes, the problem is the same as with an xdp program without xdp.frags when
> > > GRO_HW is enabled; I will correct it.
> > >
> > > > 2) how could we guarantee that the vnet header (gso_type/csum_start
> > > > etc) is still valid after XDP (where XDP program can choose to
> > > > override the header)?
> > >
> > > We can save the vnet header before the driver receives the packet and
> > > builds the xdp_buff, and then use
> > > the pre-saved value in the subsequent process.
> >
> > The problem is that XDP may modify the packet (header) so some fields
> > are not valid any more (e.g csum_start/offset ?).
> >
> > If I'm not wrong, there's no way for the XDP program to access those
> > fields, or is that supported right now?
> >
>
> When the guest_csum feature is negotiated, xdp cannot be set, because the
> metadata of xdp_{buff, frame} may be adjusted by the bpf program; therefore,
> csum_{start, offset} itself is invalid. At the same time,
> multi-buffer xdp programs should only receive packets over a larger MTU, so
> we don't need gso-related information anymore and need to disable GRO_HW.

Ok, that's fine.

(But it requires a large pMTU).

Thanks

>
> Thanks.
>
> > >
> > > >> -
> > > >> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
> > > >> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> > > >> +               /* Now the XDP core assumes the frag size is PAGE_SIZE, but
> > > >> +                * buffers with headroom may add a hole in truesize, which
> > > >> +                * makes their length exceed PAGE_SIZE. So we disable the
> > > >> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
> > > >>                   */
> > > >>                  frame_sz = headroom ? PAGE_SIZE : truesize;
> > > >>
> > > >> -               /* This happens when rx buffer size is underestimated
> > > >> -                * or headroom is not enough because of the buffer
> > > >> -                * was refilled before XDP is set. This should only
> > > >> -                * happen for the first several packets, so we don't
> > > >> -                * care much about its performance.
> > > >> +               /* This happens when headroom is not enough because
> > > >> +                * the buffer was prefilled before XDP is set.
> > > >> +                * This should only happen for the first several packets.
> > > >> +                * In fact, vq reset can be used here to help us clean up
> > > >> +                * the prefilled buffers, but many existing devices do not
> > > >> +                * support it, and we don't want to bother users who are
> > > >> +                * using xdp normally.
> > > >>                   */
> > > >> -               if (unlikely(num_buf > 1 ||
> > > >> -                            headroom < virtnet_get_headroom(vi))) {
> > > >> -                       /* linearize data for XDP */
> > > >> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
> > > >> -                                                     page, offset,
> > > >> -                                                     VIRTIO_XDP_HEADROOM,
> > > >> -                                                     &len);
> > > >> -                       frame_sz = PAGE_SIZE;
> > > >> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
> > > >> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
> > > >> +                               goto err_xdp;
> > > >>
> > > >> +                       xdp_page = alloc_page(GFP_ATOMIC);
> > > >>                          if (!xdp_page)
> > > >>                                  goto err_xdp;
> > > >> +
> > > >> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
> > > >> +                              page_address(page) + offset, len);
> > > >> +                       frame_sz = PAGE_SIZE;
> > > > How can we know a single page is sufficient here? (before XDP is set,
> > > > we reserve neither headroom nor tailroom).
> > >
> > > This is only for the first buffer; refer to add_recvbuf_mergeable() and
> > > get_mergeable_buf_len(). A buffer is always no larger than a page.
> >
> > Ok.
> >
> > Thanks
> >
> > >
> > > >
> > > >>                          offset = VIRTIO_XDP_HEADROOM;
> > > > I think we should still try to do linearization for the XDP program
> > > > that doesn't support XDP frags.
> > >
> > > Yes, you are right.
> > >
> > > Thanks.
> > >
> > > >
> > > > Thanks
> > > >
> > > >>                  } else {
> > > >>                          xdp_page = page;
> > > >>                  }
> > > >> -
> > > >> -               /* Allow consuming headroom but reserve enough space to push
> > > >> -                * the descriptor on if we get an XDP_TX return code.
> > > >> -                */
> > > >>                  data = page_address(xdp_page) + offset;
> > > >> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
> > > >> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
> > > >> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
> > > >> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
> > > >> +                                            &num_buf, &xdp_frags_truesz, stats);
> > > >> +               if (unlikely(err))
> > > >> +                       goto err_xdp_frags;
> > > >>
> > > >>                  act = bpf_prog_run_xdp(xdp_prog, &xdp);
> > > >>                  stats->xdp_packets++;
> > > >> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> > > >>                                  __free_pages(xdp_page, 0);
> > > >>                          goto err_xdp;
> > > >>                  }
> > > >> +err_xdp_frags:
> > > >> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
> > > >> +
> > > >> +               if (unlikely(xdp_page != page))
> > > >> +                       __free_pages(xdp_page, 0);
> > > >> +
> > > >> +               for (i = 0; i < shinfo->nr_frags; i++) {
> > > >> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
> > > >> +                       put_page(xdp_page);
> > > >> +               }
> > > >> +               goto err_xdp;
> > > >>          }
> > > >>          rcu_read_unlock();
> > > >>
> > > >> --
> > > >> 2.19.1.6.gb485710b
> > > >>
> > >
>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable
  2022-12-16  3:46           ` Jason Wang
@ 2022-12-16  9:42             ` Heng Qi
  0 siblings, 0 replies; 30+ messages in thread
From: Heng Qi @ 2022-12-16  9:42 UTC (permalink / raw)
  To: Jason Wang
  Cc: netdev, bpf, Michael S. Tsirkin, Paolo Abeni, Jakub Kicinski,
	John Fastabend, David S. Miller, Daniel Borkmann,
	Alexei Starovoitov, Eric Dumazet



On 2022/12/16 11:46 AM, Jason Wang wrote:
> On Wed, Dec 14, 2022 at 4:38 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>> On Tue, Dec 13, 2022 at 03:08:46PM +0800, Jason Wang wrote:
>>> On Thu, Dec 8, 2022 at 4:30 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>
>>>>
>>>> On 2022/12/6 2:33 PM, Jason Wang wrote:
>>>>> On Tue, Nov 22, 2022 at 3:44 PM Heng Qi <hengqi@linux.alibaba.com> wrote:
>>>>>> Build multi-buffer xdp using virtnet_build_xdp_buff() in mergeable.
>>>>>>
>>>>>> For the prefilled buffer before xdp is set, vq reset can be
>>>>>> used to clear it, but most devices do not support it at present.
>>>>>> In order not to bother users who are using xdp normally, we do
>>>>>> not use vq reset for the time being.
>>>>> I'd suggest tweaking this part to say we will probably use vq reset in the future.
>>>> OK, it works.
>>>>
>>>>>> At the same time, virtio
>>>>>> net currently uses comp pages, and bpf_xdp_frags_increase_tail()
>>>>>> needs to calculate the tailroom of the last frag, which will
>>>>>> involve the offset of the corresponding page and cause a negative
>>>>>> value, so we disable tail increase by not setting xdp_rxq->frag_size.
>>>>>>
>>>>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>>>>> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>>>>> ---
>>>>>>    drivers/net/virtio_net.c | 67 +++++++++++++++++++++++-----------------
>>>>>>    1 file changed, 38 insertions(+), 29 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index 20784b1d8236..83e6933ae62b 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -994,6 +994,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>>>                                            unsigned int *xdp_xmit,
>>>>>>                                            struct virtnet_rq_stats *stats)
>>>>>>    {
>>>>>> +       unsigned int tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>>>>>>           struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>>>>>>           u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>>>>>>           struct page *page = virt_to_head_page(buf);
>>>>>> @@ -1024,53 +1025,50 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>>>           rcu_read_lock();
>>>>>>           xdp_prog = rcu_dereference(rq->xdp_prog);
>>>>>>           if (xdp_prog) {
>>>>>> +               unsigned int xdp_frags_truesz = 0;
>>>>>> +               struct skb_shared_info *shinfo;
>>>>>>                   struct xdp_frame *xdpf;
>>>>>>                   struct page *xdp_page;
>>>>>>                   struct xdp_buff xdp;
>>>>>>                   void *data;
>>>>>>                   u32 act;
>>>>>> +               int i;
>>>>>>
>>>>>> -               /* Transient failure which in theory could occur if
>>>>>> -                * in-flight packets from before XDP was enabled reach
>>>>>> -                * the receive path after XDP is loaded.
>>>>>> -                */
>>>>>> -               if (unlikely(hdr->hdr.gso_type))
>>>>>> -                       goto err_xdp;
>>>>> Two questions:
>>>>>
>>>>> 1) should we keep this check for the XDP program that can't deal with XDP frags?
>>>> Yes, the problem is the same as with an xdp program without xdp.frags when
>>>> GRO_HW is enabled; I will correct it.
>>>>
>>>>> 2) how could we guarantee that the vnet header (gso_type/csum_start
>>>>> etc) is still valid after XDP (where XDP program can choose to
>>>>> override the header)?
>>>> We can save the vnet header before the driver receives the packet and
>>>> builds the xdp_buff, and then use
>>>> the pre-saved value in the subsequent process.
>>> The problem is that XDP may modify the packet (header) so some fields
>>> are not valid any more (e.g csum_start/offset ?).
>>>
>>> If I'm not wrong, there's no way for the XDP program to access those
>>> fields, or is that supported right now?
>>>
>> When the guest_csum feature is negotiated, xdp cannot be set, because the
>> metadata of xdp_{buff, frame} may be adjusted by the bpf program; therefore,
>> csum_{start, offset} itself is invalid. At the same time,
>> multi-buffer xdp programs should only receive packets over a larger MTU, so
>> we don't need gso-related information anymore and need to disable GRO_HW.
> Ok, that's fine.
>
> (But it requires a large pMTU).

Yes. Like a jumbo frame that is larger than 4K.

Thanks.

>
> Thanks
>
>> Thanks.
>>
>>>>>> -
>>>>>> -               /* Buffers with headroom use PAGE_SIZE as alloc size,
>>>>>> -                * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>>>> +               /* Now the XDP core assumes the frag size is PAGE_SIZE, but
>>>>>> +                * buffers with headroom may add a hole in truesize, which
>>>>>> +                * makes their length exceed PAGE_SIZE. So we disable the
>>>>>> +                * hole mechanism for xdp. See add_recvbuf_mergeable().
>>>>>>                    */
>>>>>>                   frame_sz = headroom ? PAGE_SIZE : truesize;
>>>>>>
>>>>>> -               /* This happens when rx buffer size is underestimated
>>>>>> -                * or headroom is not enough because of the buffer
>>>>>> -                * was refilled before XDP is set. This should only
>>>>>> -                * happen for the first several packets, so we don't
>>>>>> -                * care much about its performance.
>>>>>> +               /* This happens when headroom is not enough because
>>>>>> +                * the buffer was prefilled before XDP is set.
>>>>>> +                * This should only happen for the first several packets.
>>>>>> +                * In fact, vq reset can be used here to help us clean up
>>>>>> +                * the prefilled buffers, but many existing devices do not
>>>>>> +                * support it, and we don't want to bother users who are
>>>>>> +                * using xdp normally.
>>>>>>                    */
>>>>>> -               if (unlikely(num_buf > 1 ||
>>>>>> -                            headroom < virtnet_get_headroom(vi))) {
>>>>>> -                       /* linearize data for XDP */
>>>>>> -                       xdp_page = xdp_linearize_page(rq, &num_buf,
>>>>>> -                                                     page, offset,
>>>>>> -                                                     VIRTIO_XDP_HEADROOM,
>>>>>> -                                                     &len);
>>>>>> -                       frame_sz = PAGE_SIZE;
>>>>>> +               if (unlikely(headroom < virtnet_get_headroom(vi))) {
>>>>>> +                       if ((VIRTIO_XDP_HEADROOM + len + tailroom) > PAGE_SIZE)
>>>>>> +                               goto err_xdp;
>>>>>>
>>>>>> +                       xdp_page = alloc_page(GFP_ATOMIC);
>>>>>>                           if (!xdp_page)
>>>>>>                                   goto err_xdp;
>>>>>> +
>>>>>> +                       memcpy(page_address(xdp_page) + VIRTIO_XDP_HEADROOM,
>>>>>> +                              page_address(page) + offset, len);
>>>>>> +                       frame_sz = PAGE_SIZE;
>>>>> How can we know a single page is sufficient here? (before XDP is set,
>>>>> we reserve neither headroom nor tailroom).
>>>> This is only for the first buffer; refer to add_recvbuf_mergeable() and
>>>> get_mergeable_buf_len(). A buffer is always no larger than a page.
>>> Ok.
>>>
>>> Thanks
>>>
>>>>>>                           offset = VIRTIO_XDP_HEADROOM;
>>>>> I think we should still try to do linearization for the XDP program
>>>>> that doesn't support XDP frags.
>>>> Yes, you are right.
>>>>
>>>> Thanks.
>>>>
>>>>> Thanks
>>>>>
>>>>>>                   } else {
>>>>>>                           xdp_page = page;
>>>>>>                   }
>>>>>> -
>>>>>> -               /* Allow consuming headroom but reserve enough space to push
>>>>>> -                * the descriptor on if we get an XDP_TX return code.
>>>>>> -                */
>>>>>>                   data = page_address(xdp_page) + offset;
>>>>>> -               xdp_init_buff(&xdp, frame_sz - vi->hdr_len, &rq->xdp_rxq);
>>>>>> -               xdp_prepare_buff(&xdp, data - VIRTIO_XDP_HEADROOM + vi->hdr_len,
>>>>>> -                                VIRTIO_XDP_HEADROOM, len - vi->hdr_len, true);
>>>>>> +               err = virtnet_build_xdp_buff(dev, vi, rq, &xdp, data, len, frame_sz,
>>>>>> +                                            &num_buf, &xdp_frags_truesz, stats);
>>>>>> +               if (unlikely(err))
>>>>>> +                       goto err_xdp_frags;
>>>>>>
>>>>>>                   act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>>>>                   stats->xdp_packets++;
>>>>>> @@ -1164,6 +1162,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>>>                                   __free_pages(xdp_page, 0);
>>>>>>                           goto err_xdp;
>>>>>>                   }
>>>>>> +err_xdp_frags:
>>>>>> +               shinfo = xdp_get_shared_info_from_buff(&xdp);
>>>>>> +
>>>>>> +               if (unlikely(xdp_page != page))
>>>>>> +                       __free_pages(xdp_page, 0);
>>>>>> +
>>>>>> +               for (i = 0; i < shinfo->nr_frags; i++) {
>>>>>> +                       xdp_page = skb_frag_page(&shinfo->frags[i]);
>>>>>> +                       put_page(xdp_page);
>>>>>> +               }
>>>>>> +               goto err_xdp;
>>>>>>           }
>>>>>>           rcu_read_unlock();
>>>>>>
>>>>>> --
>>>>>> 2.19.1.6.gb485710b
>>>>>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-12-16  9:42 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-22  7:43 [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
2022-11-22  7:43 ` [RFC PATCH 1/9] virtio_net: disable the hole mechanism for xdp Heng Qi
2022-12-06  5:20   ` Jason Wang
2022-12-08  8:20     ` Heng Qi
2022-11-22  7:43 ` [RFC PATCH 2/9] virtio_net: set up xdp for multi buffer packets Heng Qi
2022-12-06  5:29   ` Jason Wang
2022-12-08  8:21     ` Heng Qi
2022-11-22  7:43 ` [RFC PATCH 3/9] virtio_net: update bytes calculation for xdp_frame Heng Qi
2022-12-06  5:31   ` Jason Wang
2022-12-08  8:35     ` Heng Qi
2022-11-22  7:43 ` [RFC PATCH 4/9] virtio_net: remove xdp related info from page_to_skb() Heng Qi
2022-12-06  5:36   ` Jason Wang
2022-12-08  8:23     ` Heng Qi
2022-11-22  7:43 ` [RFC PATCH 5/9] virtio_net: build xdp_buff with multi buffers Heng Qi
2022-12-06  6:14   ` Jason Wang
2022-12-08  8:25     ` Heng Qi
2022-11-22  7:43 ` [RFC PATCH 6/9] virtio_net: construct multi-buffer xdp in mergeable Heng Qi
2022-12-06  6:33   ` Jason Wang
2022-12-08  8:30     ` Heng Qi
2022-12-13  7:08       ` Jason Wang
2022-12-14  8:37         ` Heng Qi
2022-12-16  3:46           ` Jason Wang
2022-12-16  9:42             ` Heng Qi
2022-11-22  7:43 ` [RFC PATCH 7/9] virtio_net: build skb from multi-buffer xdp Heng Qi
2022-11-22  7:43 ` [RFC PATCH 8/9] virtio_net: transmit the " Heng Qi
2022-11-22  7:43 ` [RFC PATCH 9/9] virtio_net: support " Heng Qi
2022-12-06  6:42   ` Jason Wang
2022-12-08  8:31     ` Heng Qi
2022-12-02  4:50 ` [RFC PATCH 0/9] virtio_net: support multi buffer xdp Heng Qi
2022-12-02  5:41   ` Jason Wang
