* [PATCH bpf-next 0/6] ice: post-mbuf fixes
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

The set grew from the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES
when the ice-backed device is a sender. Initially, it was around
3.3 Mpps / thread, while I get 5.5 on skb-based pktgen...

After fixing 0005 first (0004 is a prereq for it; strange that nobody
noticed it earlier), I started catching random OOMs. This is how 0002
(and partially 0001) appeared.
0003 is a suggestion from Maciej to not waste time on refactoring dead
lines. 0006 is a "cherry on top" to reach the final 6.7 Mpps.
4.5 of the 6 are fixes, but only the first three are tagged, since it
then starts being tricky. I may backport the rest manually later on.

TL;DR for the series is that shortcuts are good, but only as long as
they don't make the driver miss important things. %XDP_TX is purely
driver-local, however .ndo_xdp_xmit() is not, and sometimes assumptions
can be unsafe there.

With that series and also one core code patch[0], "live frames" and
xdp-trafficgen are now safe'n'fast on ice (probably more to come).

[0] https://lore.kernel.org/all/20230209172827.874728-1-alexandr.lobakin@intel.com
---
Goes directly to bpf-next as it touches the recently added/changed code.

Alexander Lobakin (6):
  ice: fix ice_tx_ring::xdp_tx_active underflow
  ice: fix XDP Tx ring overrun
  ice: remove two impossible branches on XDP Tx cleaning
  ice: robustify cleaning/completing XDP Tx buffers
  ice: fix freeing XDP frames backed by Page Pool
  ice: micro-optimize .ndo_xdp_xmit() path

 drivers/net/ethernet/intel/ice/ice_txrx.c     | 67 +++++++++-----
 drivers/net/ethernet/intel/ice/ice_txrx.h     | 37 ++++++--
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 88 ++++++++++++-------
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  4 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 12 +--
 5 files changed, 136 insertions(+), 72 deletions(-)

-- 
2.39.1



* [PATCH bpf-next 1/6] ice: fix ice_tx_ring::xdp_tx_active underflow
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

xdp_tx_active is used to indicate whether an XDP ring has any %XDP_TX
frames queued, in order to short-circuit Tx cleaning for XSk-enabled
queues. When !XSk, it simply indicates whether the ring has any queued
frames in general.
It gets incremented once per frame placed onto the ring, i.e. it counts
whole frames, not individual frags. However, it currently gets
decremented in ice_clean_xdp_tx_buf(), which is called once per buffer,
i.e. once per frag. Thus, an underflow happens when completing
multi-frag frames.
Move the decrement to the outer function and do it once per frame, not
per buffer. Also, accumulate it in an on-stack counter and write the
ring counter once after the loop is done to save several cycles.
XSk rings are fine since there are no frags there at the moment.
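
To illustrate the accounting rule, here is a minimal standalone C model
of it (all names are illustrative, not actual ice symbols): the counter
is bumped once per frame, so decrementing it once per buffer underflows
it as soon as a frame carries frags.

  #include <stdio.h>

  static unsigned int tx_active; /* models ice_tx_ring::xdp_tx_active */

  static void queue_frame(void)
  {
          tx_active++; /* once per frame (head + frags) */
  }

  static void clean_frame(unsigned int nr_frags, int per_buf)
  {
          if (per_buf)
                  tx_active -= nr_frags + 1; /* old: once per buffer */
          else
                  tx_active--; /* fixed: once per frame */
  }

  int main(void)
  {
          queue_frame(); /* one frame carrying two frags */
          clean_frame(2, 1);
          printf("per-buffer: %u\n", tx_active); /* 4294967294: underflow */

          tx_active = 0;
          queue_frame();
          clean_frame(2, 0);
          printf("per-frame: %u\n", tx_active); /* 0: correct */

          return 0;
  }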

Fixes: 3246a10752a7 ("ice: Add support for XDP multi-buffer on Tx side")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 9bbed3f14e42..d1a7171e618b 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -231,7 +231,6 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 	dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
 			 dma_unmap_len(tx_buf, len), DMA_TO_DEVICE);
 	dma_unmap_len_set(tx_buf, len, 0);
-	xdp_ring->xdp_tx_active--;
 	page_frag_free(tx_buf->raw_buf);
 	tx_buf->raw_buf = NULL;
 }
@@ -246,8 +245,8 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 	u32 ntc = xdp_ring->next_to_clean;
 	struct ice_tx_desc *tx_desc;
 	u32 cnt = xdp_ring->count;
+	u32 frags, xdp_tx = 0;
 	u32 ready_frames = 0;
-	u32 frags;
 	u32 idx;
 	u32 ret;
 
@@ -274,6 +273,7 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 		total_pkts++;
 		/* count head + frags */
 		ready_frames -= frags + 1;
+		xdp_tx++;
 
 		if (xdp_ring->xsk_pool)
 			xsk_buff_free(tx_buf->xdp);
@@ -295,6 +295,7 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 
 	tx_desc->cmd_type_offset_bsz = 0;
 	xdp_ring->next_to_clean = ntc;
+	xdp_ring->xdp_tx_active -= xdp_tx;
 	ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
 
 	return ret;
-- 
2.39.1



* [PATCH bpf-next 2/6] ice: fix XDP Tx ring overrun
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

Sometimes, under heavy XDP Tx traffic, e.g. when using the XDP traffic
generator (%BPF_F_TEST_XDP_LIVE_FRAMES), the machine can hit an OOM due
to the driver not freeing all of the pages passed to it by
.ndo_xdp_xmit().
It turned out that during the development of the tagged commit, the
check which ensures that we have a free descriptor to queue a frame
moved into the branch taken only when a buffer has frags. Otherwise, we
only run a cleaning cycle, but don't check anything.
At the same time, there can be situations when the driver gets new
frames to send, but there are no buffers that can be cleaned/completed
and the ring has no free slots. It's very rare, but still possible
(> 6.5 Mpps per ring). The driver then fills the next
buffer/descriptor, effectively overwriting data that still needs to be
freed.

Restore the check after the cleaning routine to make sure there is a
slot to queue a new frame. When there are frags, there will still be a
separate check that we can place all of them, but if the ring is full,
there's no point in wasting any more time.

(minor: make `!ready_frames` unlikely since it happens ~1-2 times per
 billion frames)
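
For illustration, a minimal standalone C model of the overrun (the ring
math is simplified and all names are made up, this is not the driver
logic): without re-checking the free space after the cleaning attempt,
a full ring gets an in-flight slot overwritten.

  #include <stdbool.h>
  #include <stdio.h>

  #define RING_SIZE 8 /* power of two, as in the driver */

  struct ring {
          unsigned int next_to_use;   /* producer */
          unsigned int next_to_clean; /* consumer */
  };

  static unsigned int ring_unused(const struct ring *r)
  {
          return (r->next_to_clean - r->next_to_use - 1) & (RING_SIZE - 1);
  }

  static bool xmit(struct ring *r, bool recheck)
  {
          unsigned int free_space = ring_unused(r);

          /* the cleaning cycle runs here; assume nothing completed */

          if (recheck && !free_space)
                  return false; /* busy: drop instead of overwriting */

          r->next_to_use = (r->next_to_use + 1) & (RING_SIZE - 1);
          return true; /* slot filled */
  }

  int main(void)
  {
          struct ring r = { .next_to_use = 7, .next_to_clean = 0 };

          /* 0 slots free: the unchecked path still "sends" */
          printf("no recheck: sent=%d (overrun)\n", xmit(&r, false));

          r.next_to_use = 7;
          printf("recheck:    sent=%d (tx_busy)\n", xmit(&r, true));

          return 0;
  }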

Fixes: 3246a10752a7 ("ice: Add support for XDP multi-buffer on Tx side")
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index d1a7171e618b..784f2f9ebb2d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -260,7 +260,7 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 			ready_frames = idx + cnt - ntc + 1;
 	}
 
-	if (!ready_frames)
+	if (unlikely(!ready_frames))
 		return 0;
 	ret = ready_frames;
 
@@ -322,17 +322,17 @@ int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
 	u32 frag = 0;
 
 	free_space = ICE_DESC_UNUSED(xdp_ring);
-
-	if (ICE_DESC_UNUSED(xdp_ring) < ICE_RING_QUARTER(xdp_ring))
+	if (free_space < ICE_RING_QUARTER(xdp_ring))
 		free_space += ice_clean_xdp_irq(xdp_ring);
 
+	if (unlikely(!free_space))
+		goto busy;
+
 	if (unlikely(xdp_buff_has_frags(xdp))) {
 		sinfo = xdp_get_shared_info_from_buff(xdp);
 		nr_frags = sinfo->nr_frags;
-		if (free_space < nr_frags + 1) {
-			xdp_ring->ring_stats->tx_stats.tx_busy++;
-			return ICE_XDP_CONSUMED;
-		}
+		if (free_space < nr_frags + 1)
+			goto busy;
 	}
 
 	tx_desc = ICE_TX_DESC(xdp_ring, ntu);
@@ -396,6 +396,11 @@ int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
 		ntu--;
 	}
 	return ICE_XDP_CONSUMED;
+
+busy:
+	xdp_ring->ring_stats->tx_stats.tx_busy++;
+
+	return ICE_XDP_CONSUMED;
 }
 
 /**
-- 
2.39.1



* [PATCH bpf-next 3/6] ice: remove two impossible branches on XDP Tx cleaning
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

The tagged commit started sending %XDP_TX frames from the XSk Rx ring
directly, without converting them to &xdp_frame. However, when XSk is
enabled on a queue pair, it has its own separate Tx cleaning functions,
so neither ice_clean_xdp_irq() nor ice_unmap_and_free_tx_buf() is ever
called there.
Remove the impossible branches in order to reduce the diffstat of the
upcoming change.
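
To illustrate why those branches are dead, a simplified control-flow
sketch (not the literal ice polling code): the Tx cleaning path is
chosen per queue pair, so a ring with an XSk pool attached is serviced
by the ZC routine from ice_xsk.c and never by ice_clean_xdp_irq().

  if (xdp_ring->xsk_pool)
          ice_clean_xdp_irq_zc(xdp_ring); /* frees via xsk_buff_free() */
  else
          ice_clean_xdp_irq(xdp_ring);    /* never sees an XSk buffer */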

Fixes: a24b4c6e9aab ("ice: xsk: Do not convert to buff to frame for XDP_TX")
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 5 +----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 5 +----
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 466113c86e6f..6b99adb695e7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -116,10 +116,7 @@ ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct ice_tx_buf *tx_buf)
 		if (tx_buf->tx_flags & ICE_TX_FLAGS_DUMMY_PKT) {
 			devm_kfree(ring->dev, tx_buf->raw_buf);
 		} else if (ice_ring_is_xdp(ring)) {
-			if (ring->xsk_pool)
-				xsk_buff_free(tx_buf->xdp);
-			else
-				page_frag_free(tx_buf->raw_buf);
+			page_frag_free(tx_buf->raw_buf);
 		} else {
 			dev_kfree_skb_any(tx_buf->skb);
 		}
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 784f2f9ebb2d..6371acb0deb0 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -275,10 +275,7 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 		ready_frames -= frags + 1;
 		xdp_tx++;
 
-		if (xdp_ring->xsk_pool)
-			xsk_buff_free(tx_buf->xdp);
-		else
-			ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
+		ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
 		ntc++;
 		if (ntc == cnt)
 			ntc = 0;
-- 
2.39.1



* [PATCH bpf-next 4/6] ice: robustify cleaning/completing XDP Tx buffers
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

When queueing frames from a Page Pool for redirecting to a device
backed by the ice driver, `perf top` shows heavy load on page_alloc()
and page_frag_free(), even though on a properly working system it
should be fully, or at least almost, zero-alloc. The problem is in fact
a bit deeper and arises from how ice cleans up completed Tx buffers.

The story so far: when cleaning/freeing the resources related to
a particular completed Tx frame (skbs, DMA mappings etc.), ice relies
on heuristics only, without setting any type explicitly (except for
dummy Flow Director packets, which are marked via
ice_tx_buf::tx_flags). This kinda works, but only up to a point. For
example, ice currently assumes that each frame coming to
__ice_xmit_xdp_ring() is backed by either a plain order-0 page or a
plain page frag, while it may also be backed by Page Pool or any other
possible memory model introduced in the future. This means any
&xdp_frame must be freed properly via the xdp_return_frame() family,
with no assumptions.

In order to do that, the whole heuristics must be replaced with setting
the Tx buffer/frame type explicitly, just as it's always been done via
an enum. Let us reuse 16 bits from ::tx_flags -- 1 bit-and instr won't
hurt much -- especially given that sometimes there was a check for
%ICE_TX_FLAGS_DUMMY_PKT, which is now turned from a flag into an enum
member. The rest of the changes is straightforward, and most of it is
just a conversion to rely on the type now set in &ice_tx_buf rather
than on some secondary properties.
For now, no functional changes are intended; this only prepares the
ground for freeing XDP frames properly in the next step. And it must be
done atomically/synchronously to not break stuff.
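
A minimal standalone C sketch of the pattern being introduced (the
enum, struct and free calls below are simplified stand-ins, not the
real ice definitions): completion dispatches on an explicitly stored
type instead of guessing from which pointer happens to be set.

  #include <stdio.h>
  #include <stdlib.h>

  enum buf_type { BUF_EMPTY = 0, BUF_DUMMY, BUF_SKB, BUF_XDP_TX };

  struct tx_buf {
          void *data;
          enum buf_type type; /* set explicitly on submission */
  };

  static void clean_buf(struct tx_buf *buf)
  {
          switch (buf->type) {
          case BUF_DUMMY:
          case BUF_SKB:
          case BUF_XDP_TX:
                  free(buf->data); /* each type has its own free fn */
                  break;
          case BUF_EMPTY:
                  break; /* nothing queued, nothing to do */
          }

          buf->type = BUF_EMPTY; /* makes a re-clean a no-op */
  }

  int main(void)
  {
          struct tx_buf buf = { .data = malloc(64), .type = BUF_XDP_TX };

          clean_buf(&buf);
          clean_buf(&buf); /* safe: type is BUF_EMPTY now */
          printf("type: %d\n", buf.type);

          return 0;
  }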

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 38 +++++++++----------
 drivers/net/ethernet/intel/ice/ice_txrx.h     | 34 ++++++++++++-----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 15 ++++++--
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 12 +++---
 4 files changed, 63 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 6b99adb695e7..d7e8a3f81e20 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -85,7 +85,7 @@ ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc *fdir_desc,
 	td_cmd = ICE_TXD_LAST_DESC_CMD | ICE_TX_DESC_CMD_DUMMY |
 		 ICE_TX_DESC_CMD_RE;
 
-	tx_buf->tx_flags = ICE_TX_FLAGS_DUMMY_PKT;
+	tx_buf->type = ICE_TX_BUF_DUMMY;
 	tx_buf->raw_buf = raw_packet;
 
 	tx_desc->cmd_type_offset_bsz =
@@ -112,28 +112,26 @@ ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc *fdir_desc,
 static void
 ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct ice_tx_buf *tx_buf)
 {
-	if (tx_buf->skb) {
-		if (tx_buf->tx_flags & ICE_TX_FLAGS_DUMMY_PKT) {
-			devm_kfree(ring->dev, tx_buf->raw_buf);
-		} else if (ice_ring_is_xdp(ring)) {
-			page_frag_free(tx_buf->raw_buf);
-		} else {
-			dev_kfree_skb_any(tx_buf->skb);
-		}
-		if (dma_unmap_len(tx_buf, len))
-			dma_unmap_single(ring->dev,
-					 dma_unmap_addr(tx_buf, dma),
-					 dma_unmap_len(tx_buf, len),
-					 DMA_TO_DEVICE);
-	} else if (dma_unmap_len(tx_buf, len)) {
+	if (dma_unmap_len(tx_buf, len))
 		dma_unmap_page(ring->dev,
 			       dma_unmap_addr(tx_buf, dma),
 			       dma_unmap_len(tx_buf, len),
 			       DMA_TO_DEVICE);
+
+	switch (tx_buf->type) {
+	case ICE_TX_BUF_DUMMY:
+		devm_kfree(ring->dev, tx_buf->raw_buf);
+		break;
+	case ICE_TX_BUF_SKB:
+		dev_kfree_skb_any(tx_buf->skb);
+		break;
+	case ICE_TX_BUF_XDP_TX:
+		page_frag_free(tx_buf->raw_buf);
+		break;
 	}
 
 	tx_buf->next_to_watch = NULL;
-	tx_buf->skb = NULL;
+	tx_buf->type = ICE_TX_BUF_EMPTY;
 	dma_unmap_len_set(tx_buf, len, 0);
 	/* tx_buf must be completely set up in the transmit path */
 }
@@ -266,7 +264,7 @@ static bool ice_clean_tx_irq(struct ice_tx_ring *tx_ring, int napi_budget)
 				 DMA_TO_DEVICE);
 
 		/* clear tx_buf data */
-		tx_buf->skb = NULL;
+		tx_buf->type = ICE_TX_BUF_EMPTY;
 		dma_unmap_len_set(tx_buf, len, 0);
 
 		/* unmap remaining buffers */
@@ -1709,6 +1707,7 @@ ice_tx_map(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first,
 				       DMA_TO_DEVICE);
 
 		tx_buf = &tx_ring->tx_buf[i];
+		tx_buf->type = ICE_TX_BUF_FRAG;
 	}
 
 	/* record SW timestamp if HW timestamp is not available */
@@ -2352,6 +2351,7 @@ ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring)
 	/* record the location of the first descriptor for this packet */
 	first = &tx_ring->tx_buf[tx_ring->next_to_use];
 	first->skb = skb;
+	first->type = ICE_TX_BUF_SKB;
 	first->bytecount = max_t(unsigned int, skb->len, ETH_ZLEN);
 	first->gso_segs = 1;
 	first->tx_flags = 0;
@@ -2524,11 +2524,11 @@ void ice_clean_ctrl_tx_irq(struct ice_tx_ring *tx_ring)
 					 dma_unmap_addr(tx_buf, dma),
 					 dma_unmap_len(tx_buf, len),
 					 DMA_TO_DEVICE);
-		if (tx_buf->tx_flags & ICE_TX_FLAGS_DUMMY_PKT)
+		if (tx_buf->type == ICE_TX_BUF_DUMMY)
 			devm_kfree(tx_ring->dev, tx_buf->raw_buf);
 
 		/* clear next_to_watch to prevent false hangs */
-		tx_buf->raw_buf = NULL;
+		tx_buf->type = ICE_TX_BUF_EMPTY;
 		tx_buf->tx_flags = 0;
 		tx_buf->next_to_watch = NULL;
 		dma_unmap_len_set(tx_buf, len, 0);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index efa3d378f19e..18d8ba0396e8 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -121,10 +121,7 @@ static inline int ice_skb_pad(void)
 #define ICE_TX_FLAGS_TSO	BIT(0)
 #define ICE_TX_FLAGS_HW_VLAN	BIT(1)
 #define ICE_TX_FLAGS_SW_VLAN	BIT(2)
-/* ICE_TX_FLAGS_DUMMY_PKT is used to mark dummy packets that should be
- * freed instead of returned like skb packets.
- */
-#define ICE_TX_FLAGS_DUMMY_PKT	BIT(3)
+/* Free, was ICE_TX_FLAGS_DUMMY_PKT */
 #define ICE_TX_FLAGS_TSYN	BIT(4)
 #define ICE_TX_FLAGS_IPV4	BIT(5)
 #define ICE_TX_FLAGS_IPV6	BIT(6)
@@ -149,22 +146,41 @@ static inline int ice_skb_pad(void)
 
 #define ICE_TXD_LAST_DESC_CMD (ICE_TX_DESC_CMD_EOP | ICE_TX_DESC_CMD_RS)
 
+/**
+ * enum ice_tx_buf_type - type of &ice_tx_buf to act on Tx completion
+ * @ICE_TX_BUF_EMPTY: unused OR XSk frame, no action required
+ * @ICE_TX_BUF_DUMMY: dummy Flow Director packet, unmap and kfree()
+ * @ICE_TX_BUF_FRAG: mapped skb OR &xdp_buff frag, only unmap DMA
+ * @ICE_TX_BUF_SKB: &sk_buff, unmap and consume_skb(), update stats
+ * @ICE_TX_BUF_XDP_TX: &xdp_buff, unmap and page_frag_free(), stats
+ * @ICE_TX_BUF_XSK_TX: &xdp_buff on XSk queue, xsk_buff_free(), stats
+ */
+enum ice_tx_buf_type {
+	ICE_TX_BUF_EMPTY	= 0U,
+	ICE_TX_BUF_DUMMY,
+	ICE_TX_BUF_FRAG,
+	ICE_TX_BUF_SKB,
+	ICE_TX_BUF_XDP_TX,
+	ICE_TX_BUF_XSK_TX,
+};
+
 struct ice_tx_buf {
 	union {
 		struct ice_tx_desc *next_to_watch;
 		u32 rs_idx;
 	};
 	union {
-		struct sk_buff *skb;
-		void *raw_buf; /* used for XDP */
-		struct xdp_buff *xdp; /* used for XDP_TX ZC */
+		void *raw_buf;		/* used for XDP_TX and FDir rules */
+		struct sk_buff *skb;	/* used for .ndo_start_xmit() */
+		struct xdp_buff *xdp;	/* used for XDP_TX ZC */
 	};
 	unsigned int bytecount;
 	union {
 		unsigned int gso_segs;
-		unsigned int nr_frags; /* used for mbuf XDP */
+		unsigned int nr_frags;	/* used for mbuf XDP */
 	};
-	u32 tx_flags;
+	u32 type:16;			/* &ice_tx_buf_type */
+	u32 tx_flags:16;
 	DEFINE_DMA_UNMAP_LEN(len);
 	DEFINE_DMA_UNMAP_ADDR(dma);
 };
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 6371acb0deb0..23ac4824e974 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -231,8 +231,14 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 	dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
 			 dma_unmap_len(tx_buf, len), DMA_TO_DEVICE);
 	dma_unmap_len_set(tx_buf, len, 0);
-	page_frag_free(tx_buf->raw_buf);
-	tx_buf->raw_buf = NULL;
+
+	switch (tx_buf->type) {
+	case ICE_TX_BUF_XDP_TX:
+		page_frag_free(tx_buf->raw_buf);
+		break;
+	}
+
+	tx_buf->type = ICE_TX_BUF_EMPTY;
 }
 
 /**
@@ -266,6 +272,7 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 
 	while (ready_frames) {
 		struct ice_tx_buf *tx_buf = &xdp_ring->tx_buf[ntc];
+		struct ice_tx_buf *head = tx_buf;
 
 		/* bytecount holds size of head + frags */
 		total_bytes += tx_buf->bytecount;
@@ -275,7 +282,6 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 		ready_frames -= frags + 1;
 		xdp_tx++;
 
-		ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
 		ntc++;
 		if (ntc == cnt)
 			ntc = 0;
@@ -288,6 +294,8 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 			if (ntc == cnt)
 				ntc = 0;
 		}
+
+		ice_clean_xdp_tx_buf(xdp_ring, head);
 	}
 
 	tx_desc->cmd_type_offset_bsz = 0;
@@ -349,6 +357,7 @@ int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
 
 		tx_desc->buf_addr = cpu_to_le64(dma);
 		tx_desc->cmd_type_offset_bsz = ice_build_ctob(0, 0, size, 0);
+		tx_buf->type = ICE_TX_BUF_XDP_TX;
 		tx_buf->raw_buf = data;
 
 		ntu++;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index a25a68c69f22..917c75e530ca 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -631,7 +631,8 @@ static void ice_clean_xdp_irq_zc(struct ice_tx_ring *xdp_ring)
 	for (i = 0; i < xsk_frames; i++) {
 		tx_buf = &xdp_ring->tx_buf[ntc];
 
-		if (tx_buf->xdp) {
+		if (tx_buf->type == ICE_TX_BUF_XSK_TX) {
+			tx_buf->type = ICE_TX_BUF_EMPTY;
 			xsk_buff_free(tx_buf->xdp);
 			xdp_ring->xdp_tx_active--;
 		} else {
@@ -685,6 +686,7 @@ static int ice_xmit_xdp_tx_zc(struct xdp_buff *xdp,
 
 	tx_buf = &xdp_ring->tx_buf[ntu];
 	tx_buf->xdp = xdp;
+	tx_buf->type = ICE_TX_BUF_XSK_TX;
 	tx_desc = ICE_TX_DESC(xdp_ring, ntu);
 	tx_desc->buf_addr = cpu_to_le64(dma);
 	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP,
@@ -1083,12 +1085,12 @@ void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring)
 	while (ntc != ntu) {
 		struct ice_tx_buf *tx_buf = &xdp_ring->tx_buf[ntc];
 
-		if (tx_buf->xdp)
+		if (tx_buf->type == ICE_TX_BUF_XSK_TX) {
+			tx_buf->type = ICE_TX_BUF_EMPTY;
 			xsk_buff_free(tx_buf->xdp);
-		else
+		} else {
 			xsk_frames++;
-
-		tx_buf->raw_buf = NULL;
+		}
 
 		ntc++;
 		if (ntc >= xdp_ring->count)
-- 
2.39.1



* [PATCH bpf-next 5/6] ice: fix freeing XDP frames backed by Page Pool
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

As already mentioned, freeing any &xdp_frame via page_frag_free() is
wrong, as it assumes the frame is backed by either an order-0 page or
a page with no "patrons" behind it, while in fact frames backed by
Page Pool can be redirected to a device whose driver doesn't use it.
Keep storing a pointer to the raw buffer and freeing it unconditionally
via page_frag_free() for %XDP_TX frames, but introduce a separate type
in the enum for frames coming through .ndo_xdp_xmit(), and free those
via xdp_return_frame_bulk(). Note that saving the xdpf as
xdp_buff->data_hard_start is intentional and always holds true when
everything is configured properly.
After this change, %XDP_REDIRECT from a Page Pool based driver to ice
becomes zero-alloc as it should be, and the horrendous 3.3 Mpps / queue
turn into 6.6, hehe.
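
Condensed, the completion-side pattern the diff below implements looks
like this (a sketch of the relevant switch only, not the full cleaning
function):

  struct xdp_frame_bulk bq;

  xdp_frame_bulk_init(&bq);
  rcu_read_lock(); /* required around xdp_return_frame_bulk() */

  switch (tx_buf->type) {
  case ICE_TX_BUF_XDP_TX:
          page_frag_free(tx_buf->raw_buf);          /* driver-local frames */
          break;
  case ICE_TX_BUF_XDP_XMIT:
          xdp_return_frame_bulk(tx_buf->xdpf, &bq); /* honors the mem model */
          break;
  }

  xdp_flush_frame_bulk(&bq);
  rcu_read_unlock();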

Let it go with no "Fixes:" tag as it spans across a good 5+ commits and
can't be trivially backported.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     |  5 ++-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  3 ++
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 43 +++++++++++++++----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  3 +-
 4 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index d7e8a3f81e20..e451276a37b6 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -128,6 +128,9 @@ ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct ice_tx_buf *tx_buf)
 	case ICE_TX_BUF_XDP_TX:
 		page_frag_free(tx_buf->raw_buf);
 		break;
+	case ICE_TX_BUF_XDP_XMIT:
+		xdp_return_frame(tx_buf->xdpf);
+		break;
 	}
 
 	tx_buf->next_to_watch = NULL;
@@ -575,7 +578,7 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	case XDP_TX:
 		if (static_branch_unlikely(&ice_xdp_locking_key))
 			spin_lock(&xdp_ring->tx_lock);
-		ret = __ice_xmit_xdp_ring(xdp, xdp_ring);
+		ret = __ice_xmit_xdp_ring(xdp, xdp_ring, false);
 		if (static_branch_unlikely(&ice_xdp_locking_key))
 			spin_unlock(&xdp_ring->tx_lock);
 		if (ret == ICE_XDP_CONSUMED)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 18d8ba0396e8..fff0efe28373 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -153,6 +153,7 @@ static inline int ice_skb_pad(void)
  * @ICE_TX_BUF_FRAG: mapped skb OR &xdp_buff frag, only unmap DMA
  * @ICE_TX_BUF_SKB: &sk_buff, unmap and consume_skb(), update stats
  * @ICE_TX_BUF_XDP_TX: &xdp_buff, unmap and page_frag_free(), stats
+ * @ICE_TX_BUF_XDP_XMIT: &xdp_frame, unmap and xdp_return_frame(), stats
  * @ICE_TX_BUF_XSK_TX: &xdp_buff on XSk queue, xsk_buff_free(), stats
  */
 enum ice_tx_buf_type {
@@ -161,6 +162,7 @@ enum ice_tx_buf_type {
 	ICE_TX_BUF_FRAG,
 	ICE_TX_BUF_SKB,
 	ICE_TX_BUF_XDP_TX,
+	ICE_TX_BUF_XDP_XMIT,
 	ICE_TX_BUF_XSK_TX,
 };
 
@@ -172,6 +174,7 @@ struct ice_tx_buf {
 	union {
 		void *raw_buf;		/* used for XDP_TX and FDir rules */
 		struct sk_buff *skb;	/* used for .ndo_start_xmit() */
+		struct xdp_frame *xdpf;	/* used for .ndo_xdp_xmit() */
 		struct xdp_buff *xdp;	/* used for XDP_TX ZC */
 	};
 	unsigned int bytecount;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 23ac4824e974..6d98c34d99fc 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -222,13 +222,15 @@ ice_receive_skb(struct ice_rx_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag)
 
 /**
  * ice_clean_xdp_tx_buf - Free and unmap XDP Tx buffer
- * @xdp_ring: XDP Tx ring
+ * @dev: device for DMA mapping
  * @tx_buf: Tx buffer to clean
+ * @bq: XDP bulk flush struct
  */
 static void
-ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
+ice_clean_xdp_tx_buf(struct device *dev, struct ice_tx_buf *tx_buf,
+		     struct xdp_frame_bulk *bq)
 {
-	dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
+	dma_unmap_single(dev, dma_unmap_addr(tx_buf, dma),
 			 dma_unmap_len(tx_buf, len), DMA_TO_DEVICE);
 	dma_unmap_len_set(tx_buf, len, 0);
 
@@ -236,6 +238,9 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 	case ICE_TX_BUF_XDP_TX:
 		page_frag_free(tx_buf->raw_buf);
 		break;
+	case ICE_TX_BUF_XDP_XMIT:
+		xdp_return_frame_bulk(tx_buf->xdpf, bq);
+		break;
 	}
 
 	tx_buf->type = ICE_TX_BUF_EMPTY;
@@ -248,9 +253,11 @@ ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 {
 	int total_bytes = 0, total_pkts = 0;
+	struct device *dev = xdp_ring->dev;
 	u32 ntc = xdp_ring->next_to_clean;
 	struct ice_tx_desc *tx_desc;
 	u32 cnt = xdp_ring->count;
+	struct xdp_frame_bulk bq;
 	u32 frags, xdp_tx = 0;
 	u32 ready_frames = 0;
 	u32 idx;
@@ -270,6 +277,9 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 		return 0;
 	ret = ready_frames;
 
+	xdp_frame_bulk_init(&bq);
+	rcu_read_lock(); /* xdp_return_frame_bulk() */
+
 	while (ready_frames) {
 		struct ice_tx_buf *tx_buf = &xdp_ring->tx_buf[ntc];
 		struct ice_tx_buf *head = tx_buf;
@@ -289,15 +299,18 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
 		for (int i = 0; i < frags; i++) {
 			tx_buf = &xdp_ring->tx_buf[ntc];
 
-			ice_clean_xdp_tx_buf(xdp_ring, tx_buf);
+			ice_clean_xdp_tx_buf(dev, tx_buf, &bq);
 			ntc++;
 			if (ntc == cnt)
 				ntc = 0;
 		}
 
-		ice_clean_xdp_tx_buf(xdp_ring, head);
+		ice_clean_xdp_tx_buf(dev, head, &bq);
 	}
 
+	xdp_flush_frame_bulk(&bq);
+	rcu_read_unlock();
+
 	tx_desc->cmd_type_offset_bsz = 0;
 	xdp_ring->next_to_clean = ntc;
 	xdp_ring->xdp_tx_active -= xdp_tx;
@@ -310,8 +323,10 @@ static u32 ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
  * __ice_xmit_xdp_ring - submit frame to XDP ring for transmission
  * @xdp: XDP buffer to be placed onto Tx descriptors
  * @xdp_ring: XDP ring for transmission
+ * @frame: whether this comes from .ndo_xdp_xmit()
  */
-int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
+int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring,
+			bool frame)
 {
 	struct skb_shared_info *sinfo = NULL;
 	u32 size = xdp->data_end - xdp->data;
@@ -355,10 +370,15 @@ int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
 		dma_unmap_len_set(tx_buf, len, size);
 		dma_unmap_addr_set(tx_buf, dma, dma);
 
+		if (frame) {
+			tx_buf->type = ICE_TX_BUF_FRAG;
+		} else {
+			tx_buf->type = ICE_TX_BUF_XDP_TX;
+			tx_buf->raw_buf = data;
+		}
+
 		tx_desc->buf_addr = cpu_to_le64(dma);
 		tx_desc->cmd_type_offset_bsz = ice_build_ctob(0, 0, size, 0);
-		tx_buf->type = ICE_TX_BUF_XDP_TX;
-		tx_buf->raw_buf = data;
 
 		ntu++;
 		if (ntu == cnt)
@@ -379,6 +399,11 @@ int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
 	tx_head->bytecount = xdp_get_buff_len(xdp);
 	tx_head->nr_frags = nr_frags;
 
+	if (frame) {
+		tx_head->type = ICE_TX_BUF_XDP_XMIT;
+		tx_head->xdpf = xdp->data_hard_start;
+	}
+
 	/* update last descriptor from a frame with EOP */
 	tx_desc->cmd_type_offset_bsz |=
 		cpu_to_le64(ICE_TX_DESC_CMD_EOP << ICE_TXD_QW1_CMD_S);
@@ -419,7 +444,7 @@ int ice_xmit_xdp_ring(struct xdp_frame *xdpf, struct ice_tx_ring *xdp_ring)
 	struct xdp_buff xdp;
 
 	xdp_convert_frame_to_buff(xdpf, &xdp);
-	return __ice_xmit_xdp_ring(&xdp, xdp_ring);
+	return __ice_xmit_xdp_ring(&xdp, xdp_ring, true);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index ea977f283c22..79efc20c46d9 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -143,7 +143,8 @@ static inline u32 ice_set_rs_bit(const struct ice_tx_ring *xdp_ring)
 void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res, u32 first_idx);
 int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring);
 int ice_xmit_xdp_ring(struct xdp_frame *xdpf, struct ice_tx_ring *xdp_ring);
-int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring);
+int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring,
+			bool frame);
 void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val);
 void
 ice_process_skb_fields(struct ice_rx_ring *rx_ring,
-- 
2.39.1



* [PATCH bpf-next 6/6] ice: micro-optimize .ndo_xdp_xmit() path
From: Alexander Lobakin @ 2023-02-10 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Toke Høiland-Jørgensen, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

After the recent mbuf changes, ice_xmit_xdp_ring() became a 3-liner.
It makes no sense to keep it global and in a different file than its
caller. Move it right next to the sole call site and mark it static.
Also, it doesn't need a full xdp_convert_frame_to_buff(). Save several
cycles by filling only the fields used by __ice_xmit_xdp_ring() later
on.
Finally, since it doesn't modify @xdpf in any way, mark the argument
const to save some more (a whole -11 bytes of .text! :D).

Thanks to one jump less and fewer calculations as well, this yields as
many as 6.7 Mpps per queue. `xdp.data_hard_start = xdpf` is fully
intentional again (see xdp_convert_buff_to_frame()) and just works when
there are no issues in the source device's driver.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 21 ++++++++++++++++++-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 13 ------------
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  1 -
 3 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index e451276a37b6..aaf313a95368 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -605,6 +605,25 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		ice_set_rx_bufs_act(xdp, rx_ring, ret);
 }
 
+/**
+ * ice_xmit_xdp_ring - submit frame to XDP ring for transmission
+ * @xdpf: XDP frame that will be converted to XDP buff
+ * @xdp_ring: XDP ring for transmission
+ */
+static int ice_xmit_xdp_ring(const struct xdp_frame *xdpf,
+			     struct ice_tx_ring *xdp_ring)
+{
+	struct xdp_buff xdp;
+
+	xdp.data_hard_start = (void *)xdpf;
+	xdp.data = xdpf->data;
+	xdp.data_end = xdp.data + xdpf->len;
+	xdp.frame_sz = xdpf->frame_sz;
+	xdp.flags = xdpf->flags;
+
+	return __ice_xmit_xdp_ring(&xdp, xdp_ring, true);
+}
+
 /**
  * ice_xdp_xmit - submit packets to XDP ring for transmission
  * @dev: netdev
@@ -650,7 +669,7 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 
 	tx_buf = &xdp_ring->tx_buf[xdp_ring->next_to_use];
 	for (i = 0; i < n; i++) {
-		struct xdp_frame *xdpf = frames[i];
+		const struct xdp_frame *xdpf = frames[i];
 		int err;
 
 		err = ice_xmit_xdp_ring(xdpf, xdp_ring);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 6d98c34d99fc..7bc5aa340c7d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -434,19 +434,6 @@ int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring,
 	return ICE_XDP_CONSUMED;
 }
 
-/**
- * ice_xmit_xdp_ring - submit frame to XDP ring for transmission
- * @xdpf: XDP frame that will be converted to XDP buff
- * @xdp_ring: XDP ring for transmission
- */
-int ice_xmit_xdp_ring(struct xdp_frame *xdpf, struct ice_tx_ring *xdp_ring)
-{
-	struct xdp_buff xdp;
-
-	xdp_convert_frame_to_buff(xdpf, &xdp);
-	return __ice_xmit_xdp_ring(&xdp, xdp_ring, true);
-}
-
 /**
  * ice_finalize_xdp_rx - Bump XDP Tx tail and/or flush redirect map
  * @xdp_ring: XDP ring
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index 79efc20c46d9..115969ecdf7b 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -142,7 +142,6 @@ static inline u32 ice_set_rs_bit(const struct ice_tx_ring *xdp_ring)
 
 void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res, u32 first_idx);
 int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring);
-int ice_xmit_xdp_ring(struct xdp_frame *xdpf, struct ice_tx_ring *xdp_ring);
 int __ice_xmit_xdp_ring(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring,
 			bool frame);
 void ice_release_rx_desc(struct ice_rx_ring *rx_ring, u16 val);
-- 
2.39.1



* Re: [PATCH bpf-next 0/6] ice: post-mbuf fixes
From: Toke Høiland-Jørgensen @ 2023-02-10 18:09 UTC (permalink / raw)
  To: Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: Alexander Lobakin, Maciej Fijalkowski, Magnus Karlsson,
	Martin KaFai Lau, Song Liu, Jesper Dangaard Brouer,
	Jakub Kicinski, bpf, netdev, linux-kernel

Alexander Lobakin <alexandr.lobakin@intel.com> writes:

> The set grew from the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES
> when the ice-backed device is a sender. Initially, it was around
> 3.3 Mpps / thread, while I get 5.5 on skb-based pktgen...
>
> After fixing 0005 first (0004 is a prereq for it; strange that nobody
> noticed it earlier), I started catching random OOMs. This is how 0002
> (and partially 0001) appeared.
> 0003 is a suggestion from Maciej to not waste time on refactoring dead
> lines. 0006 is a "cherry on top" to reach the final 6.7 Mpps.
> 4.5 of the 6 are fixes, but only the first three are tagged, since it
> then starts being tricky. I may backport the rest manually later on.
>
> TL;DR for the series is that shortcuts are good, but only as long as
> they don't make the driver miss important things. %XDP_TX is purely
> driver-local, however .ndo_xdp_xmit() is not, and sometimes assumptions
> can be unsafe there.
>
> With that series and also one core code patch[0], "live frames" and
> xdp-trafficgen are now safe'n'fast on ice (probably more to come).

Nice speedup! And cool to see that you're playing around with
xdp-trafficgen :)

-Toke



* Re: [PATCH bpf-next 0/6] ice: post-mbuf fixes
From: Alexander Lobakin @ 2023-02-13 14:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Maciej Fijalkowski, Magnus Karlsson, Martin KaFai Lau, Song Liu,
	Jesper Dangaard Brouer, Jakub Kicinski, bpf, netdev,
	linux-kernel

From: Toke Høiland-Jørgensen <toke@redhat.com>
Date: Fri, 10 Feb 2023 19:09:12 +0100

> Alexander Lobakin <alexandr.lobakin@intel.com> writes:
> 
>> The set grew from the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES
>> when the ice-backed device is a sender. Initially, it was around
>> 3.3 Mpps / thread, while I get 5.5 on skb-based pktgen...
>>
>> After fixing 0005 first (0004 is a prereq for it; strange that nobody
>> noticed it earlier), I started catching random OOMs. This is how 0002
>> (and partially 0001) appeared.
>> 0003 is a suggestion from Maciej to not waste time on refactoring dead
>> lines. 0006 is a "cherry on top" to reach the final 6.7 Mpps.
>> 4.5 of the 6 are fixes, but only the first three are tagged, since it
>> then starts being tricky. I may backport the rest manually later on.
>>
>> TL;DR for the series is that shortcuts are good, but only as long as
>> they don't make the driver miss important things. %XDP_TX is purely
>> driver-local, however .ndo_xdp_xmit() is not, and sometimes assumptions
>> can be unsafe there.
>>
>> With that series and also one core code patch[0], "live frames" and
>> xdp-trafficgen are now safe'n'fast on ice (probably more to come).
> 
> Nice speedup! And cool to see that you're playing around with
> xdp-trafficgen :)

It's not only good for bombing receivers without any special HW, but
also for uncovering problems with XDP in drivers and/or the kernel
core, as I can see :D

> 
> -Toke
>

Thanks,
Olek


* Re: [PATCH bpf-next 0/6] ice: post-mbuf fixes
From: Maciej Fijalkowski @ 2023-02-13 17:57 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Magnus Karlsson, Toke Høiland-Jørgensen,
	Martin KaFai Lau, Song Liu, Jesper Dangaard Brouer,
	Jakub Kicinski, bpf, netdev, linux-kernel

On Fri, Feb 10, 2023 at 06:06:12PM +0100, Alexander Lobakin wrote:
> The set grew from the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES
> when the ice-backed device is a sender. Initially, it was around
> 3.3 Mpps / thread, while I get 5.5 on skb-based pktgen...
> 
> After fixing 0005 first (0004 is a prereq for it; strange that nobody
> noticed it earlier), I started catching random OOMs. This is how 0002
> (and partially 0001) appeared.
> 0003 is a suggestion from Maciej to not waste time on refactoring dead
> lines. 0006 is a "cherry on top" to reach the final 6.7 Mpps.
> 4.5 of the 6 are fixes, but only the first three are tagged, since it
> then starts being tricky. I may backport the rest manually later on.
> 
> TL;DR for the series is that shortcuts are good, but only as long as
> they don't make the driver miss important things. %XDP_TX is purely
> driver-local, however .ndo_xdp_xmit() is not, and sometimes assumptions
> can be unsafe there.
> 
> With that series and also one core code patch[0], "live frames" and
> xdp-trafficgen are now safe'n'fast on ice (probably more to come).
> 
> [0] https://lore.kernel.org/all/20230209172827.874728-1-alexandr.lobakin@intel.com
> ---
> Goes directly to bpf-next as it touches the recently added/changed code.

For the series:
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

> 
> Alexander Lobakin (6):
>   ice: fix ice_tx_ring::xdp_tx_active underflow
>   ice: fix XDP Tx ring overrun
>   ice: remove two impossible branches on XDP Tx cleaning
>   ice: robustify cleaning/completing XDP Tx buffers
>   ice: fix freeing XDP frames backed by Page Pool
>   ice: micro-optimize .ndo_xdp_xmit() path
> 
>  drivers/net/ethernet/intel/ice/ice_txrx.c     | 67 +++++++++-----
>  drivers/net/ethernet/intel/ice/ice_txrx.h     | 37 ++++++--
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 88 ++++++++++++-------
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  4 +-
>  drivers/net/ethernet/intel/ice/ice_xsk.c      | 12 +--
>  5 files changed, 136 insertions(+), 72 deletions(-)
> 
> -- 
> 2.39.1
> 


* Re: [PATCH bpf-next 0/6] ice: post-mbuf fixes
From: patchwork-bot+netdevbpf @ 2023-02-13 18:21 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: ast, daniel, andrii, maciej.fijalkowski, magnus.karlsson, toke,
	martin.lau, song, hawk, kuba, bpf, netdev, linux-kernel

Hello:

This series was applied to bpf/bpf-next.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Fri, 10 Feb 2023 18:06:12 +0100 you wrote:
> The set grew from the poor performance of %BPF_F_TEST_XDP_LIVE_FRAMES
> when the ice-backed device is a sender. Initially, it was around
> 3.3 Mpps / thread, while I get 5.5 on skb-based pktgen...
> 
> After fixing 0005 first (0004 is a prereq for it; strange that nobody
> noticed it earlier), I started catching random OOMs. This is how 0002
> (and partially 0001) appeared.
> 0003 is a suggestion from Maciej to not waste time on refactoring dead
> lines. 0006 is a "cherry on top" to reach the final 6.7 Mpps.
> 4.5 of the 6 are fixes, but only the first three are tagged, since it
> then starts being tricky. I may backport the rest manually later on.
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/6] ice: fix ice_tx_ring::xdp_tx_active underflow
    https://git.kernel.org/bpf/bpf-next/c/bc4db8347003
  - [bpf-next,2/6] ice: fix XDP Tx ring overrun
    https://git.kernel.org/bpf/bpf-next/c/0bd939b60cea
  - [bpf-next,3/6] ice: remove two impossible branches on XDP Tx cleaning
    https://git.kernel.org/bpf/bpf-next/c/923096b5cec3
  - [bpf-next,4/6] ice: robustify cleaning/completing XDP Tx buffers
    https://git.kernel.org/bpf/bpf-next/c/aa1d3faf71a6
  - [bpf-next,5/6] ice: fix freeing XDP frames backed by Page Pool
    https://git.kernel.org/bpf/bpf-next/c/055d0920685e
  - [bpf-next,6/6] ice: micro-optimize .ndo_xdp_xmit() path
    https://git.kernel.org/bpf/bpf-next/c/ad07f29b9c9a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



