* [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC (permalink / raw)
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Hi!

This is a revival of Bjorn's idea [0] to break the NAPI loop when the
XSK Rx queue gets full, since at that point it becomes impossible to
keep successfully producing descriptors to the XSK Rx ring. By breaking
out of the driver side immediately, we give user space the opportunity
to consume descriptors from the XSK Rx ring and thereby make room in
the ring, so that HW Rx -> XSK Rx redirection can continue.

Maxim asked for and Jesper agreed on simplifying Bjorn's original API
used for detecting the event of interest, so let's simply check for
-ENOBUFS within Intel's ZC drivers after an attempt to redirect a
buffer to XSK Rx. There is no real need for a redirect API extension.
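
In driver terms, the check boils down to something like the sketch
below (a hedged illustration with generic placeholder names -
run_xdp_zc(), XDP_EXIT and bump_ntc() stand in for the per-driver
equivalents added in patches 3-5):

	xdp_res = run_xdp_zc(rx_ring, xdp);	/* calls xdp_do_redirect() */
	if (xdp_res == XDP_EXIT) {		/* redirect hit -ENOBUFS */
		failure = true;			/* handled after the loop */
		xsk_buff_free(xdp);		/* consume current buffer */
		bump_ntc(rx_ring);		/* advance HW ring head */
		break;				/* yield CPU to user space */
	}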

One might ask why this is still relevant even with proper busy poll
support in place - here is the justification.

For xdpsock that was:
- run for the l2fwd scenario,
- with app/driver processing taking place on the same core in busy poll
  with a 2048 budget,
- with HW ring sizes of Tx 256 and Rx 2048,

this work improved throughput by 78% and reduced the number of Rx queue
full events by 99%.

For testing ice, make sure that you have [1] present on your side.

Besides the work described above, this set also carries improvements
around return codes in various XSK paths and, lastly, a minor
optimization for xskq_cons_has_entries(), a helper that might be used
once XSK Rx batching makes it into the kernel.

Thanks!
MF

[0]: https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
[1]: https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/

Björn Töpel (1):
  xsk: improve xdp_do_redirect() error codes

Maciej Fijalkowski (9):
  xsk: diversify return codes in xsk_rcv_check()
  ice: xsk: terminate NAPI when XSK Rx queue gets full
  i40e: xsk: terminate NAPI when XSK Rx queue gets full
  ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
  ice: xsk: diversify return values from xsk_wakeup call paths
  i40e: xsk: diversify return values from xsk_wakeup call paths
  ixgbe: xsk: diversify return values from xsk_wakeup call paths
  ice: xsk: avoid refilling single Rx descriptors
  xsk: drop ternary operator from xskq_cons_has_entries

 .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 27 +++++++++------
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 34 ++++++++++++-------
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 29 ++++++++++------
 net/xdp/xsk.c                                 |  4 +--
 net/xdp/xsk_queue.h                           |  4 +--
 8 files changed, 64 insertions(+), 37 deletions(-)

-- 
2.33.1



* [PATCH bpf-next 01/10] xsk: improve xdp_do_redirect() error codes
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

From: Björn Töpel <bjorn@kernel.org>

The error codes returned by xdp_do_redirect() when redirecting a frame
to an AF_XDP socket have not been very useful. A driver could not
distinguish between different errors. Prior to this change the
following codes were used:

Socket not bound or incorrect queue/netdev: EINVAL
XDP frame/AF_XDP buffer size mismatch: ENOSPC
Could not allocate buffer (copy mode): ENOSPC
AF_XDP Rx buffer full: ENOSPC

After this change:

Socket not bound or incorrect queue/netdev: EINVAL
XDP frame/AF_XDP buffer size mismatch: ENOSPC
Could not allocate buffer (copy mode): ENOMEM
AF_XDP Rx buffer full: ENOBUFS

An AF_XDP zero-copy driver can now potentially determine if the
failure was due to a full Rx buffer, and if so stop processing more
frames, yielding to the userland AF_XDP application.
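
For illustration, a hedged sketch of the dispatch this enables on the
ZC driver side (DRV_XDP_* are generic stand-ins for the per-driver
result codes; the concrete versions come with the driver patches later
in this series):

	err = xdp_do_redirect(netdev, xdp, xdp_prog);
	if (!err)
		return DRV_XDP_REDIR;	/* success */
	if (err == -ENOBUFS)
		return DRV_XDP_EXIT;	/* XSK Rx ring full, break NAPI loop */
	return DRV_XDP_CONSUMED;	/* any other failure, drop the frame */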

Signed-off-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/xdp/xsk.c       | 2 +-
 net/xdp/xsk_queue.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 2c34caee0fd1..f75e121073e7 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -184,7 +184,7 @@ static int __xsk_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 	xsk_xdp = xsk_buff_alloc(xs->pool);
 	if (!xsk_xdp) {
 		xs->rx_dropped++;
-		return -ENOSPC;
+		return -ENOMEM;
 	}
 
 	xsk_copy_xdp(xsk_xdp, xdp, len);
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 801cda5d1938..644479e65578 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -382,7 +382,7 @@ static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
 	u32 idx;
 
 	if (xskq_prod_is_full(q))
-		return -ENOSPC;
+		return -ENOBUFS;
 
 	/* A, matches D */
 	idx = q->cached_prod++ & q->ring_mask;
-- 
2.33.1



* [PATCH bpf-next 02/10] xsk: diversify return codes in xsk_rcv_check()
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Inspired by the patch that made xdp_do_redirect() return values for
XSKMAP more meaningful, return -ENXIO instead of -EINVAL for a socket
being unbound in xsk_rcv_check(), as this is the usual value returned
for such an event. In turn, it is now possible to easily distinguish
what went wrong, which was a bit harder when -EINVAL was returned for
both of the checked cases.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/xdp/xsk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f75e121073e7..040c73345b7c 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -217,7 +217,7 @@ static bool xsk_is_bound(struct xdp_sock *xs)
 static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	if (!xsk_is_bound(xs))
-		return -EINVAL;
+		return -ENXIO;
 
 	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
 		return -EINVAL;
-- 
2.33.1



* [PATCH bpf-next 03/10] ice: xsk: terminate NAPI when XSK Rx queue gets full
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Correlate the -ENOBUFS returned from xdp_do_redirect() with the XSK Rx
queue being full. In such a case, terminate the softirq processing and
let user space consume descriptors from the XSK Rx queue so that there
is room the driver can use later on.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.h |  1 +
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 25 +++++++++++++++--------
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index cead3eb149bd..f5a906c03669 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -133,6 +133,7 @@ static inline int ice_skb_pad(void)
 #define ICE_XDP_CONSUMED	BIT(0)
 #define ICE_XDP_TX		BIT(1)
 #define ICE_XDP_REDIR		BIT(2)
+#define ICE_XDP_EXIT		BIT(3)
 
 #define ICE_RX_DMA_ATTR \
 	(DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index dfbcaf08520e..2439141cfa9d 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -538,9 +538,10 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		if (err)
-			goto out_failure;
-		return ICE_XDP_REDIR;
+		if (!err)
+			return ICE_XDP_REDIR;
+		result = (err == -ENOBUFS) ? ICE_XDP_EXIT : ICE_XDP_CONSUMED;
+		goto out_failure;
 	}
 
 	switch (act) {
@@ -551,15 +552,15 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		if (result == ICE_XDP_CONSUMED)
 			goto out_failure;
 		break;
+	case XDP_DROP:
+		result = ICE_XDP_CONSUMED;
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
 		fallthrough;
 	case XDP_ABORTED:
 out_failure:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
-		fallthrough;
-	case XDP_DROP:
-		result = ICE_XDP_CONSUMED;
 		break;
 	}
 
@@ -628,10 +629,16 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 
 		xdp_res = ice_run_xdp_zc(rx_ring, xdp, xdp_prog, xdp_ring);
 		if (xdp_res) {
-			if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))
+			if (xdp_res == ICE_XDP_EXIT) {
+				failure = true;
+				xsk_buff_free(xdp);
+				ice_bump_ntc(rx_ring);
+				break;
+			} else if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
 				xdp_xmit |= xdp_res;
-			else
+			} else {
 				xsk_buff_free(xdp);
+			}
 
 			total_rx_bytes += size;
 			total_rx_packets++;
@@ -666,7 +673,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 		ice_receive_skb(rx_ring, skb, vlan_tag);
 	}
 
-	failure = !ice_alloc_rx_bufs_zc(rx_ring, ICE_DESC_UNUSED(rx_ring));
+	failure |= !ice_alloc_rx_bufs_zc(rx_ring, ICE_DESC_UNUSED(rx_ring));
 
 	ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
-- 
2.33.1



* [PATCH bpf-next 04/10] i40e: xsk: terminate NAPI when XSK Rx queue gets full
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Correlate the -ENOBUFS returned from xdp_do_redirect() with the XSK Rx
queue being full. In such a case, terminate the softirq processing and
let user space consume descriptors from the XSK Rx queue so that there
is room the driver can use later on.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 19da3b22160f..8c5118c8baaf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -20,6 +20,7 @@ void i40e_release_rx_desc(struct i40e_ring *rx_ring, u32 val);
 #define I40E_XDP_CONSUMED	BIT(0)
 #define I40E_XDP_TX		BIT(1)
 #define I40E_XDP_REDIR		BIT(2)
+#define I40E_XDP_EXIT		BIT(3)
 
 /*
  * build_ctob - Builds the Tx descriptor (cmd, offset and type) qword
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index c1d25b0b0ca2..9f9e4ce9a24d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -161,9 +161,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		if (err)
-			goto out_failure;
-		return I40E_XDP_REDIR;
+		if (!err)
+			return I40E_XDP_REDIR;
+		result = (err == -ENOBUFS) ? I40E_XDP_EXIT : I40E_XDP_CONSUMED;
+		goto out_failure;
 	}
 
 	switch (act) {
@@ -175,6 +176,9 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 		if (result == I40E_XDP_CONSUMED)
 			goto out_failure;
 		break;
+	case XDP_DROP:
+		result = I40E_XDP_CONSUMED;
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
 		fallthrough;
@@ -182,9 +186,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
 out_failure:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
 		fallthrough; /* handle aborts by dropping packet */
-	case XDP_DROP:
-		result = I40E_XDP_CONSUMED;
-		break;
 	}
 	return result;
 }
@@ -370,6 +371,12 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 		xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);
 
 		xdp_res = i40e_run_xdp_zc(rx_ring, bi);
+		if (xdp_res == I40E_XDP_EXIT) {
+			failure = true;
+			xsk_buff_free(bi);
+			next_to_clean = (next_to_clean + 1) & count_mask;
+			break;
+		}
 		i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets,
 					  &rx_bytes, size, xdp_res);
 		total_rx_packets += rx_packets;
@@ -382,7 +389,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	cleaned_count = (next_to_clean - rx_ring->next_to_use - 1) & count_mask;
 
 	if (cleaned_count >= I40E_RX_BUFFER_WRITE)
-		failure = !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
+		failure |= !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
 
 	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
 	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
-- 
2.33.1



* [PATCH bpf-next 05/10] ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Correlate the -ENOBUFS returned from xdp_do_redirect() with the XSK Rx
queue being full. In such a case, terminate the softirq processing and
let user space consume descriptors from the XSK Rx queue so that there
is room the driver can use later on.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 23 ++++++++++++-------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index bba3feaf3318..f1f69ce67420 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -8,6 +8,7 @@
 #define IXGBE_XDP_CONSUMED	BIT(0)
 #define IXGBE_XDP_TX		BIT(1)
 #define IXGBE_XDP_REDIR		BIT(2)
+#define IXGBE_XDP_EXIT		BIT(3)
 
 #define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
 		       IXGBE_TXD_CMD_RS)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index dd7ff66d422f..475244a2c6e4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -109,9 +109,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 
 	if (likely(act == XDP_REDIRECT)) {
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
-		if (err)
-			goto out_failure;
-		return IXGBE_XDP_REDIR;
+		if (!err)
+			return IXGBE_XDP_REDIR;
+		result = (err == -ENOBUFS) ? IXGBE_XDP_EXIT : IXGBE_XDP_CONSUMED;
+		goto out_failure;
 	}
 
 	switch (act) {
@@ -130,6 +131,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 		if (result == IXGBE_XDP_CONSUMED)
 			goto out_failure;
 		break;
+	case XDP_DROP:
+		result = IXGBE_XDP_CONSUMED;
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
 		fallthrough;
@@ -137,9 +141,6 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
 out_failure:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
 		fallthrough; /* handle aborts by dropping packet */
-	case XDP_DROP:
-		result = IXGBE_XDP_CONSUMED;
-		break;
 	}
 	return result;
 }
@@ -304,10 +305,16 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 		xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);
 
 		if (xdp_res) {
-			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR))
+			if (xdp_res == IXGBE_XDP_EXIT) {
+				failure = true;
+				xsk_buff_free(bi->xdp);
+				ixgbe_inc_ntc(rx_ring);
+				break;
+			} else if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
 				xdp_xmit |= xdp_res;
-			else
+			} else {
 				xsk_buff_free(bi->xdp);
+			}
 
 			bi->xdp = NULL;
 			total_rx_packets++;
-- 
2.33.1



* [PATCH bpf-next 06/10] ice: xsk: diversify return values from xsk_wakeup call paths
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Currently, when debugging AF_XDP workloads, one can correlate the
-ENXIO return code with the case that the XSK is not in the bound
state. Returning the same code from ndo_xsk_wakeup() can be misleading
and simply makes it harder to follow what is going on.

Change the ENXIOs in ice's ndo_xsk_wakeup() implementation to EINVALs,
so that when probing it is clear that something is wrong on the driver
side, not in xsk_{recv,send}msg.

-ENETDOWN can still be returned from both the kernel and driver sides,
but I don't have a correct replacement for it on either of the sides,
so let's keep it that way.
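
For illustration, this is roughly how an application could now tell
the wakeup failure modes apart (a hedged sketch; it assumes the
ndo_xsk_wakeup() return code surfaces as the errno of the sendto()
kick, while -ENXIO for an unbound socket comes from the xsk core
itself):

	#include <errno.h>
	#include <stdio.h>
	#include <sys/socket.h>

	/* kick Tx; reaches ndo_xsk_wakeup() when need_wakeup is in use */
	static void kick_tx(int xsk_fd)
	{
		if (sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0) >= 0)
			return;
		switch (errno) {
		case ENXIO:	/* xsk core: socket not bound */
			fprintf(stderr, "socket not bound\n");
			break;
		case EINVAL:	/* driver: XDP off, bad queue id, no pool */
			fprintf(stderr, "driver-side misconfiguration\n");
			break;
		case ENETDOWN:	/* netdev down (kernel or driver side) */
			fprintf(stderr, "netdev is down\n");
			break;
		}
	}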

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 2439141cfa9d..272c0daf9ed3 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -929,13 +929,13 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 		return -ENETDOWN;
 
 	if (!ice_is_xdp_ena_vsi(vsi))
-		return -ENXIO;
+		return -EINVAL;
 
 	if (queue_id >= vsi->num_txq)
-		return -ENXIO;
+		return -EINVAL;
 
 	if (!vsi->xdp_rings[queue_id]->xsk_pool)
-		return -ENXIO;
+		return -EINVAL;
 
 	ring = vsi->xdp_rings[queue_id];
 
-- 
2.33.1



* [PATCH bpf-next 07/10] i40e: xsk: diversify return values from xsk_wakeup call paths
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Currently, when debugging AF_XDP workloads, one can correlate the
-ENXIO return code with the case that the XSK is not in the bound
state. Returning the same code from ndo_xsk_wakeup() can be misleading
and simply makes it harder to follow what is going on.

Change the ENXIOs in i40e's ndo_xsk_wakeup() implementation to EINVALs,
so that when probing it is clear that something is wrong on the driver
side, not in xsk_{recv,send}msg.

-ENETDOWN can still be returned from both the kernel and driver sides,
but I don't have a correct replacement for it on either of the sides,
so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 9f9e4ce9a24d..f10297a0e647 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -601,13 +601,13 @@ int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
 		return -ENETDOWN;
 
 	if (!i40e_enabled_xdp_vsi(vsi))
-		return -ENXIO;
+		return -EINVAL;
 
 	if (queue_id >= vsi->num_queue_pairs)
-		return -ENXIO;
+		return -EINVAL;
 
 	if (!vsi->xdp_rings[queue_id]->xsk_pool)
-		return -ENXIO;
+		return -EINVAL;
 
 	ring = vsi->xdp_rings[queue_id];
 
-- 
2.33.1



* [PATCH bpf-next 08/10] ixgbe: xsk: diversify return values from xsk_wakeup call paths
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Currently, when debugging AF_XDP workloads, one can correlate the
-ENXIO return code with the case that the XSK is not in the bound
state. Returning the same code from ndo_xsk_wakeup() can be misleading
and simply makes it harder to follow what is going on.

Change the ENXIOs in ixgbe's ndo_xsk_wakeup() implementation to
EINVALs, so that when probing it is clear that something is wrong on
the driver side, not in xsk_{recv,send}msg.

-ENETDOWN can still be returned from both the kernel and driver sides,
but I don't have a correct replacement for it on either of the sides,
so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 475244a2c6e4..c8870da7af72 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -523,10 +523,10 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 		return -ENETDOWN;
 
 	if (!READ_ONCE(adapter->xdp_prog))
-		return -ENXIO;
+		return -EINVAL;
 
 	if (qid >= adapter->num_xdp_queues)
-		return -ENXIO;
+		return -EINVAL;
 
 	ring = adapter->xdp_ring[qid];
 
@@ -534,7 +534,7 @@ int ixgbe_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
 		return -ENETDOWN;
 
 	if (!ring->xsk_pool)
-		return -ENXIO;
+		return -EINVAL;
 
 	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
 		u64 eics = BIT_ULL(ring->q_vector->v_idx);
-- 
2.33.1



* [PATCH bpf-next 09/10] ice: xsk: avoid refilling single Rx descriptors
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Call the Rx alloc routine for the ZC driver only when the number of
unallocated descriptors exceeds a given threshold.
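
For reference, the threshold used below is a quarter of the ring
length. The ICE_RING_QUARTER() helper comes from the prerequisite ice
series mentioned in the cover letter and is assumed here to be defined
roughly like:

	/* assumed shape of the helper: a quarter of the ring length */
	#define ICE_RING_QUARTER(R) ((R)->count >> 2)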

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_xsk.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 272c0daf9ed3..143f6b6937bd 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -581,6 +581,7 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 	unsigned int xdp_xmit = 0;
 	struct bpf_prog *xdp_prog;
 	bool failure = false;
+	int entries_to_alloc;
 
 	/* ZC patch is enabled only when XDP program is set,
 	 * so here it can not be NULL
@@ -673,7 +674,9 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
 		ice_receive_skb(rx_ring, skb, vlan_tag);
 	}
 
-	failure |= !ice_alloc_rx_bufs_zc(rx_ring, ICE_DESC_UNUSED(rx_ring));
+	entries_to_alloc = ICE_DESC_UNUSED(rx_ring);
+	if (entries_to_alloc > ICE_RING_QUARTER(rx_ring))
+		failure |= !ice_alloc_rx_bufs_zc(rx_ring, entries_to_alloc);
 
 	ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
-- 
2.33.1



* [PATCH bpf-next 10/10] xsk: drop ternary operator from xskq_cons_has_entries
From: Maciej Fijalkowski @ 2022-04-05 11:06 UTC
  To: bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, maximmi, alexandr.lobakin, Maciej Fijalkowski

Simplify the mentioned helper function by removing the ternary
operator; the expression there already yields a boolean value by
itself.

This helper might be used in the hot path, so this simplification can
also be considered a micro-optimization.
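
For context, a batching consumer (not part of this set) could use the
helper to gate a whole batch with a single check; the sketch below is
purely illustrative and consume_descs() is a made-up name:

	/* grab BATCH descriptors in one go only if they are all there */
	if (xskq_cons_has_entries(q, BATCH))
		consume_descs(q, descs, BATCH);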

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 net/xdp/xsk_queue.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 644479e65578..a794410989cc 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -263,7 +263,7 @@ static inline u32 xskq_cons_nb_entries(struct xsk_queue *q, u32 max)
 
 static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
 {
-	return xskq_cons_nb_entries(q, cnt) >= cnt ? true : false;
+	return xskq_cons_nb_entries(q, cnt) >= cnt;
 }
 
 static inline bool xskq_cons_peek_addr_unchecked(struct xsk_queue *q, u64 *addr)
-- 
2.33.1



* Re: [PATCH bpf-next 03/10] ice: xsk: terminate NAPI when XSK Rx queue gets full
From: Alexander Lobakin @ 2022-04-05 11:34 UTC
  To: Maciej Fijalkowski
  Cc: Alexander Lobakin, bpf, ast, daniel, magnus.karlsson, bjorn,
	netdev, brouer, maximmi

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Tue, 5 Apr 2022 13:06:24 +0200

> Correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK
> Rx queue being full. In such case, terminate the softirq processing and
> let the user space to consume descriptors from XSK Rx queue so that
> there is room that driver can use later on.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_txrx.h |  1 +
>  drivers/net/ethernet/intel/ice/ice_xsk.c  | 25 +++++++++++++++--------
>  2 files changed, 17 insertions(+), 9 deletions(-)

--- 8< ---

> @@ -551,15 +552,15 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
>  		if (result == ICE_XDP_CONSUMED)
>  			goto out_failure;
>  		break;
> +	case XDP_DROP:
> +		result = ICE_XDP_CONSUMED;
> +		break;
>  	default:
>  		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
>  		fallthrough;
>  	case XDP_ABORTED:
>  out_failure:
>  		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> -		fallthrough;
> -	case XDP_DROP:
> -		result = ICE_XDP_CONSUMED;
>  		break;

So the result for %XDP_ABORTED will be %ICE_XDP_PASS now? Or I'm
missing something :s

>  	}
>  
> @@ -628,10 +629,16 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)

--- 8< ---

> -- 
> 2.33.1

Thanks,
Al


* Re: [PATCH bpf-next 03/10] ice: xsk: terminate NAPI when XSK Rx queue gets full
From: Maciej Fijalkowski @ 2022-04-05 12:02 UTC
  To: Alexander Lobakin
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, netdev, brouer, maximmi

On Tue, Apr 05, 2022 at 01:34:03PM +0200, Alexander Lobakin wrote:
> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Date: Tue, 5 Apr 2022 13:06:24 +0200
> 
> > Correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK
> > Rx queue being full. In such case, terminate the softirq processing and
> > let the user space to consume descriptors from XSK Rx queue so that
> > there is room that driver can use later on.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice_txrx.h |  1 +
> >  drivers/net/ethernet/intel/ice/ice_xsk.c  | 25 +++++++++++++++--------
> >  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> --- 8< ---
> 
> > @@ -551,15 +552,15 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
> >  		if (result == ICE_XDP_CONSUMED)
> >  			goto out_failure;
> >  		break;
> > +	case XDP_DROP:
> > +		result = ICE_XDP_CONSUMED;
> > +		break;
> >  	default:
> >  		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
> >  		fallthrough;
> >  	case XDP_ABORTED:
> >  out_failure:
> >  		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> > -		fallthrough;
> > -	case XDP_DROP:
> > -		result = ICE_XDP_CONSUMED;
> >  		break;
> 
> So the result for %XDP_ABORTED will be %ICE_XDP_PASS now? Or I'm
> missing something :s

Yikes! I generally wanted to avoid the overwrite of result but still go
through the exception path.


Below should be fine if we add it to the current patch, but I'll
double-check after dinner.

Good catch as always, Alex :)

diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 143f6b6937bd..16c282b7050b 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -559,6 +559,7 @@ ice_run_xdp_zc(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
 		fallthrough;
 	case XDP_ABORTED:
+		result = ICE_XDP_CONSUMED;
 out_failure:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
 		break;


> 
> >  	}
> >  
> > @@ -628,10 +629,16 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
> 
> --- 8< ---
> 
> > -- 
> > 2.33.1
> 
> Thanks,
> Al


* Re: [PATCH bpf-next 01/10] xsk: improve xdp_do_redirect() error codes
From: Jesper Dangaard Brouer @ 2022-04-05 12:18 UTC
  To: Maciej Fijalkowski, bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: brouer, netdev, maximmi, alexandr.lobakin


On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> From: Björn Töpel<bjorn@kernel.org>
> 
> The error codes returned by xdp_do_redirect() when redirecting a frame
> to an AF_XDP socket has not been very useful. A driver could not
> distinguish between different errors. Prior this change the following
> codes where used:
> 
> Socket not bound or incorrect queue/netdev: EINVAL
> XDP frame/AF_XDP buffer size mismatch: ENOSPC
> Could not allocate buffer (copy mode): ENOSPC
> AF_XDP Rx buffer full: ENOSPC
> 
> After this change:
> 
> Socket not bound or incorrect queue/netdev: EINVAL
> XDP frame/AF_XDP buffer size mismatch: ENOSPC
> Could not allocate buffer (copy mode): ENOMEM
> AF_XDP Rx buffer full: ENOBUFS
> 
> An AF_XDP zero-copy driver can now potentially determine if the
> failure was due to a full Rx buffer, and if so stop processing more
> frames, yielding to the userland AF_XDP application.
> 
> Signed-off-by: Björn Töpel<bjorn@kernel.org>
> Signed-off-by: Maciej Fijalkowski<maciej.fijalkowski@intel.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

Thanks for picking this work up again! :-)
--Jesper



* Re: [PATCH bpf-next 05/10] ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
From: Jesper Dangaard Brouer @ 2022-04-05 12:36 UTC
  To: Maciej Fijalkowski, bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: brouer, netdev, maximmi, alexandr.lobakin


On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> Correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK
> Rx queue being full. In such case, terminate the softirq processing and
> let the user space to consume descriptors from XSK Rx queue so that
> there is room that driver can use later on.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 23 ++++++++++++-------
>   2 files changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> index bba3feaf3318..f1f69ce67420 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> @@ -8,6 +8,7 @@
>   #define IXGBE_XDP_CONSUMED	BIT(0)
>   #define IXGBE_XDP_TX		BIT(1)
>   #define IXGBE_XDP_REDIR		BIT(2)
> +#define IXGBE_XDP_EXIT		BIT(3)
>   
>   #define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
>   		       IXGBE_TXD_CMD_RS)
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> index dd7ff66d422f..475244a2c6e4 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> @@ -109,9 +109,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
>   
>   	if (likely(act == XDP_REDIRECT)) {
>   		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> -		if (err)
> -			goto out_failure;
> -		return IXGBE_XDP_REDIR;
> +		if (!err)
> +			return IXGBE_XDP_REDIR;
> +		result = (err == -ENOBUFS) ? IXGBE_XDP_EXIT : IXGBE_XDP_CONSUMED;
> +		goto out_failure;
>   	}
>   
>   	switch (act) {
> @@ -130,6 +131,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
>   		if (result == IXGBE_XDP_CONSUMED)
>   			goto out_failure;
>   		break;
> +	case XDP_DROP:
> +		result = IXGBE_XDP_CONSUMED;
> +		break;
>   	default:
>   		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
>   		fallthrough;
> @@ -137,9 +141,6 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
>   out_failure:
>   		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>   		fallthrough; /* handle aborts by dropping packet */
> -	case XDP_DROP:
> -		result = IXGBE_XDP_CONSUMED;
> -		break;
>   	}
>   	return result;
>   }
> @@ -304,10 +305,16 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
>   		xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);
>   
>   		if (xdp_res) {
> -			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR))
> +			if (xdp_res == IXGBE_XDP_EXIT) {

Micro optimization note: having this as the first if()-statement makes
the compiler default to treating it as the likely() case. (But
compilers can obviously be smarter and can easily decide that the
IXGBE_XDP_REDIR branch is so simple that it should be the likely case.)
Just wanted to mention this, given this is fast-path code.

> +				failure = true;
> +				xsk_buff_free(bi->xdp);
> +				ixgbe_inc_ntc(rx_ring);
> +				break;

I was wondering if we have a situation where we should set the
xdp_xmit bit to trigger the call to xdp_do_flush_map() later in the
function, but I assume you have this covered.

> +			} else if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
>   				xdp_xmit |= xdp_res;
> -			else
> +			} else {
>   				xsk_buff_free(bi->xdp);
> +			}
>   
>   			bi->xdp = NULL;
>   			total_rx_packets++;



* Re: [PATCH bpf-next 02/10] xsk: diversify return codes in xsk_rcv_check()
From: Jesper Dangaard Brouer @ 2022-04-05 13:00 UTC
  To: Maciej Fijalkowski, bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: brouer, netdev, maximmi, alexandr.lobakin



On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> Inspired by patch that made xdp_do_redirect() return values for XSKMAP
> more meaningful, return -ENXIO instead of -EINVAL for socket being
> unbound in xsk_rcv_check() as this is the usual value that is returned
> for such event. In turn, it is now possible to easily distinguish what
> went wrong, which is a bit harder when for both cases checked, -EINVAL
> was returned.

I like this as it makes it easier to troubleshoot.
Could you update the description to explain how to debug this easily?
E.g. via this bpftrace one-liner:


  bpftrace -e 'tracepoint:xdp:xdp_redirect* {@err[-args->err] = count();}'


> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

>   net/xdp/xsk.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> index f75e121073e7..040c73345b7c 100644
> --- a/net/xdp/xsk.c
> +++ b/net/xdp/xsk.c
> @@ -217,7 +217,7 @@ static bool xsk_is_bound(struct xdp_sock *xs)
>   static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
>   {
>   	if (!xsk_is_bound(xs))
> -		return -EINVAL;
> +		return -ENXIO;
>   
>   	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
>   		return -EINVAL;
> 



* Re: [PATCH bpf-next 04/10] i40e: xsk: terminate NAPI when XSK Rx queue gets full
From: Jesper Dangaard Brouer @ 2022-04-05 13:04 UTC
  To: Maciej Fijalkowski, bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: brouer, netdev, maximmi, alexandr.lobakin, Toke Hoiland Jorgensen



On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> Correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK
> Rx queue being full. In such case, terminate the softirq processing and
> let the user space to consume descriptors from XSK Rx queue so that
> there is room that driver can use later on.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>   .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++++++++++++-------
>   2 files changed, 15 insertions(+), 7 deletions(-)
> 
[...]

I noticed you are only doing this for the Zero-Copy variants.
Wouldn't this also be a benefit for normal AF_XDP?


> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index c1d25b0b0ca2..9f9e4ce9a24d 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -161,9 +161,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>   
>   	if (likely(act == XDP_REDIRECT)) {
>   		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> -		if (err)
> -			goto out_failure;
> -		return I40E_XDP_REDIR;
> +		if (!err)
> +			return I40E_XDP_REDIR;
> +		result = (err == -ENOBUFS) ? I40E_XDP_EXIT : I40E_XDP_CONSUMED;
> +		goto out_failure;
>   	}
>   
>   	switch (act) {
> @@ -175,6 +176,9 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>   		if (result == I40E_XDP_CONSUMED)
>   			goto out_failure;
>   		break;
> +	case XDP_DROP:
> +		result = I40E_XDP_CONSUMED;
> +		break;
>   	default:
>   		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
>   		fallthrough;
> @@ -182,9 +186,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
>   out_failure:
>   		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>   		fallthrough; /* handle aborts by dropping packet */
> -	case XDP_DROP:
> -		result = I40E_XDP_CONSUMED;
> -		break;
>   	}
>   	return result;
>   }
> @@ -370,6 +371,12 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>   		xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);
>   
>   		xdp_res = i40e_run_xdp_zc(rx_ring, bi);
> +		if (xdp_res == I40E_XDP_EXIT) {
> +			failure = true;
> +			xsk_buff_free(bi);
> +			next_to_clean = (next_to_clean + 1) & count_mask;
> +			break;
> +		}
>   		i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets,
>   					  &rx_bytes, size, xdp_res);
>   		total_rx_packets += rx_packets;
> @@ -382,7 +389,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
>   	cleaned_count = (next_to_clean - rx_ring->next_to_use - 1) & count_mask;
>   
>   	if (cleaned_count >= I40E_RX_BUFFER_WRITE)
> -		failure = !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
> +		failure |= !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
>   
>   	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
>   	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
> 



* Re: [PATCH bpf-next 02/10] xsk: diversify return codes in xsk_rcv_check()
From: Maciej Fijalkowski @ 2022-04-05 13:35 UTC
  To: Jesper Dangaard Brouer
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, brouer, netdev,
	maximmi, alexandr.lobakin

On Tue, Apr 05, 2022 at 03:00:25PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> > Inspired by patch that made xdp_do_redirect() return values for XSKMAP
> > more meaningful, return -ENXIO instead of -EINVAL for socket being
> > unbound in xsk_rcv_check() as this is the usual value that is returned
> > for such event. In turn, it is now possible to easily distinguish what
> > went wrong, which is a bit harder when for both cases checked, -EINVAL
> > was returned.
> 
> I like this as it makes it easier to troubleshoot.
> Could you update the description to explain how to debug this easily.
> E.g. via this bpftrace one liner:
> 
> 
>  bpftrace -e 'tracepoint:xdp:xdp_redirect* {@err[-args->err] = count();}'

Nice one! I'll include this in the commit message of v2.

> 
> 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> 
> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> >   net/xdp/xsk.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f75e121073e7..040c73345b7c 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -217,7 +217,7 @@ static bool xsk_is_bound(struct xdp_sock *xs)
> >   static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> >   {
> >   	if (!xsk_is_bound(xs))
> > -		return -EINVAL;
> > +		return -ENXIO;
> >   	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> >   		return -EINVAL;
> > 
> 


* Re: [PATCH bpf-next 05/10] ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
From: Maciej Fijalkowski @ 2022-04-05 13:52 UTC
  To: Jesper Dangaard Brouer
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, brouer, netdev,
	maximmi, alexandr.lobakin

On Tue, Apr 05, 2022 at 02:36:41PM +0200, Jesper Dangaard Brouer wrote:
> 
> On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> > Correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK
> > Rx queue being full. In such case, terminate the softirq processing and
> > let the user space to consume descriptors from XSK Rx queue so that
> > there is room that driver can use later on.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >   .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 23 ++++++++++++-------
> >   2 files changed, 16 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> > index bba3feaf3318..f1f69ce67420 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
> > @@ -8,6 +8,7 @@
> >   #define IXGBE_XDP_CONSUMED	BIT(0)
> >   #define IXGBE_XDP_TX		BIT(1)
> >   #define IXGBE_XDP_REDIR		BIT(2)
> > +#define IXGBE_XDP_EXIT		BIT(3)
> >   #define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
> >   		       IXGBE_TXD_CMD_RS)
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> > index dd7ff66d422f..475244a2c6e4 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
> > @@ -109,9 +109,10 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
> >   	if (likely(act == XDP_REDIRECT)) {
> >   		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> > -		if (err)
> > -			goto out_failure;
> > -		return IXGBE_XDP_REDIR;
> > +		if (!err)
> > +			return IXGBE_XDP_REDIR;
> > +		result = (err == -ENOBUFS) ? IXGBE_XDP_EXIT : IXGBE_XDP_CONSUMED;
> > +		goto out_failure;
> >   	}
> >   	switch (act) {
> > @@ -130,6 +131,9 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
> >   		if (result == IXGBE_XDP_CONSUMED)
> >   			goto out_failure;
> >   		break;
> > +	case XDP_DROP:
> > +		result = IXGBE_XDP_CONSUMED;
> > +		break;
> >   	default:
> >   		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
> >   		fallthrough;
> > @@ -137,9 +141,6 @@ static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
> >   out_failure:
> >   		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> >   		fallthrough; /* handle aborts by dropping packet */
> > -	case XDP_DROP:
> > -		result = IXGBE_XDP_CONSUMED;
> > -		break;
> >   	}
> >   	return result;
> >   }
> > @@ -304,10 +305,16 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
> >   		xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, bi->xdp);
> >   		if (xdp_res) {
> > -			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR))
> > +			if (xdp_res == IXGBE_XDP_EXIT) {
> 
> Micro optimization note: Having this as the first if()-statement
> defaults the compiler to think this is the likely() case. (But compilers
> can obviously be smarter and can easily choose that the IXGBE_XDP_REDIR
> branch is so simple that it takes it as the likely case).
> Just wanted to mention this, given this is fash-path code.

Good point. Since we're 'likely-fying' the redirect path in
ixgbe_run_xdp_zc(), maybe it would make sense to make the branch that
does the xdp_res & IXGBE_XDP_REDIR check the likely() one.

> 
> > +				failure = true;
> > +				xsk_buff_free(bi->xdp);
> > +				ixgbe_inc_ntc(rx_ring);
> > +				break;
> 
> I was wondering if we have a situation where we should set xdp_xmit bit
> to trigger the call to xdp_do_flush_map later in function, but I assume
> you have this covered.

For every previous successful redirect, xdp_xmit will be set with the
corresponding bits that came out of ixgbe_run_xdp_zc(), so if we got to
the point of a full XSK Rx queue, xdp_do_flush_map() will be called
eventually. Before doing so, the idea is to give the current buffer
back to the XSK buffer pool and increment "next_to_clean", which acts
as the head pointer on the HW Rx ring. IOW, consume the current
buffer/descriptor and yield the CPU to user space.

> 
> > +			} else if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
> >   				xdp_xmit |= xdp_res;
> > -			else
> > +			} else {
> >   				xsk_buff_free(bi->xdp);
> > +			}
> >   			bi->xdp = NULL;
> >   			total_rx_packets++;
> 


* Re: [PATCH bpf-next 04/10] i40e: xsk: terminate NAPI when XSK Rx queue gets full
From: Maciej Fijalkowski @ 2022-04-06 16:04 UTC
  To: Jesper Dangaard Brouer
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, brouer, netdev,
	maximmi, alexandr.lobakin, Toke Hoiland Jorgensen

On Tue, Apr 05, 2022 at 03:04:17PM +0200, Jesper Dangaard Brouer wrote:
> 
> 
> On 05/04/2022 13.06, Maciej Fijalkowski wrote:
> > Correlate -ENOBUFS that was returned from xdp_do_redirect() with a XSK
> > Rx queue being full. In such case, terminate the softirq processing and
> > let the user space to consume descriptors from XSK Rx queue so that
> > there is room that driver can use later on.
> > 
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >   .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 21 ++++++++++++-------
> >   2 files changed, 15 insertions(+), 7 deletions(-)
> > 
> [...]
> 
> I noticed you are only doing this for the Zero-Copy variants.
> Wouldn't this also be a benefit for normal AF_XDP ?

Sorry for the delay; indeed this would improve AF_XDP in copy mode as
well, but only after a fix I have sent (not on lore yet :<).

I'll adjust the patches to check for -ENOBUFS in $DRIVER_txrx.c and
send a v2.

> 
> 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > index c1d25b0b0ca2..9f9e4ce9a24d 100644
> > --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> > @@ -161,9 +161,10 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
> >   	if (likely(act == XDP_REDIRECT)) {
> >   		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
> > -		if (err)
> > -			goto out_failure;
> > -		return I40E_XDP_REDIR;
> > +		if (!err)
> > +			return I40E_XDP_REDIR;
> > +		result = (err == -ENOBUFS) ? I40E_XDP_EXIT : I40E_XDP_CONSUMED;
> > +		goto out_failure;
> >   	}
> >   	switch (act) {
> > @@ -175,6 +176,9 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
> >   		if (result == I40E_XDP_CONSUMED)
> >   			goto out_failure;
> >   		break;
> > +	case XDP_DROP:
> > +		result = I40E_XDP_CONSUMED;
> > +		break;
> >   	default:
> >   		bpf_warn_invalid_xdp_action(rx_ring->netdev, xdp_prog, act);
> >   		fallthrough;
> > @@ -182,9 +186,6 @@ static int i40e_run_xdp_zc(struct i40e_ring *rx_ring, struct xdp_buff *xdp)
> >   out_failure:
> >   		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> >   		fallthrough; /* handle aborts by dropping packet */
> > -	case XDP_DROP:
> > -		result = I40E_XDP_CONSUMED;
> > -		break;
> >   	}
> >   	return result;
> >   }
> > @@ -370,6 +371,12 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
> >   		xsk_buff_dma_sync_for_cpu(bi, rx_ring->xsk_pool);
> >   		xdp_res = i40e_run_xdp_zc(rx_ring, bi);
> > +		if (xdp_res == I40E_XDP_EXIT) {
> > +			failure = true;
> > +			xsk_buff_free(bi);
> > +			next_to_clean = (next_to_clean + 1) & count_mask;
> > +			break;
> > +		}
> >   		i40e_handle_xdp_result_zc(rx_ring, bi, rx_desc, &rx_packets,
> >   					  &rx_bytes, size, xdp_res);
> >   		total_rx_packets += rx_packets;
> > @@ -382,7 +389,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
> >   	cleaned_count = (next_to_clean - rx_ring->next_to_use - 1) & count_mask;
> >   	if (cleaned_count >= I40E_RX_BUFFER_WRITE)
> > -		failure = !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
> > +		failure |= !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
> >   	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
> >   	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
> > 
> 


* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
From: Maxim Mikityanskiy @ 2022-04-07 10:49 UTC (permalink / raw)
  To: Maciej Fijalkowski, bpf, ast, daniel, magnus.karlsson, bjorn
  Cc: netdev, brouer, alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On 2022-04-05 14:06, Maciej Fijalkowski wrote:
> Hi!
> 
> This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
> queue gets full which in turn makes it impossible to keep on
> successfully producing descriptors to XSK Rx ring. By breaking out of
> the driver side immediately we will give the user space opportunity for
> consuming descriptors from XSK Rx ring and therefore provide room in the
> ring so that HW Rx -> XSK Rx redirection can be done.
> 
> Maxim asked and Jesper agreed on simplifying Bjorn's original API used
> for detecting the event of interest, so let's just simply check for
> -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
> to XSK Rx. No real need for redirect API extension.

I believe some of the other comments under the old series [0] might
still be relevant:

1. need_wakeup behavior. If need_wakeup is disabled, the expected 
behavior is busy-looping in NAPI, you shouldn't break out early, as the 
application does not restart NAPI, and the driver restarts it itself, 
leading to a less efficient loop. If need_wakeup is enabled, it should 
be set on ENOBUFS - I believe this is the case here, right?

2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you 
prevent further packets from being XDP_TXed, leading to unnecessary 
latency increase. The new feature should be opt-in, otherwise such 
usecases suffer.

3. When the driver receives ENOBUFS, it has to drop the packet before 
returning to the application. It would be better experience if your 
feature saved all N packets from being dropped, not just N-1.

4. A slow or malicious AF_XDP application may easily cause an overflow 
of the hardware receive ring. Your feature introduces a mechanism to 
pause the driver while the congestion is on the application side, but no 
symmetric mechanism to pause the application when the driver is close to 
an overflow. I don't know the behavior of Intel NICs on overflow, but in 
our NICs it's considered a critical error, that is followed by a 
recovery procedure, so it's not something that should happen under 
normal workloads.

> One might ask why it is still relevant even after having proper busy
> poll support in place - here is the justification.
> 
> For xdpsock that was:
> - run for l2fwd scenario,
> - app/driver processing took place on the same core in busy poll
>    with 2048 budget,
> - HW ring sizes Tx 256, Rx 2048,
> 
> this work improved throughput by 78% and reduced Rx queue full statistic
> bump by 99%.
> 
> For testing ice, make sure that you have [1] present on your side.
> 
> This set, besides the work described above, also carries
> improvements around return codes in various XSK paths and lastly a minor
> optimization for xskq_cons_has_entries(), a helper that might be used
> when XSK Rx batching would make it to the kernel.

Regarding error codes, I would like them to be consistent across all 
drivers, otherwise all the debuggability improvements are not useful 
enough. Your series only changed Intel drivers. The backward compatibility
concern also applies here: the same error codes that were in use have been
repurposed, which may confuse some existing applications.

> Thanks!
> MF
> 
> [0]: https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
> [1]: https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/
> 
> Björn Töpel (1):
>    xsk: improve xdp_do_redirect() error codes
> 
> Maciej Fijalkowski (9):
>    xsk: diversify return codes in xsk_rcv_check()
>    ice: xsk: terminate NAPI when XSK Rx queue gets full
>    i40e: xsk: terminate NAPI when XSK Rx queue gets full
>    ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
>    ice: xsk: diversify return values from xsk_wakeup call paths
>    i40e: xsk: diversify return values from xsk_wakeup call paths
>    ixgbe: xsk: diversify return values from xsk_wakeup call paths
>    ice: xsk: avoid refilling single Rx descriptors
>    xsk: drop ternary operator from xskq_cons_has_entries
> 
>   .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
>   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 27 +++++++++------
>   drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
>   drivers/net/ethernet/intel/ice/ice_xsk.c      | 34 ++++++++++++-------
>   .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
>   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 29 ++++++++++------
>   net/xdp/xsk.c                                 |  4 +--
>   net/xdp/xsk_queue.h                           |  4 +--
>   8 files changed, 64 insertions(+), 37 deletions(-)
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-07 10:49 ` [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue Maxim Mikityanskiy
@ 2022-04-08  9:08   ` Maciej Fijalkowski
  2022-04-08 12:48     ` Maxim Mikityanskiy
  0 siblings, 1 reply; 30+ messages in thread
From: Maciej Fijalkowski @ 2022-04-08  9:08 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, netdev, brouer,
	alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Thu, Apr 07, 2022 at 01:49:02PM +0300, Maxim Mikityanskiy wrote:
> On 2022-04-05 14:06, Maciej Fijalkowski wrote:
> > Hi!
> > 
> > This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
> > queue gets full which in turn makes it impossible to keep on
> > successfully producing descriptors to XSK Rx ring. By breaking out of
> > the driver side immediately we will give the user space opportunity for
> > consuming descriptors from XSK Rx ring and therefore provide room in the
> > ring so that HW Rx -> XSK Rx redirection can be done.
> > 
> > Maxim asked and Jesper agreed on simplifying Bjorn's original API used
> > for detecting the event of interest, so let's just simply check for
> > -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
> > to XSK Rx. No real need for redirect API extension.
> 

Hey Maxim!

> I believe some of the other comments under the old series [0] might still
> be relevant:
> 
> 1. need_wakeup behavior. If need_wakeup is disabled, the expected behavior
> is busy-looping in NAPI, you shouldn't break out early, as the application
> does not restart NAPI, and the driver restarts it itself, leading to a less
> efficient loop. If need_wakeup is enabled, it should be set on ENOBUFS - I
> believe this is the case here, right?

Good point. We currently set the need_wakeup flag for the -ENOBUFS case,
as is done for failure == true. You are right that we shouldn't be
breaking the loop on -ENOBUFS if the need_wakeup flag is not set on the
xsk_pool, will fix!
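
A minimal sketch of the guard, reusing the names from the i40e hunk in
this series (xsk_uses_need_wakeup() is the existing pool helper):

	if (xdp_res == I40E_XDP_EXIT) {
		if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
			/* user space gets woken up to consume the XSK
			 * Rx ring, so stop the NAPI loop early
			 */
			failure = true;
			break;
		}
		/* without need_wakeup the app busy-loops NAPI itself;
		 * treat the full XSK Rx ring as an ordinary drop and
		 * continue as before
		 */
		xdp_res = I40E_XDP_CONSUMED;
	}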

> 
> 2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you prevent
> further packets from being XDP_TXed, leading to unnecessary latency
> increase. The new feature should be opt-in, otherwise such usecases suffer.

Anyone performing a lot of XDP_TX (or XDP_PASS, etc) should be using the
regular copy-mode driver, while the zero-copy driver should be used when most
packets are sent up to user-space. For the zero-copy driver, this opt in is not
necessary. But it sounds like a valid option for copy mode, though could we
think about the proper way as a follow up to this work?

> 
> 3. When the driver receives ENOBUFS, it has to drop the packet before
> returning to the application. It would be better experience if your feature
> saved all N packets from being dropped, not just N-1.

Sure, I'll re-run tests and see if we can omit freeing the current
xdp_buff and ntc bump, so that we would come back later on to the same
entry.
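
I.e. something along these lines in the i40e Rx loop (untested sketch):

	xdp_res = i40e_run_xdp_zc(rx_ring, bi);
	if (xdp_res == I40E_XDP_EXIT) {
		failure = true;
		/* neither free bi nor bump next_to_clean: the descriptor
		 * stays in place and gets retried on the next NAPI cycle,
		 * so the Nth packet is not dropped either
		 */
		break;
	}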

> 
> 4. A slow or malicious AF_XDP application may easily cause an overflow of
> the hardware receive ring. Your feature introduces a mechanism to pause the
> driver while the congestion is on the application side, but no symmetric
> mechanism to pause the application when the driver is close to an overflow.
> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
> considered a critical error, that is followed by a recovery procedure, so
> it's not something that should happen under normal workloads.

I'm not sure I follow on this one. The feature is about overflowing the XSK
receive ring, not the HW one, right? The driver picks entries from the fill
ring that were produced by the app, so if the app is slow on producing
those, I believe this would rather be an underflow of the ring; we would
simply receive fewer frames. As for the HW Rx ring actually being full, I
think the HW would be dropping the incoming frames, so I don't see a real
reason to treat this as a critical error that needs to go through recovery.

Am I missing something? Maybe I have just misunderstood you.

> 
> > One might ask why it is still relevant even after having proper busy
> > poll support in place - here is the justification.
> > 
> > For xdpsock that was:
> > - run for l2fwd scenario,
> > - app/driver processing took place on the same core in busy poll
> >    with 2048 budget,
> > - HW ring sizes Tx 256, Rx 2048,
> > 
> > this work improved throughput by 78% and reduced Rx queue full statistic
> > bump by 99%.
> > 
> > For testing ice, make sure that you have [1] present on your side.
> > 
> > This set, besides the work described above, also carries
> > improvements around return codes in various XSK paths and lastly a minor
> > optimization for xskq_cons_has_entries(), a helper that might be used
> > when XSK Rx batching would make it to the kernel.
> 
> Regarding error codes, I would like them to be consistent across all
> drivers, otherwise all the debuggability improvements are not useful enough.
> Your series only changed Intel drivers. The backward compatibility
> concern also applies here: the same error codes that were in use have
> been repurposed, which may confuse some existing applications.

I'll double check if ZC drivers are doing something unusual with return
values from xdp_do_redirect(). Regarding backward comp, I suppose you
refer only to changes in ndo_xsk_wakeup() callbacks as others are not
exposed to user space? They're not crucial to me, but it improved my
debugging experience.

FYI, your mail landed in my junk folder and the links [0] [1] are messed up in
the reply you sent. And this is true even on lore.kernel.org. They suddenly
refer to "nam11.safelinks.protection.outlook.com". Maybe something worth
looking into if you have this problem in the future.

> 
> > Thanks!
> > MF
> > 
> > [0]: https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
> > [1]: https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/
> > 
> > Björn Töpel (1):
> >    xsk: improve xdp_do_redirect() error codes
> > 
> > Maciej Fijalkowski (9):
> >    xsk: diversify return codes in xsk_rcv_check()
> >    ice: xsk: terminate NAPI when XSK Rx queue gets full
> >    i40e: xsk: terminate NAPI when XSK Rx queue gets full
> >    ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
> >    ice: xsk: diversify return values from xsk_wakeup call paths
> >    i40e: xsk: diversify return values from xsk_wakeup call paths
> >    ixgbe: xsk: diversify return values from xsk_wakeup call paths
> >    ice: xsk: avoid refilling single Rx descriptors
> >    xsk: drop ternary operator from xskq_cons_has_entries
> > 
> >   .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
> >   drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 27 +++++++++------
> >   drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
> >   drivers/net/ethernet/intel/ice/ice_xsk.c      | 34 ++++++++++++-------
> >   .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 29 ++++++++++------
> >   net/xdp/xsk.c                                 |  4 +--
> >   net/xdp/xsk_queue.h                           |  4 +--
> >   8 files changed, 64 insertions(+), 37 deletions(-)
> > 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-08  9:08   ` Maciej Fijalkowski
@ 2022-04-08 12:48     ` Maxim Mikityanskiy
  2022-04-08 18:17       ` Jakub Kicinski
  2022-04-11 15:35       ` Maciej Fijalkowski
  0 siblings, 2 replies; 30+ messages in thread
From: Maxim Mikityanskiy @ 2022-04-08 12:48 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, netdev, brouer,
	alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On 2022-04-08 12:08, Maciej Fijalkowski wrote:
> On Thu, Apr 07, 2022 at 01:49:02PM +0300, Maxim Mikityanskiy wrote:
>> On 2022-04-05 14:06, Maciej Fijalkowski wrote:
>>> Hi!
>>>
>>> This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
>>> queue gets full which in turn makes it impossible to keep on
>>> successfully producing descriptors to XSK Rx ring. By breaking out of
>>> the driver side immediately we will give the user space opportunity for
>>> consuming descriptors from XSK Rx ring and therefore provide room in the
>>> ring so that HW Rx -> XSK Rx redirection can be done.
>>>
>>> Maxim asked and Jesper agreed on simplifying Bjorn's original API used
>>> for detecting the event of interest, so let's just simply check for
>>> -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
>>> to XSK Rx. No real need for redirect API extension.
>>
> 
> Hey Maxim!
> 
>> I believe some of the other comments under the old series [0] might still
>> be relevant:
>>
>> 1. need_wakeup behavior. If need_wakeup is disabled, the expected behavior
>> is busy-looping in NAPI, you shouldn't break out early, as the application
>> does not restart NAPI, and the driver restarts it itself, leading to a less
>> efficient loop. If need_wakeup is enabled, it should be set on ENOBUFS - I
>> believe this is the case here, right?
> 
> Good point. We currently set the need_wakeup flag for the -ENOBUFS case,
> as is done for failure == true. You are right that we shouldn't be
> breaking the loop on -ENOBUFS if the need_wakeup flag is not set on the
> xsk_pool, will fix!
> 
>>
>> 2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you prevent
>> further packets from being XDP_TXed, leading to unnecessary latency
>> increase. The new feature should be opt-in, otherwise such usecases suffer.
> 
> Anyone performing a lot of XDP_TX (or XDP_PASS, etc) should be using the
> regular copy-mode driver, while the zero-copy driver should be used when most
> packets are sent up to user-space.

You generalized that easily, but how can you be so sure that all mixed 
use cases can live with the much slower copy mode? Also, how do you 
apply your rule of thumb to the 75/25 AF_XDP/XDP_TX use case? It's both 
"a lot of XDP_TX" and "most packets are sent up to user-space" at the 
same time.

At the moment, the application is free to decide whether it wants 
zerocopy XDP_TX or zerocopy AF_XDP, depending on its needs. After your 
patchset the only valid XDP verdict on zerocopy AF_XDP basically becomes 
"XDP_REDIRECT to XSKMAP". I don't think it's valid to break an entire 
feature to speed up some very specific use case.

Moreover, in early days of AF_XDP there was an attempt to implement 
zerocopy XDP_TX on AF_XDP queues, meaning both XDP_TX and AF_XDP could 
be zerocopy. The implementation suffered from possible overflows in 
driver queues, thus wasn't upstreamed, but it's still a valid idea that 
potentially could be done if overflows are worked around somehow.

> For the zero-copy driver, this opt in is not
> necessary. But it sounds like a valid option for copy mode, though could we
> think about the proper way as a follow up to this work?

My opinion is that the knob has to be part of initial submission, and 
the new feature should be disabled by default, otherwise we have huge 
issues with backward compatibility (if we delay it, the next update 
changes the behavior, breaking some existing use cases, and there is no 
way to work around it).

>>
>> 3. When the driver receives ENOBUFS, it has to drop the packet before
>> returning to the application. It would be better experience if your feature
>> saved all N packets from being dropped, not just N-1.
> 
> Sure, I'll re-run tests and see if we can omit freeing the current
> xdp_buff and ntc bump, so that we would come back later on to the same
> entry.
> 
>>
>> 4. A slow or malicious AF_XDP application may easily cause an overflow of
>> the hardware receive ring. Your feature introduces a mechanism to pause the
>> driver while the congestion is on the application side, but no symmetric
>> mechanism to pause the application when the driver is close to an overflow.
>> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
>> considered a critical error, that is followed by a recovery procedure, so
>> it's not something that should happen under normal workloads.
> 
> I'm not sure I follow on this one. The feature is about overflowing the
> XSK receive ring, not the HW one, right?

Right. So we have this pipeline of buffers:

NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets

Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and 
drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. 
The driver fulfills its responsibility to prevent overflows of HW RX 
ring. If the application doesn't consume quick enough, the frames will 
be leaked, but it's only the application's issue, the driver stays 
consistent.

After the feature, it's possible to pause NAPI from the userspace 
application, effectively disrupting the driver's consistency. I don't 
think an XSK application should have this power.

> The driver picks entries from the fill
> ring that were produced by the app, so if the app is slow on producing
> those, I believe this would rather be an underflow of the ring; we would
> simply receive fewer frames. As for the HW Rx ring actually being full, I
> think the HW would be dropping the incoming frames, so I don't see a real
> reason to treat this as a critical error that needs to go through recovery.

I'll double check regarding the hardware behavior, but it is what it is. 
If an overflow moves the queue to the fault state and requires a 
recovery, there is nothing I can do about that.

A few more thoughts I just had: mlx5e shares the same NAPI instance to 
serve all queues in a channel, that includes the XSK RQ and the regular 
RQ. The regular and XSK traffic can be configured to be isolated to 
different channels, or they may co-exist on the same channel. If they 
co-exist, and XSK asks to pause NAPI, the regular traffic will still run 
NAPI and drop 1 XSK packet per NAPI cycle, unless point 3 is fixed. It 
can also be reproduced if NAPI is woken up by XSK TX. Besides, (correct 
me if I'm wrong) your current implementation introduces extra latency to 
XSK TX if XSK RX asked to pause NAPI, because NAPI will be restarted 
anyway (by TX wakeup), and it could have been rescheduled by the kernel.

> Am I missing something? Maybe I have just misunderstood you.
> 
>>
>>> One might ask why it is still relevant even after having proper busy
>>> poll support in place - here is the justification.
>>>
>>> For xdpsock that was:
>>> - run for l2fwd scenario,
>>> - app/driver processing took place on the same core in busy poll
>>>     with 2048 budget,
>>> - HW ring sizes Tx 256, Rx 2048,
>>>
>>> this work improved throughput by 78% and reduced Rx queue full statistic
>>> bump by 99%.
>>>
>>> For testing ice, make sure that you have [1] present on your side.
>>>
>>> This set, besides the work described above, also carries
>>> improvements around return codes in various XSK paths and lastly a minor
>>> optimization for xskq_cons_has_entries(), a helper that might be used
>>> when XSK Rx batching would make it to the kernel.
>>
>> Regarding error codes, I would like them to be consistent across all
>> drivers, otherwise all the debuggability improvements are not useful enough.
>> Your series only changed Intel drivers. The backward compatibility
>> concern also applies here: the same error codes that were in use have
>> been repurposed, which may confuse some existing applications.
> 
> I'll double check if ZC drivers are doing something unusual with return
> values from xdp_do_redirect(). Regarding backward comp, I suppose you
> refer only to changes in ndo_xsk_wakeup() callbacks as others are not
> exposed to user space? They're not crucial to me, but it improved my
> debugging experience.

Sorry if I wasn't clear enough. Yes, I meant the wakeup error codes. We 
aren't doing anything unusual with xdp_do_redirect codes (can't say for 
other drivers, though).

Last time I wanted to improve error codes returned from some BPF helpers 
(make the errors more distinguishable), my patch was blocked because of 
backward compatibility concerns. To be on the safe side (i.e. to avoid 
further bug reports from someone who actually relied on specific codes), 
you might want to use a new error code, rather than repurposing the 
existing ones.

I personally don't have objections about changing the error codes the 
way you did if you keep them consistent across all drivers, not only 
Intel ones.

> FYI, your mail landed in my junk folder

That has to be something with your email server. I just sent an email to 
gmail, and it arrived to inbox. If anyone else (other than @intel.com) 
can't receive my emails, please tell me, I'll open a support ticket then.

> and the links [0] [1] are messed up in
> the reply you sent. And this is true even on lore.kernel.org. They suddenly
> refer to "nam11.safelinks.protection.outlook.com".

I'm afraid I can't do anything about these links. Our Outlook server 
mangles them in all incoming mail as soon as it arrives. The message 
I received from you already has them "sanitized".

> Maybe something worth
> looking into if you have this problem in the future.
> 
>>
>>> Thanks!
>>> MF
>>>
>>> [0]: https://lore.kernel.org/bpf/20200904135332.60259-1-bjorn.topel@gmail.com/
>>> [1]: https://lore.kernel.org/netdev/20220317175727.340251-1-maciej.fijalkowski@intel.com/
>>>
>>> Björn Töpel (1):
>>>     xsk: improve xdp_do_redirect() error codes
>>>
>>> Maciej Fijalkowski (9):
>>>     xsk: diversify return codes in xsk_rcv_check()
>>>     ice: xsk: terminate NAPI when XSK Rx queue gets full
>>>     i40e: xsk: terminate NAPI when XSK Rx queue gets full
>>>     ixgbe: xsk: terminate NAPI when XSK Rx queue gets full
>>>     ice: xsk: diversify return values from xsk_wakeup call paths
>>>     i40e: xsk: diversify return values from xsk_wakeup call paths
>>>     ixgbe: xsk: diversify return values from xsk_wakeup call paths
>>>     ice: xsk: avoid refilling single Rx descriptors
>>>     xsk: drop ternary operator from xskq_cons_has_entries
>>>
>>>    .../ethernet/intel/i40e/i40e_txrx_common.h    |  1 +
>>>    drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 27 +++++++++------
>>>    drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
>>>    drivers/net/ethernet/intel/ice/ice_xsk.c      | 34 ++++++++++++-------
>>>    .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  1 +
>>>    drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 29 ++++++++++------
>>>    net/xdp/xsk.c                                 |  4 +--
>>>    net/xdp/xsk_queue.h                           |  4 +--
>>>    8 files changed, 64 insertions(+), 37 deletions(-)
>>>
>>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-08 12:48     ` Maxim Mikityanskiy
@ 2022-04-08 18:17       ` Jakub Kicinski
  2022-04-11 15:46         ` Maciej Fijalkowski
  2022-04-11 15:35       ` Maciej Fijalkowski
  1 sibling, 1 reply; 30+ messages in thread
From: Jakub Kicinski @ 2022-04-08 18:17 UTC (permalink / raw)
  To: Maxim Mikityanskiy, Maciej Fijalkowski
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, netdev, brouer,
	alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Fri, 8 Apr 2022 15:48:44 +0300 Maxim Mikityanskiy wrote:
> >> 4. A slow or malicious AF_XDP application may easily cause an overflow of
> >> the hardware receive ring. Your feature introduces a mechanism to pause the
> >> driver while the congestion is on the application side, but no symmetric
> >> mechanism to pause the application when the driver is close to an overflow.
> >> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
> >> considered a critical error, that is followed by a recovery procedure, so
> >> it's not something that should happen under normal workloads.  
> > 
> > I'm not sure I follow on this one. Feature is about overflowing the XSK
> > receive ring, not the HW one, right?  
> 
> Right. So we have this pipeline of buffers:
> 
> NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets
> 
> Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and 
> drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. 
> The driver fulfills its responsibility to prevent overflows of HW RX 
> ring. If the application doesn't consume quick enough, the frames will 
> be leaked, but it's only the application's issue, the driver stays 
> consistent.
> 
> After the feature, it's possible to pause NAPI from the userspace 
> application, effectively disrupting the driver's consistency. I don't 
> think an XSK application should have this power.

+1
cover letter refers to busy poll, but did that test enable prefer busy
poll w/ the timeout configured right? It seems like similar goal can 
be achieved with just that.
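
For reference, prefer busy poll is enabled per socket with a few
setsockopt() calls on the XSK fd (available since 5.11; the values below
are only illustrative):

	#include <sys/socket.h>

	static int enable_prefer_busy_poll(int xsk_fd)
	{
		int opt = 1;
		int usecs = 200;	/* SO_BUSY_POLL takes microseconds */
		int budget = 256;	/* packets per NAPI poll cycle */

		if (setsockopt(xsk_fd, SOL_SOCKET, SO_PREFER_BUSY_POLL,
			       &opt, sizeof(opt)))
			return -1;
		if (setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL,
			       &usecs, sizeof(usecs)))
			return -1;
		return setsockopt(xsk_fd, SOL_SOCKET, SO_BUSY_POLL_BUDGET,
				  &budget, sizeof(budget));
	}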

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-08 12:48     ` Maxim Mikityanskiy
  2022-04-08 18:17       ` Jakub Kicinski
@ 2022-04-11 15:35       ` Maciej Fijalkowski
  2022-04-13 10:40         ` Maxim Mikityanskiy
  1 sibling, 1 reply; 30+ messages in thread
From: Maciej Fijalkowski @ 2022-04-11 15:35 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, netdev, brouer,
	alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Fri, Apr 08, 2022 at 03:48:44PM +0300, Maxim Mikityanskiy wrote:
> On 2022-04-08 12:08, Maciej Fijalkowski wrote:
> > On Thu, Apr 07, 2022 at 01:49:02PM +0300, Maxim Mikityanskiy wrote:
> > > On 2022-04-05 14:06, Maciej Fijalkowski wrote:
> > > > Hi!
> > > > 
> > > > This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
> > > > queue gets full which in turn makes it impossible to keep on
> > > > successfully producing descriptors to XSK Rx ring. By breaking out of
> > > > the driver side immediately we will give the user space opportunity for
> > > > consuming descriptors from XSK Rx ring and therefore provide room in the
> > > > ring so that HW Rx -> XSK Rx redirection can be done.
> > > > 
> > > > Maxim asked and Jesper agreed on simplifying Bjorn's original API used
> > > > for detecting the event of interest, so let's just simply check for
> > > > -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
> > > > to XSK Rx. No real need for redirect API extension.
> > > 
> > 
> > Hey Maxim!
> > 
> > > I believe some of the other comments under the old series [0] might still
> > > be relevant:
> > > 
> > > 1. need_wakeup behavior. If need_wakeup is disabled, the expected behavior
> > > is busy-looping in NAPI, you shouldn't break out early, as the application
> > > does not restart NAPI, and the driver restarts it itself, leading to a less
> > > efficient loop. If need_wakeup is enabled, it should be set on ENOBUFS - I
> > > believe this is the case here, right?
> > 
> > Good point. We currently set the need_wakeup flag for the -ENOBUFS case,
> > as is done for failure == true. You are right that we shouldn't be
> > breaking the loop on -ENOBUFS if the need_wakeup flag is not set on the
> > xsk_pool, will fix!
> > 
> > > 
> > > 2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you prevent
> > > further packets from being XDP_TXed, leading to unnecessary latency
> > > increase. The new feature should be opt-in, otherwise such usecases suffer.
> > 
> > Anyone performing a lot of XDP_TX (or XDP_PASS, etc) should be using the
> > regular copy-mode driver, while the zero-copy driver should be used when most
> > packets are sent up to user-space.
> 
> You generalized that easily, but how can you be so sure that all mixed use
> cases can live with the much slower copy mode? Also, how do you apply your
> rule of thumb to the 75/25 AF_XDP/XDP_TX use case? It's both "a lot of
> XDP_TX" and "most packets are sent up to user-space" at the same time.

We are not aware of a single user that has this use case. What we do know
is that we have a lot of users that care about zero packet loss
performance when redirecting to an AF_XDP socket when using the zero-copy
driver. And this work addresses one of the corner cases and therefore
makes ZC driver better overall. So we say, focus on the cases people care
about. BTW, we do have users using mainly XDP_TX, XDP_PASS and
XDP_REDIRECT, but they are all using the regular driver for a good reason.
So we should not destroy those latencies as you mention.

> 
> At the moment, the application is free to decide whether it wants zerocopy
> XDP_TX or zerocopy AF_XDP, depending on its needs. After your patchset the
> only valid XDP verdict on zerocopy AF_XDP basically becomes "XDP_REDIRECT to
> XSKMAP". I don't think it's valid to break an entire feature to speed up
> some very specific use case.

We disagree that it 'breaks an entire feature' - it is about hardening the
driver when the user did not come up with an optimal combo of ring sizes vs
busy poll budget. The driver needs to be able to handle such cases in a
reasonable way. What is more, (at least Intel) zero-copy drivers are
written in a way that XDP_REDIRECT to XSKMAP is the most probable verdict
that can come out of the XDP program. However, other actions are of course
supported, so with your arguments, you could even say that we currently
treat redir as the 'only valid' action, which is not true. Just note that
Intel's regular drivers (copy-mode AF_XDP socket) are optimized for
XDP_PASS (as that is the default without an XDP program in place) as that
is the most probable verdict for that driver.

> 
> Moreover, in early days of AF_XDP there was an attempt to implement zerocopy
> XDP_TX on AF_XDP queues, meaning both XDP_TX and AF_XDP could be zerocopy.
> The implementation suffered from possible overflows in driver queues, thus
> wasn't upstreamed, but it's still a valid idea that potentially could be
> done if overflows are worked around somehow.
> 

That feature would be good to have, but it has not been worked on and
might never be worked on since there seems to be little interest in XDP_TX
for the zero-copy driver. This also makes your argument about disregarding
XDP_TX a bit exaggerated. As we stated before, we have not seen a use case
from a real user for this.

> > For the zero-copy driver, this opt in is not
> > necessary. But it sounds like a valid option for copy mode, though could we
> > think about the proper way as a follow up to this work?
> 
> My opinion is that the knob has to be part of initial submission, and the
> new feature should be disabled by default, otherwise we have huge issues
> with backward compatibility (if we delay it, the next update changes the
> behavior, breaking some existing use cases, and there is no way to work
> around it).
> 

We would not like to introduce knobs for every
feature/optimization/trade-off we could think of unless we really have to.
Let us keep it simple. Zero-copy is optimized for XDP_REDIRECT to an
AF_XDP socket. The regular, copy-mode driver is optimized for the case
when the packet is consumed by the kernel stack or XDP. That means that we
should not introduce this optimization for the regular driver, as you say,
but it is fine to do it for the zero-copy driver without a knob. If we
ever introduce this for the regular driver, yes, we would need a knob.

> > > 
> > > 3. When the driver receives ENOBUFS, it has to drop the packet before
> > > returning to the application. It would be better experience if your feature
> > > saved all N packets from being dropped, not just N-1.
> > 
> > Sure, I'll re-run tests and see if we can omit freeing the current
> > xdp_buff and ntc bump, so that we would come back later on to the same
> > entry.
> > 
> > > 
> > > 4. A slow or malicious AF_XDP application may easily cause an overflow of
> > > the hardware receive ring. Your feature introduces a mechanism to pause the
> > > driver while the congestion is on the application side, but no symmetric
> > > mechanism to pause the application when the driver is close to an overflow.
> > > I don't know the behavior of Intel NICs on overflow, but in our NICs it's
> > > considered a critical error, that is followed by a recovery procedure, so
> > > it's not something that should happen under normal workloads.
> > 
> > I'm not sure I follow on this one. The feature is about overflowing the
> > XSK receive ring, not the HW one, right?
> 
> Right. So we have this pipeline of buffers:
> 
> NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets
> 
> Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and
> drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. The
> driver fulfills its responsibility to prevent overflows of HW RX ring. If
> the application doesn't consume quick enough, the frames will be leaked, but
> it's only the application's issue, the driver stays consistent.
> 
> After the feature, it's possible to pause NAPI from the userspace
> application, effectively disrupting the driver's consistency. I don't think
> an XSK application should have this power.

It already has this power when using an AF_XDP socket; nothing new. For
example, when using busy-poll together with gro_flush_timeout and
napi_defer_hard_irqs, you already have this power. Another example is not
feeding buffers into the fill ring from the application side in zero-copy
mode. Also, the application does not have to be "slow" in order to cause
the XSK Rx queue overflow; it can be a matter of a suboptimal budget
choice compared to the ring lengths, as stated above.

Besides that, you are right, in copy-mode (without busy-poll), let us not
introduce this as this would provide the application with this power when
it does not have it today.

> 
> > The driver picks entries from the fill
> > ring that were produced by the app, so if the app is slow on producing
> > those, I believe this would rather be an underflow of the ring; we would
> > simply receive fewer frames. As for the HW Rx ring actually being full, I
> > think the HW would be dropping the incoming frames, so I don't see a real
> > reason to treat this as a critical error that needs to go through recovery.
> 
> I'll double check regarding the hardware behavior, but it is what it is. If
> an overflow moves the queue to the fault state and requires a recovery,
> there is nothing I can do about that.
> 
> A few more thoughts I just had: mlx5e shares the same NAPI instance to serve
> all queues in a channel, that includes the XSK RQ and the regular RQ. The
> regular and XSK traffic can be configured to be isolated to different
> channels, or they may co-exist on the same channel. If they co-exist, and
> XSK asks to pause NAPI, the regular traffic will still run NAPI and drop 1
> XSK packet per NAPI cycle, unless point 3 is fixed. It can also be

XSK does not pause the whole NAPI cycle; your HW XSK RQ just stops doing
redirects to the AF_XDP XSK RQ. Your regular RQ is not affected in any
way. Finally, point 3 needs to be fixed.

FWIW we also support having a channel/queue vector carrying more than one
RQ that is associated with a particular NAPI instance.

> reproduced if NAPI is woken up by XSK TX. Besides, (correct me if I'm wrong)
> your current implementation introduces extra latency to XSK TX if XSK RX
> asked to pause NAPI, because NAPI will be restarted anyway (by TX wakeup),
> and it could have been rescheduled by the kernel.

Again, we don't pause NAPI. Tx and Rx processing are separate.

As for Intel drivers, Tx is processed first. So even if we break at Rx due
to -ENOBUFS from xdp_do_redirect(), Tx work has already been done.

To reiterate, we agreed on fixing points 1 and 3 from your original mail.
Valid and good points.

> 
> > Am I missing something? Maybe I have just misunderstood you.
> > 
> > > 
> > > > One might ask why it is still relevant even after having proper busy
> > > > poll support in place - here is the justification.
> > > > 
> > > > For xdpsock that was:
> > > > - run for l2fwd scenario,
> > > > - app/driver processing took place on the same core in busy poll
> > > >     with 2048 budget,
> > > > - HW ring sizes Tx 256, Rx 2048,
> > > > 
> > > > this work improved throughput by 78% and reduced Rx queue full statistic
> > > > bump by 99%.
> > > > 
> > > > For testing ice, make sure that you have [1] present on your side.
> > > > 
> > > > This set, besides the work described above, also carries
> > > > improvements around return codes in various XSK paths and lastly a minor
> > > > optimization for xskq_cons_has_entries(), a helper that might be used
> > > > when XSK Rx batching would make it to the kernel.
> > > 
> > > Regarding error codes, I would like them to be consistent across all
> > > drivers, otherwise all the debuggability improvements are not useful enough.
> > > Your series only changed Intel drivers. The backward compatibility
> > > concern also applies here: the same error codes that were in use have
> > > been repurposed, which may confuse some existing applications.
> > 
> > I'll double check if ZC drivers are doing something unusual with return
> > values from xdp_do_redirect(). Regarding backward comp, I suppose you
> > refer only to changes in ndo_xsk_wakeup() callbacks as others are not
> > exposed to user space? They're not crucial to me, but it improved my
> > debugging experience.
> 
> Sorry if I wasn't clear enough. Yes, I meant the wakeup error codes. We
> aren't doing anything unusual with xdp_do_redirect codes (can't say for
> other drivers, though).
> 
> Last time I wanted to improve error codes returned from some BPF helpers
> (make the errors more distinguishable), my patch was blocked because of
> backward compatibility concerns. To be on the safe side (i.e. to avoid
> further bug reports from someone who actually relied on specific codes), you
> might want to use a new error code, rather than repurposing the existing
> ones.
> 
> I personally don't have objections about changing the error codes the way
> you did if you keep them consistent across all drivers, not only Intel ones.

Got your point. So I'll either drop the patches or look through
ndo_xsk_wakeup() implementations and try to standardize the ret codes.
Hope this sounds fine.

MF

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-08 18:17       ` Jakub Kicinski
@ 2022-04-11 15:46         ` Maciej Fijalkowski
  2022-04-11 17:07           ` Jakub Kicinski
  0 siblings, 1 reply; 30+ messages in thread
From: Maciej Fijalkowski @ 2022-04-11 15:46 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Maxim Mikityanskiy, bpf, ast, daniel, magnus.karlsson, bjorn,
	netdev, brouer, alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Fri, Apr 08, 2022 at 11:17:56AM -0700, Jakub Kicinski wrote:
> On Fri, 8 Apr 2022 15:48:44 +0300 Maxim Mikityanskiy wrote:
> > >> 4. A slow or malicious AF_XDP application may easily cause an overflow of
> > >> the hardware receive ring. Your feature introduces a mechanism to pause the
> > >> driver while the congestion is on the application side, but no symmetric
> > >> mechanism to pause the application when the driver is close to an overflow.
> > >> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
> > >> considered a critical error, that is followed by a recovery procedure, so
> > >> it's not something that should happen under normal workloads.  
> > > 
> > > I'm not sure I follow on this one. Feature is about overflowing the XSK
> > > receive ring, not the HW one, right?  
> > 
> > Right. So we have this pipeline of buffers:
> > 
> > NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets
> > 
> > Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and 
> > drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. 
> > The driver fulfills its responsibility to prevent overflows of HW RX 
> > ring. If the application doesn't consume quick enough, the frames will 
> > be leaked, but it's only the application's issue, the driver stays 
> > consistent.
> > 
> > After the feature, it's possible to pause NAPI from the userspace 
> > application, effectively disrupting the driver's consistency. I don't 
> > think an XSK application should have this power.
> 
> +1
> cover letter refers to busy poll, but did that test enable prefer busy
> poll w/ the timeout configured right? It seems like similar goal can 
> be achieved with just that.

AF_XDP busy poll where the app and driver run on the same core, without
configuring gro_flush_timeout and napi_defer_hard_irqs, does not bring much
value, so all of the busy poll tests were done with:

echo 2 | sudo tee /sys/class/net/ens4f1/napi_defer_hard_irqs
echo 200000 | sudo tee /sys/class/net/ens4f1/gro_flush_timeout

That said, performance can still suffer and packets would not make it up
to user space even with timeout being configured in the case I'm trying to
improve.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-11 15:46         ` Maciej Fijalkowski
@ 2022-04-11 17:07           ` Jakub Kicinski
  0 siblings, 0 replies; 30+ messages in thread
From: Jakub Kicinski @ 2022-04-11 17:07 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Maxim Mikityanskiy, bpf, ast, daniel, magnus.karlsson, bjorn,
	netdev, brouer, alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Mon, 11 Apr 2022 17:46:06 +0200 Maciej Fijalkowski wrote:
> > +1
> > cover letter refers to busy poll, but did that test enable prefer busy
> > poll w/ the timeout configured right? It seems like similar goal can 
> > be achieved with just that.  
> 
> AF_XDP busy poll where the app and driver run on the same core, without
> configuring gro_flush_timeout and napi_defer_hard_irqs, does not bring much
> value, so all of the busy poll tests were done with:
> 
> echo 2 | sudo tee /sys/class/net/ens4f1/napi_defer_hard_irqs
> echo 200000 | sudo tee /sys/class/net/ens4f1/gro_flush_timeout
> 
> That said, performance can still suffer and packets would not make it up
> to user space even with timeout being configured in the case I'm trying to
> improve.

Does the system not break out of busy poll then? See if the NAPI
hrtimer fires or not.

It's a 2k queue IIRC, with timeout of 200us, so 10Mpps at least.
What rate is l2fwd sustaining without these patches?

You may have to either increase the timeout or increase the frequency
of repoll from user space (with a smaller budget).
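
E.g. a driving loop along these lines repolls more often with a smaller
per-syscall budget (sketch using the libbpf xsk.h helpers; error handling
and descriptor processing are elided):

	#include <sys/socket.h>
	#include <bpf/xsk.h>

	static void rx_drive(int xsk_fd, struct xsk_ring_cons *rx)
	{
		unsigned int rcvd;
		__u32 idx;

		for (;;) {
			/* with prefer busy poll each syscall runs one NAPI
			 * cycle of at most SO_BUSY_POLL_BUDGET packets
			 */
			recvfrom(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, NULL);

			rcvd = xsk_ring_cons__peek(rx, 64, &idx);
			if (!rcvd)
				continue;
			/* ... consume rcvd descriptors starting at idx ... */
			xsk_ring_cons__release(rx, rcvd);
		}
	}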

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-11 15:35       ` Maciej Fijalkowski
@ 2022-04-13 10:40         ` Maxim Mikityanskiy
  2022-04-13 15:12           ` Magnus Karlsson
  0 siblings, 1 reply; 30+ messages in thread
From: Maxim Mikityanskiy @ 2022-04-13 10:40 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: bpf, ast, daniel, magnus.karlsson, bjorn, netdev, brouer,
	alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On 2022-04-11 18:35, Maciej Fijalkowski wrote:
> On Fri, Apr 08, 2022 at 03:48:44PM +0300, Maxim Mikityanskiy wrote:
>> On 2022-04-08 12:08, Maciej Fijalkowski wrote:
>>> On Thu, Apr 07, 2022 at 01:49:02PM +0300, Maxim Mikityanskiy wrote:
>>>> On 2022-04-05 14:06, Maciej Fijalkowski wrote:
>>>>> Hi!
>>>>>
>>>>> This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
>>>>> queue gets full which in turn makes it impossible to keep on
>>>>> successfully producing descriptors to XSK Rx ring. By breaking out of
>>>>> the driver side immediately we will give the user space opportunity for
>>>>> consuming descriptors from XSK Rx ring and therefore provide room in the
>>>>> ring so that HW Rx -> XSK Rx redirection can be done.
>>>>>
>>>>> Maxim asked and Jesper agreed on simplifying Bjorn's original API used
>>>>> for detecting the event of interest, so let's just simply check for
>>>>> -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
>>>>> to XSK Rx. No real need for redirect API extension.
>>>>
>>>
>>> Hey Maxim!
>>>
>>>> I believe some of the other comments under the old series [0] might still
>>>> be relevant:
>>>>
>>>> 1. need_wakeup behavior. If need_wakeup is disabled, the expected behavior
>>>> is busy-looping in NAPI, you shouldn't break out early, as the application
>>>> does not restart NAPI, and the driver restarts it itself, leading to a less
>>>> efficient loop. If need_wakeup is enabled, it should be set on ENOBUFS - I
>>>> believe this is the case here, right?
>>>
>>> Good point. We currently set the need_wakeup flag for the -ENOBUFS case,
>>> as is done for failure == true. You are right that we shouldn't be
>>> breaking the loop on -ENOBUFS if the need_wakeup flag is not set on the
>>> xsk_pool, will fix!
>>>
>>>>
>>>> 2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you prevent
>>>> further packets from being XDP_TXed, leading to unnecessary latency
>>>> increase. The new feature should be opt-in, otherwise such usecases suffer.
>>>
>>> Anyone performing a lot of XDP_TX (or XDP_PASS, etc) should be using the
>>> regular copy-mode driver, while the zero-copy driver should be used when most
>>> packets are sent up to user-space.
>>
>> You generalized that easily, but how can you be so sure that all mixed use
>> cases can live with the much slower copy mode? Also, how do you apply your
>> rule of thumb to the 75/25 AF_XDP/XDP_TX use case? It's both "a lot of
>> XDP_TX" and "most packets are sent up to user-space" at the same time.
> 
> We are not aware of a single user that has this use case.

Doesn't mean there aren't any ;)

Back in the original series, Björn said it was a valid use case:

 > I'm leaning towards a more explicit opt-in like Max suggested. As Max
 > pointed out, AF_XDP/XDP_TX is actually a non-insane(!) way of using
 > AF_XDP zero-copy, which will suffer from the early exit.

https://lore.kernel.org/bpf/75146564-2774-47ac-cc75-40d74bea50d8@intel.com/

What has changed since then?

In any case, it's a significant behavior change that breaks backward 
compatibility and affects the mentioned use case. Such changes should go 
as opt-in: we have need_wakeup and unaligned chunks as opt-in features, 
so I don't see any reason to introduce a breaking change this time.

> What we do know
> is that we have a lot of users that care about zero packet loss
> performance when redirecting to an AF_XDP socket when using the zero-copy
> driver. And this work addresses one of the corner cases and therefore
> makes ZC driver better overall. So we say, focus on the cases people care
> about. BTW, we do have users using mainly XDP_TX, XDP_PASS and
> XDP_REDIRECT, but they are all using the regular driver for a good reason.
> So we should not destroy those latencies as you mention.
> 
>>
>> At the moment, the application is free to decide whether it wants zerocopy
>> XDP_TX or zerocopy AF_XDP, depending on its needs. After your patchset the
>> only valid XDP verdict on zerocopy AF_XDP basically becomes "XDP_REDIRECT to
>> XSKMAP". I don't think it's valid to break an entire feature to speed up
>> some very specific use case.
> 
> We disagree that it 'breaks an entire feature' - it is about hardening the
> driver when the user did not come up with an optimal combo of ring sizes vs
> busy poll budget. The driver needs to be able to handle such cases in a
> reasonable way.

XDP_TX is a valid verdict on an XSK RX queue, and stopping XDP_TX
processing for an indefinite time sounds to me like breaking the feature.
Improving performance doesn't justify breaking other stuff. It's OK to
do so if the application explicitly acks that it doesn't care about 
XDP_TX, or (arguably) if it was the behavior from day one.

> What is more, (at least Intel) zero-copy drivers are
> written in a way that XDP_REDIRECT to XSKMAP is the most probable verdict
> that can come out of the XDP program. However, other actions are of course
> supported, so with your arguments, you could even say that we currently
> treat redir as the 'only valid' action, which is not true.

I did not say that, please don't attribute your speculations to me. One 
thing is optimizing for the most likely use case, the other is breaking 
unlikely use cases to improve the likely ones.

> Just note that
> Intel's regular drivers (copy-mode AF_XDP socket) are optimized for
> XDP_PASS (as that is the default without an XDP program in place) as that
> is the most probable verdict for that driver.
> 
>>
>> Moreover, in early days of AF_XDP there was an attempt to implement zerocopy
>> XDP_TX on AF_XDP queues, meaning both XDP_TX and AF_XDP could be zerocopy.
>> The implementation suffered from possible overflows in driver queues, thus
>> wasn't upstreamed, but it's still a valid idea that potentially could be
>> done if overflows are worked around somehow.
>>
> 
> That feature would be good to have, but it has not been worked on and
> might never be worked on since there seems to be little interest in XDP_TX
> for the zero-copy driver. This also makes your argument about disregarding
> XDP_TX a bit exaggerated. As we stated before, we have not seen a use case
> from a real user for this.
> 
>>> For the zero-copy driver, this opt in is not
>>> necessary. But it sounds like a valid option for copy mode, though could we
>>> think about the proper way as a follow up to this work?
>>
>> My opinion is that the knob has to be part of initial submission, and the
>> new feature should be disabled by default, otherwise we have huge issues
>> with backward compatibility (if we delay it, the next update changes the
>> behavior, breaking some existing use cases, and there is no way to work
>> around it).
>>
> 
> We would not like to introduce knobs for every
> feature/optimization/trade-off we could think of unless we really have to.
> Let us keep it simple. Zero-copy is optimized for XDP_REDIRECT to an
> AF_XDP socket. The regular, copy-mode driver is optimized for the case
> when the packet is consumed by the kernel stack or XDP. That means that we
> should not introduce this optimization for the regular driver, as you say,
> but it is fine to do it for the zero-copy driver without a knob. If we
> ever introduce this for the regular driver, yes, we would need a knob.
> 
>>>>
>>>> 3. When the driver receives ENOBUFS, it has to drop the packet before
>>>> returning to the application. It would be better experience if your feature
>>>> saved all N packets from being dropped, not just N-1.
>>>
>>> Sure, I'll re-run tests and see if we can omit freeing the current
>>> xdp_buff and ntc bump, so that we would come back later on to the same
>>> entry.
>>>
>>>>
>>>> 4. A slow or malicious AF_XDP application may easily cause an overflow of
>>>> the hardware receive ring. Your feature introduces a mechanism to pause the
>>>> driver while the congestion is on the application side, but no symmetric
>>>> mechanism to pause the application when the driver is close to an overflow.
>>>> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
>>>> considered a critical error, that is followed by a recovery procedure, so
>>>> it's not something that should happen under normal workloads.
>>>
>>> I'm not sure I follow on this one. The feature is about overflowing the
>>> XSK receive ring, not the HW one, right?
>>
>> Right. So we have this pipeline of buffers:
>>
>> NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets
>>
>> Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and
>> drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. The
>> driver fulfills its responsibility to prevent overflows of HW RX ring. If
>> the application doesn't consume quick enough, the frames will be leaked, but
>> it's only the application's issue, the driver stays consistent.
>>
>> After the feature, it's possible to pause NAPI from the userspace
>> application, effectively disrupting the driver's consistency. I don't think
>> an XSK application should have this power.
> 
> It already has this power when using an AF_XDP socket; nothing new. For
> example, when using busy-poll together with gro_flush_timeout and
> napi_defer_hard_irqs, you already have this power. Another example is not
> feeding buffers into the fill ring from the application side in zero-copy
> mode. Also, the application does not have to be "slow" in order to cause
> the XSK Rx queue overflow; it can be a matter of a suboptimal budget
> choice compared to the ring lengths, as stated above.

(*)

> Besides that, you are right, in copy-mode (without busy-poll), let us not
> introduce this as this would provide the application with this power when
> it does not have it today.
> 
>>
>>> The driver picks entries from the fill
>>> ring that were produced by the app, so if the app is slow on producing
>>> those, I believe this would rather be an underflow of the ring; we would
>>> simply receive fewer frames. As for the HW Rx ring actually being full, I
>>> think the HW would be dropping the incoming frames, so I don't see a real
>>> reason to treat this as a critical error that needs to go through recovery.
>>
>> I'll double check regarding the hardware behavior, but it is what it is. If
>> an overflow moves the queue to the fault state and requires a recovery,
>> there is nothing I can do about that.

I double-checked that, and the problem I indicated in point 4 does not 
seem to exist. In the mlx5e case, if NAPI is delayed, there will be a lack 
of RX WQEs, and the hardware will start dropping packets, which is a 
perfectly normal situation. Your arguments above (*) are valid.

>> A few more thoughts I just had: mlx5e shares the same NAPI instance to serve
>> all queues in a channel, that includes the XSK RQ and the regular RQ. The
>> regular and XSK traffic can be configured to be isolated to different
>> channels, or they may co-exist on the same channel. If they co-exist, and
>> XSK asks to pause NAPI, the regular traffic will still run NAPI and drop 1
>> XSK packet per NAPI cycle, unless point 3 is fixed. It can also be
> 
> XSK does not pause the whole NAPI cycle; your HW XSK RQ just stops
> doing redirects to the AF_XDP XSK RQ. Your regular RQ is not affected
> in any way. Finally, point 3 needs to be fixed.
> 
> FWIW we also support having a channel/queue vector carrying more than one
> RQ that is associated with a particular NAPI instance.
> 
>> reproduced if NAPI is woken up by XSK TX. Besides, (correct me if I'm wrong)
>> your current implementation introduces extra latency to XSK TX if XSK RX
>> asked to pause NAPI, because NAPI will be restarted anyway (by TX wakeup),
>> and it could have been rescheduled by the kernel.
> 
> Again, we don't pause NAPI. Tx and Rx processing are separate.
> 
> As for Intel drivers, Tx is processed first. So even if we break at Rx due
> to -ENOBUFS from xdp_do_redirect(), Tx work has already been done.
> 
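
The ordering described here, as a skeleton (again a hedged sketch with
illustrative names, not the actual driver code):

/* Tx completions are reaped before the Rx loop runs, so an early Rx
 * break on -ENOBUFS cannot delay Tx processing.
 */
static int example_napi_poll(struct napi_struct *napi, int budget)
{
	struct example_q_vector *q_vector =
		container_of(napi, struct example_q_vector, napi);
	bool clean_complete;
	int work_done;

	/* Tx first: always runs, regardless of the XSK Rx ring state. */
	clean_complete = example_clean_tx_irq_zc(q_vector->tx_ring);

	/* Rx second: may break early when the XSK Rx ring is full. */
	work_done = example_clean_rx_irq_zc(q_vector->rx_ring, budget);

	if (work_done >= budget || !clean_complete)
		return budget;	/* stay scheduled */

	if (napi_complete_done(napi, work_done))
		example_irq_enable(q_vector);

	return work_done;
}
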
> To reiterate, we agreed on fixing point 1 and 3 from your original mail.
> Valid and good points.

Great that we agreed on 1 and 3! Point 4 can be dropped. For point 2, I 
wrote my thoughts above.

>>
>>> Am I missing something? Maybe I have just misunderstood you.
>>>
>>>>
>>>>> One might ask why it is still relevant even after having proper busy
>>>>> poll support in place - here is the justification.
>>>>>
>>>>> For xdpsock that was:
>>>>> - run for l2fwd scenario,
>>>>> - app/driver processing took place on the same core in busy poll
>>>>>      with 2048 budget,
>>>>> - HW ring sizes Tx 256, Rx 2048,
>>>>>
>>>>> this work improved throughput by 78% and reduced Rx queue full statistic
>>>>> bump by 99%.
>>>>>
>>>>> For testing ice, make sure that you have [1] present on your side.
>>>>>
>>>>> This set, besides the work described above, also carries
>>>>> improvements around return codes in various XSK paths and lastly a minor
>>>>> optimization for xskq_cons_has_entries(), a helper that might be used
>>>>> if XSK Rx batching makes it to the kernel.
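
The helper in question is tiny; the cleanup described amounts to something
like the following (a before/after sketch of the net/xdp/xsk_queue.h
helper; the comparison already yields a bool, so the ternary is redundant):

/* Before: */
static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
{
	return xskq_cons_nb_entries(q, cnt) >= cnt ? true : false;
}

/* After: */
static inline bool xskq_cons_has_entries(struct xsk_queue *q, u32 cnt)
{
	return xskq_cons_nb_entries(q, cnt) >= cnt;
}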
>>>>
>>>> Regarding error codes, I would like them to be consistent across all
>>>> drivers, otherwise all the debuggability improvements are not useful enough.
>>>> Your series only changed Intel drivers. The backward compatibility
>>>> concern also applies here: the same error codes that were in use have been
>>>> repurposed, which may confuse some existing applications.
>>>
>>> I'll double check if ZC drivers are doing something unusual with return
>>> values from xdp_do_redirect(). Regarding backward comp, I suppose you
>>> refer only to changes in ndo_xsk_wakeup() callbacks as others are not
>>> exposed to user space? They're not crucial to me, but it improved my
>>> debugging experience.
>>
>> Sorry if I wasn't clear enough. Yes, I meant the wakeup error codes. We
>> aren't doing anything unusual with xdp_do_redirect codes (can't say for
>> other drivers, though).
>>
>> Last time I wanted to improve error codes returned from some BPF helpers
>> (make the errors more distinguishable), my patch was blocked because of
>> backward compatibility concerns. To be on the safe side (i.e. to avoid
>> further bug reports from someone who actually relied on specific codes), you
>> might want to use a new error code, rather than repurposing the existing
>> ones.
>>
>> I personally don't have objections about changing the error codes the way
>> you did if you keep them consistent across all drivers, not only Intel ones.
> 
> Got your point. So I'll either drop the patches or look through
> ndo_xsk_wakeup() implementations and try to standardize the ret codes.
> Hope this sounds fine.

Yes, either way sounds absolutely fine to me.

> 
> MF


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-13 10:40         ` Maxim Mikityanskiy
@ 2022-04-13 15:12           ` Magnus Karlsson
  2022-04-13 15:26             ` Maciej Fijalkowski
  0 siblings, 1 reply; 30+ messages in thread
From: Magnus Karlsson @ 2022-04-13 15:12 UTC (permalink / raw)
  To: Maxim Mikityanskiy
  Cc: Maciej Fijalkowski, bpf, ast, daniel, magnus.karlsson, bjorn,
	netdev, brouer, alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Wed, Apr 13, 2022 at 1:40 PM Maxim Mikityanskiy <maximmi@nvidia.com> wrote:
>
> On 2022-04-11 18:35, Maciej Fijalkowski wrote:
> > On Fri, Apr 08, 2022 at 03:48:44PM +0300, Maxim Mikityanskiy wrote:
> >> On 2022-04-08 12:08, Maciej Fijalkowski wrote:
> >>> On Thu, Apr 07, 2022 at 01:49:02PM +0300, Maxim Mikityanskiy wrote:
> >>>> On 2022-04-05 14:06, Maciej Fijalkowski wrote:
> >>>>> Hi!
> >>>>>
> >>>>> This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
> >>>>> queue gets full which in turn makes it impossible to keep on
> >>>>> successfully producing descriptors to XSK Rx ring. By breaking out of
> >>>>> the driver side immediately we will give the user space opportunity for
> >>>>> consuming descriptors from XSK Rx ring and therefore provide room in the
> >>>>> ring so that HW Rx -> XSK Rx redirection can be done.
> >>>>>
> >>>>> Maxim asked and Jesper agreed on simplifying Bjorn's original API used
> >>>>> for detecting the event of interest, so let's just simply check for
> >>>>> -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
> >>>>> to XSK Rx. No real need for redirect API extension.
> >>>>
> >>>
> >>> Hey Maxim!
> >>>
> >>>> I believe some of the other comments under the old series [0] might be still
> >>>> relevant:
> >>>>
> >>>> 1. need_wakeup behavior. If need_wakeup is disabled, the expected behavior
> >>>> is busy-looping in NAPI, you shouldn't break out early, as the application
> >>>> does not restart NAPI, and the driver restarts it itself, leading to a less
> >>>> efficient loop. If need_wakeup is enabled, it should be set on ENOBUFS - I
> >>>> believe this is the case here, right?
> >>>
> >>> Good point. We currently set need_wakeup flag for -ENOBUFS case as it is
> >>> being done for failure == true. You are right that we shouldn't be
> >>> breaking the loop on -ENOBUFS if need_wakeup flag is not set on xsk_pool,
> >>> will fix!
> >>>
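
In code, the combination being agreed on here (do not stop a busy-looping
application, do flag a need_wakeup one) could look roughly like this at
the end of a zero-copy Rx routine; xsk_uses_need_wakeup() and
xsk_set_rx_need_wakeup() are the existing pool helpers, the remaining
names are illustrative:

/* Hedged sketch: -ENOBUFS keeps the current descriptor and leaves the
 * loop; what happens next depends on the need_wakeup mode.
 */
if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
	/* Opt-in mode: tell user space to wake us up and let NAPI
	 * go idle until it does.
	 */
	if (failure)
		xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
	return total_rx_packets;
}

/* No need_wakeup: report the full budget so NAPI stays scheduled and
 * the driver keeps busy-looping over the same entry.
 */
return failure ? budget : total_rx_packets;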
> >>>>
> >>>> 2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you prevent
> >>>> further packets from being XDP_TXed, leading to unnecessary latency
> >>>> increase. The new feature should be opt-in, otherwise such usecases suffer.
> >>>
> >>> Anyone performing a lot of XDP_TX (or XDP_PASS, etc) should be using the
> >>> regular copy-mode driver, while the zero-copy driver should be used when most
> >>> packets are sent up to user-space.
> >>
> >> You generalized that easily, but how can you be so sure that all mixed use
> >> cases can live with the much slower copy mode? Also, how do you apply your
> >> rule of thumb to the 75/25 AF_XDP/XDP_TX use case? It's both "a lot of
> >> XDP_TX" and "most packets are sent up to user-space" at the same time.
> >
> > We are not aware of a single user that has this use case.
>
> Doesn't mean there aren't any ;)
>
> Back in the original series, Björn said it was a valid use case:
>
>  > I'm leaning towards a more explicit opt-in like Max suggested. As Max
>  > pointed out, AF_XDP/XDP_TX is actually a non-insane(!) way of using
>  > AF_XDP zero-copy, which will suffer from the early exit.
>
> https://lore.kernel.org/bpf/75146564-2774-47ac-cc75-40d74bea50d8@intel.com/
>
> What has changed since then?

We now have real use cases and have become wiser ;-).

> In any case, it's a significant behavior change that breaks backward
> compatibility and affects the mentioned use case. Such changes should go
> as opt-in: we have need_wakeup and unaligned chunks as opt-in features,
> so I don't see any reason to introduce a breaking change this time.

In my opinion, the existing flags you mentioned are different. The
unaligned flag changes the format of the descriptor and the
need_wakeup flag can stop the driver so user space needs to explicitly
wake it up. These options really change the uapi and not refactoring
the application for this would really break it, so an opt-in was a
must. What Maciej is suggesting is about changing the performance for
a case that I have never seen (does not mean it does not exist though
because I do not know all users of course, but it is at least
uncommon). Do we want to put an opt-in for every performance change we
commit? I say no. But at some point a performance change might of
course be so large that it is warranted. It should also be for a use
case that exists, or at least we believe exists. I do not think that
is the case for this one. But if someone out there really has this use
case, please let me know and I will be happy to change my opinion.

> > What we do know
> > is that we have a lot of users that care about zero packet loss
> > performance when redirecting to an AF_XDP socket when using the zero-copy
> > driver. And this work addresses one of the corner cases and therefore
> > makes ZC driver better overall. So we say, focus on the cases people care
> > about. BTW, we do have users using mainly XDP_TX, XDP_PASS and
> > XDP_REDIRECT, but they are all using the regular driver for a good reason.
> > So we should not destroy those latencies as you mention.
> >
> >>
> >> At the moment, the application is free to decide whether it wants zerocopy
> >> XDP_TX or zerocopy AF_XDP, depending on its needs. After your patchset the
> >> only valid XDP verdict on zerocopy AF_XDP basically becomes "XDP_REDIRECT to
> >> XSKMAP". I don't think it's valid to break an entire feature to speed up
> >> some very specific use case.
> >
> > We disagree that it 'breaks an entire feature' - it is about hardening the
> > driver when user did not come up with an optimal combo of ring sizes vs
> > busy poll budget. Driver needs to be able to handle such cases in a
> > reasonable way.
>
> XDP_TX is a valid verdict on an XSK RX queue, and stopping XDP_TX
> processing for an indefinite time sounds to me like breaking the feature.
> Improving performance doesn't justify breaking other stuff. It's OK to
> do so if the application explicitly acks that it doesn't care about
> XDP_TX, or (arguably) if it was the behavior from day one.
>
> > What is more, (at least Intel) zero-copy drivers are
> > written in a way that XDP_REDIRECT to XSKMAP is the most probable verdict
> > that can come out of XDP program. However, other actions are of course
> > supported, so with your arguments, you could even say that we currently
> > treat redir as 'only valid' action, which is not true.
>
> I did not say that, please don't attribute your speculations to me. One
> thing is optimizing for the most likely use case, the other is breaking
> unlikely use cases to improve the likely ones.
>
> > Just note that
> > Intel's regular drivers (copy-mode AF_XDP socket) are optimized for
> > XDP_PASS (as that is the default without an XDP program in place) as that
> > is the most probable verdict for that driver.
> >
> >>
> >> Moreover, in early days of AF_XDP there was an attempt to implement zerocopy
> >> XDP_TX on AF_XDP queues, meaning both XDP_TX and AF_XDP could be zerocopy.
> >> The implementation suffered from possible overflows in driver queues, thus
> >> wasn't upstreamed, but it's still a valid idea that potentially could be
> >> done if overflows are worked around somehow.
> >>
> >
> > That feature would be good to have, but it has not been worked on and
> > might never be worked on since there seems to be little interest in XDP_TX
> > for the zero-copy driver. This also makes your argument about disregarding
> > XDP_TX a bit exaggerated. As we stated before, we have not seen a use case
> > from a real user for this.
> >
> >>> For the zero-copy driver, this opt in is not
> >>> necessary. But it sounds like a valid option for copy mode, though could we
> >>> think about the proper way as a follow up to this work?
> >>
> >> My opinion is that the knob has to be part of initial submission, and the
> >> new feature should be disabled by default, otherwise we have huge issues
> >> with backward compatibility (if we delay it, the next update changes the
> >> behavior, breaking some existing use cases, and there is no way to work
> >> around it).
> >>
> >
> > We would not like to introduce knobs for every
> > feature/optimization/trade-off we could think of unless we really have to.
> > Let us keep it simple. Zero-copy is optimized for XDP_REDIRECT to an
> > AF_XDP socket. The regular, copy-mode driver is optimized for the case
> > when the packet is consumed by the kernel stack or XDP. That means that we
> > should not introduce this optimization for the regular driver, as you say,
> > but it is fine to do it for the zero-copy driver without a knob. If we
> > ever introduce this for the regular driver, yes, we would need a knob.
> >
> >>>>
> >>>> 3. When the driver receives ENOBUFS, it has to drop the packet before
> >>>> returning to the application. It would be a better experience if your feature
> >>>> saved all N packets from being dropped, not just N-1.
> >>>
> >>> Sure, I'll re-run tests and see if we can omit freeing the current
> >>> xdp_buff and ntc bump, so that we would come back later on to the same
> >>> entry.
> >>>
> >>>>
> >>>> 4. A slow or malicious AF_XDP application may easily cause an overflow of
> >>>> the hardware receive ring. Your feature introduces a mechanism to pause the
> >>>> driver while the congestion is on the application side, but no symmetric
> >>>> mechanism to pause the application when the driver is close to an overflow.
> >>>> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
> >>>> considered a critical error, that is followed by a recovery procedure, so
> >>>> it's not something that should happen under normal workloads.
> >>>
> >>> I'm not sure I follow on this one. Feature is about overflowing the XSK
> >>> receive ring, not the HW one, right?
> >>
> >> Right. So we have this pipeline of buffers:
> >>
> >> NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets
> >>
> >> Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and
> >> drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. The
> >> driver fulfills its responsibility to prevent overflows of HW RX ring. If
> >> the application doesn't consume quickly enough, the frames will be dropped, but
> >> it's only the application's issue, the driver stays consistent.
> >>
> >> After the feature, it's possible to pause NAPI from the userspace
> >> application, effectively disrupting the driver's consistency. I don't think
> >> an XSK application should have this power.
> >
> > It already has this power when using an AF_XDP socket. Nothing new. Some
> > examples: when using busy-poll together with gro_flush_timeout and
> > napi_defer_hard_irqs, you already have this power. Another example is not
> > feeding buffers into the fill ring from the application side in zero-copy
> > mode. Also, the application does not have to be "slow" in order to cause
> > the XSK Rx queue to overflow. It can be a matter of a suboptimal budget
> > choice compared to the ring lengths, as stated above.
>
> (*)
>
> > Besides that, you are right, in copy-mode (without busy-poll), let us not
> > introduce this as this would provide the application with this power when
> > it does not have it today.
> >
> >>
> >>> The driver picks entries from the fill ring
> >>> that were produced by the app, so if the app is slow at producing those,
> >>> I believe this would rather be an underflow of the ring; we would simply
> >>> receive fewer frames. As for the HW Rx ring actually being full, I think
> >>> the HW would drop the incoming frames, so I don't see a real reason to
> >>> treat this as a critical error that needs to go through recovery.
> >>
> >> I'll double check regarding the hardware behavior, but it is what it is. If
> >> an overflow moves the queue to the fault state and requires a recovery,
> >> there is nothing I can do about that.
>
> I double-checked that, and it seems the problem I indicated in point 4
> does not exist. In the mlx5e case, if NAPI is delayed, there will be a
> lack of RX WQEs, and the hardware will start dropping packets, which is
> an absolutely regular situation. Your arguments above (*) are valid.
>
> >> A few more thoughts I just had: mlx5e shares the same NAPI instance to serve
> >> all queues in a channel, that includes the XSK RQ and the regular RQ. The
> >> regular and XSK traffic can be configured to be isolated to different
> >> channels, or they may co-exist on the same channel. If they co-exist, and
> >> XSK asks to pause NAPI, the regular traffic will still run NAPI and drop 1
> >> XSK packet per NAPI cycle, unless point 3 is fixed. It can also be
> >
> > XSK does not pause the whole NAPI cycle; your HW XSK RQ just stops
> > doing redirects to the AF_XDP XSK RQ. Your regular RQ is not affected
> > in any way. Finally, point 3 needs to be fixed.
> >
> > FWIW we also support having a channel/queue vector carrying more than one
> > RQ that is associated with a particular NAPI instance.
> >
> >> reproduced if NAPI is woken up by XSK TX. Besides, (correct me if I'm wrong)
> >> your current implementation introduces extra latency to XSK TX if XSK RX
> >> asked to pause NAPI, because NAPI will be restarted anyway (by TX wakeup),
> >> and it could have been rescheduled by the kernel.
> >
> > Again, we don't pause NAPI. Tx and Rx processing are separate.
> >
> > As for Intel drivers, Tx is processed first. So even if we break at Rx due
> > to -ENOBUFS from xdp_do_redirect(), Tx work has already been done.
> >
> > To reiterate, we agreed on fixing point 1 and 3 from your original mail.
> > Valid and good points.
>
> Great that we agreed on 1 and 3! Point 4 can be dropped. For point 2, I
> wrote my thoughts above.
>
> >>
> >>> Am I missing something? Maybe I have just misunderstood you.
> >>>
> >>>>
> >>>>> One might ask why it is still relevant even after having proper busy
> >>>>> poll support in place - here is the justification.
> >>>>>
> >>>>> For xdpsock that was:
> >>>>> - run for l2fwd scenario,
> >>>>> - app/driver processing took place on the same core in busy poll
> >>>>>      with 2048 budget,
> >>>>> - HW ring sizes Tx 256, Rx 2048,
> >>>>>
> >>>>> this work improved throughput by 78% and reduced Rx queue full statistic
> >>>>> bump by 99%.
> >>>>>
> >>>>> For testing ice, make sure that you have [1] present on your side.
> >>>>>
> >>>>> This set, besides the work described above, also carries
> >>>>> improvements around return codes in various XSK paths and lastly a minor
> >>>>> optimization for xskq_cons_has_entries(), a helper that might be used
> >>>>> if XSK Rx batching makes it to the kernel.
> >>>>
> >>>> Regarding error codes, I would like them to be consistent across all
> >>>> drivers, otherwise all the debuggability improvements are not useful enough.
> >>>> Your series only changed Intel drivers. The backward compatibility
> >>>> concern also applies here: the same error codes that were in use have been
> >>>> repurposed, which may confuse some existing applications.
> >>>
> >>> I'll double check if ZC drivers are doing something unusual with return
> >>> values from xdp_do_redirect(). Regarding backward comp, I suppose you
> >>> refer only to changes in ndo_xsk_wakeup() callbacks as others are not
> >>> exposed to user space? They're not crucial to me, but it improved my
> >>> debugging experience.
> >>
> >> Sorry if I wasn't clear enough. Yes, I meant the wakeup error codes. We
> >> aren't doing anything unusual with xdp_do_redirect codes (can't say for
> >> other drivers, though).
> >>
> >> Last time I wanted to improve error codes returned from some BPF helpers
> >> (make the errors more distinguishable), my patch was blocked because of
> >> backward compatibility concerns. To be on the safe side (i.e. to avoid
> >> further bug reports from someone who actually relied on specific codes), you
> >> might want to use a new error code, rather than repurposing the existing
> >> ones.
> >>
> >> I personally don't have objections about changing the error codes the way
> >> you did if you keep them consistent across all drivers, not only Intel ones.
> >
> > Got your point. So I'll either drop the patches or look through
> > ndo_xsk_wakeup() implementations and try to standardize the ret codes.
> > Hope this sounds fine.
>
> Yes, either way sounds absolutely fine to me.
>
> >
> > MF
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue
  2022-04-13 15:12           ` Magnus Karlsson
@ 2022-04-13 15:26             ` Maciej Fijalkowski
  0 siblings, 0 replies; 30+ messages in thread
From: Maciej Fijalkowski @ 2022-04-13 15:26 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Maxim Mikityanskiy, bpf, ast, daniel, magnus.karlsson, bjorn,
	netdev, brouer, alexandr.lobakin, Tariq Toukan, Saeed Mahameed

On Wed, Apr 13, 2022 at 05:12:09PM +0200, Magnus Karlsson wrote:
> On Wed, Apr 13, 2022 at 1:40 PM Maxim Mikityanskiy <maximmi@nvidia.com> wrote:
> >
> > On 2022-04-11 18:35, Maciej Fijalkowski wrote:
> > > On Fri, Apr 08, 2022 at 03:48:44PM +0300, Maxim Mikityanskiy wrote:
> > >> On 2022-04-08 12:08, Maciej Fijalkowski wrote:
> > >>> On Thu, Apr 07, 2022 at 01:49:02PM +0300, Maxim Mikityanskiy wrote:
> > >>>> On 2022-04-05 14:06, Maciej Fijalkowski wrote:
> > >>>>> Hi!
> > >>>>>
> > >>>>> This is a revival of Bjorn's idea [0] to break NAPI loop when XSK Rx
> > >>>>> queue gets full which in turn makes it impossible to keep on
> > >>>>> successfully producing descriptors to XSK Rx ring. By breaking out of
> > >>>>> the driver side immediately we will give the user space opportunity for
> > >>>>> consuming descriptors from XSK Rx ring and therefore provide room in the
> > >>>>> ring so that HW Rx -> XSK Rx redirection can be done.
> > >>>>>
> > >>>>> Maxim asked and Jesper agreed on simplifying Bjorn's original API used
> > >>>>> for detecting the event of interest, so let's just simply check for
> > >>>>> -ENOBUFS within Intel's ZC drivers after an attempt to redirect a buffer
> > >>>>> to XSK Rx. No real need for redirect API extension.
> > >>>>
> > >>>
> > >>> Hey Maxim!
> > >>>
> > >>>> I believe some of the other comments under the old series [0] might be still
> > >>>> relevant:
> > >>>>
> > >>>> 1. need_wakeup behavior. If need_wakeup is disabled, the expected behavior
> > >>>> is busy-looping in NAPI, you shouldn't break out early, as the application
> > >>>> does not restart NAPI, and the driver restarts it itself, leading to a less
> > >>>> efficient loop. If need_wakeup is enabled, it should be set on ENOBUFS - I
> > >>>> believe this is the case here, right?
> > >>>
> > >>> Good point. We currently set need_wakeup flag for -ENOBUFS case as it is
> > >>> being done for failure == true. You are right that we shouldn't be
> > >>> breaking the loop on -ENOBUFS if need_wakeup flag is not set on xsk_pool,
> > >>> will fix!
> > >>>
> > >>>>
> > >>>> 2. 50/50 AF_XDP and XDP_TX mix usecase. By breaking out early, you prevent
> > >>>> further packets from being XDP_TXed, leading to unnecessary latency
> > >>>> increase. The new feature should be opt-in, otherwise such usecases suffer.
> > >>>
> > >>> Anyone performing a lot of XDP_TX (or XDP_PASS, etc) should be using the
> > >>> regular copy-mode driver, while the zero-copy driver should be used when most
> > >>> packets are sent up to user-space.
> > >>
> > >> You generalized that easily, but how can you be so sure that all mixed use
> > >> cases can live with the much slower copy mode? Also, how do you apply your
> > >> rule of thumb to the 75/25 AF_XDP/XDP_TX use case? It's both "a lot of
> > >> XDP_TX" and "most packets are sent up to user-space" at the same time.
> > >
> > > We are not aware of a single user that has this use case.
> >
> > Doesn't mean there aren't any ;)
> >
> > Back in the original series, Björn said it was a valid use case:
> >
> >  > I'm leaning towards a more explicit opt-in like Max suggested. As Max
> >  > pointed out, AF_XDP/XDP_TX is actually a non-insane(!) way of using
> >  > AF_XDP zero-copy, which will suffer from the early exit.
> >
> > https://lore.kernel.org/bpf/75146564-2774-47ac-cc75-40d74bea50d8@intel.com/
> >
> > What has changed since then?
> 
> We now have real use cases and have become wiser ;-).

Bjorn said it was a valid use case, but we still do not have any real-life
example of it...

> 
> > In any case, it's a significant behavior change that breaks backward
> > compatibility and affects the mentioned use case. Such changes should go
> > as opt-in: we have need_wakeup and unaligned chunks as opt-in features,
> > so I don't see any reason to introduce a breaking change this time.
> 
> In my opinion, the existing flags you mentioned are different. The
> unaligned flag changes the format of the descriptor and the
> need_wakeup flag can stop the driver so user space needs to explicitly
> wake it up. These options really change the uapi and not refactoring
> the application for this would really break it, so an opt-in was a
> must. What Maciej is suggesting is about changing the performance for
> a case that I have never seen (does not mean it does not exist though
> because I do not know all users of course, but it is at least
> uncommon). Do we want to put an opt-in for every performance change we
> commit? I say no. But at some point a performance change might of
> course be so large that it is warranted. It should also be for a use
> case that exists, or at least we believe exists. I do not think that
> is the case for this one. But if someone out there really has this use
> case, please let me know and I will be happy to change my opinion.

FWIW, to get this going, I will send a v2 without an opt-in but with
points 1 and 3 addressed. I also changed ENXIO to EINVAL in mlx5 and
stmmac's ndo_xsk_wakeup().
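
As a sketch of what that standardization can look like in an
ndo_xsk_wakeup() implementation (names are illustrative; only the return
values matter here):

static int example_xsk_wakeup(struct net_device *dev, u32 qid, u32 flags)
{
	struct example_priv *priv = netdev_priv(dev);

	if (!netif_running(dev))
		return -ENETDOWN;

	if (qid >= priv->num_xsk_queues)
		return -EINVAL;	/* was -ENXIO before the change */

	if (!priv->xsk_pools[qid])
		return -EINVAL;

	example_trigger_napi(priv, qid);
	return 0;
}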

> 
> > > What we do know
> > > is that we have a lot of users that care about zero packet loss
> > > performance when redirecting to an AF_XDP socket when using the zero-copy
> > > driver. And this work addresses one of the corner cases and therefore
> > > makes ZC driver better overall. So we say, focus on the cases people care
> > > about. BTW, we do have users using mainly XDP_TX, XDP_PASS and
> > > XDP_REDIRECT, but they are all using the regular driver for a good reason.
> > > So we should not destroy those latencies as you mention.
> > >
> > >>
> > >> At the moment, the application is free to decide whether it wants zerocopy
> > >> XDP_TX or zerocopy AF_XDP, depending on its needs. After your patchset the
> > >> only valid XDP verdict on zerocopy AF_XDP basically becomes "XDP_REDIRECT to
> > >> XSKMAP". I don't think it's valid to break an entire feature to speed up
> > >> some very specific use case.
> > >
> > > We disagree that it 'breaks an entire feature' - it is about hardening the
> > > driver when user did not come up with an optimal combo of ring sizes vs
> > > busy poll budget. Driver needs to be able to handle such cases in a
> > > reasonable way.
> >
> > XDP_TX is a valid verdict on an XSK RX queue, and stopping XDP_TX
> > processing for an indefinite time sounds to me like breaking the feature.
> > Improving performance doesn't justify breaking other stuff. It's OK to
> > do so if the application explicitly acks that it doesn't care about
> > XDP_TX, or (arguably) if it was the behavior from day one.
> >
> > > What is more, (at least Intel) zero-copy drivers are
> > > written in a way that XDP_REDIRECT to XSKMAP is the most probable verdict
> > > that can come out of XDP program. However, other actions are of course
> > > supported, so with your arguments, you could even say that we currently
> > > treat redir as 'only valid' action, which is not true.
> >
> > I did not say that, please don't attribute your speculations to me. One
> > thing is optimizing for the most likely use case, the other is breaking
> > unlikely use cases to improve the likely ones.
> >
> > > Just note that
> > > Intel's regular drivers (copy-mode AF_XDP socket) are optimized for
> > > XDP_PASS (as that is the default without an XDP program in place) as that
> > > is the most probable verdict for that driver.
> > >
> > >>
> > >> Moreover, in early days of AF_XDP there was an attempt to implement zerocopy
> > >> XDP_TX on AF_XDP queues, meaning both XDP_TX and AF_XDP could be zerocopy.
> > >> The implementation suffered from possible overflows in driver queues, thus
> > >> wasn't upstreamed, but it's still a valid idea that potentially could be
> > >> done if overflows are worked around somehow.
> > >>
> > >
> > > That feature would be good to have, but it has not been worked on and
> > > might never be worked on since there seems to be little interest in XDP_TX
> > > for the zero-copy driver. This also makes your argument about disregarding
> > > XDP_TX a bit exaggerated. As we stated before, we have not seen a use case
> > > from a real user for this.
> > >
> > >>> For the zero-copy driver, this opt in is not
> > >>> necessary. But it sounds like a valid option for copy mode, though could we
> > >>> think about the proper way as a follow up to this work?
> > >>
> > >> My opinion is that the knob has to be part of initial submission, and the
> > >> new feature should be disabled by default, otherwise we have huge issues
> > >> with backward compatibility (if we delay it, the next update changes the
> > >> behavior, breaking some existing use cases, and there is no way to work
> > >> around it).
> > >>
> > >
> > > We would not like to introduce knobs for every
> > > feature/optimization/trade-off we could think of unless we really have to.
> > > Let us keep it simple. Zero-copy is optimized for XDP_REDIRECT to an
> > > AF_XDP socket. The regular, copy-mode driver is optimized for the case
> > > when the packet is consumed by the kernel stack or XDP. That means that we
> > > should not introduce this optimization for the regular driver, as you say,
> > > but it is fine to do it for the zero-copy driver without a knob. If we
> > > ever introduce this for the regular driver, yes, we would need a knob.
> > >
> > >>>>
> > >>>> 3. When the driver receives ENOBUFS, it has to drop the packet before
> > >>>> returning to the application. It would be a better experience if your feature
> > >>>> saved all N packets from being dropped, not just N-1.
> > >>>
> > >>> Sure, I'll re-run tests and see if we can omit freeing the current
> > >>> xdp_buff and ntc bump, so that we would come back later on to the same
> > >>> entry.
> > >>>
> > >>>>
> > >>>> 4. A slow or malicious AF_XDP application may easily cause an overflow of
> > >>>> the hardware receive ring. Your feature introduces a mechanism to pause the
> > >>>> driver while the congestion is on the application side, but no symmetric
> > >>>> mechanism to pause the application when the driver is close to an overflow.
> > >>>> I don't know the behavior of Intel NICs on overflow, but in our NICs it's
> > >>>> considered a critical error, that is followed by a recovery procedure, so
> > >>>> it's not something that should happen under normal workloads.
> > >>>
> > >>> I'm not sure I follow on this one. Feature is about overflowing the XSK
> > >>> receive ring, not the HW one, right?
> > >>
> > >> Right. So we have this pipeline of buffers:
> > >>
> > >> NIC--> [HW RX ring] --NAPI--> [XSK RX ring] --app--> consumes packets
> > >>
> > >> Currently, when the NIC puts stuff in HW RX ring, NAPI always runs and
> > >> drains it either to XSK RX ring or to /dev/null if XSK RX ring is full. The
> > >> driver fulfills its responsibility to prevent overflows of HW RX ring. If
> > >> the application doesn't consume quickly enough, the frames will be dropped, but
> > >> it's only the application's issue, the driver stays consistent.
> > >>
> > >> After the feature, it's possible to pause NAPI from the userspace
> > >> application, effectively disrupting the driver's consistency. I don't think
> > >> an XSK application should have this power.
> > >
> > > It already has this power when using an AF_XDP socket. Nothing new. Some
> > > examples: when using busy-poll together with gro_flush_timeout and
> > > napi_defer_hard_irqs, you already have this power. Another example is not
> > > feeding buffers into the fill ring from the application side in zero-copy
> > > mode. Also, the application does not have to be "slow" in order to cause
> > > the XSK Rx queue to overflow. It can be a matter of a suboptimal budget
> > > choice compared to the ring lengths, as stated above.
> >
> > (*)
> >
> > > Besides that, you are right, in copy-mode (without busy-poll), let us not
> > > introduce this as this would provide the application with this power when
> > > it does not have it today.
> > >
> > >>
> > >>> The driver picks entries from the fill ring
> > >>> that were produced by the app, so if the app is slow at producing those,
> > >>> I believe this would rather be an underflow of the ring; we would simply
> > >>> receive fewer frames. As for the HW Rx ring actually being full, I think
> > >>> the HW would drop the incoming frames, so I don't see a real reason to
> > >>> treat this as a critical error that needs to go through recovery.
> > >>
> > >> I'll double check regarding the hardware behavior, but it is what it is. If
> > >> an overflow moves the queue to the fault state and requires a recovery,
> > >> there is nothing I can do about that.
> >
> > I double-checked that, and it seems the problem I indicated in point 4
> > does not exist. In the mlx5e case, if NAPI is delayed, there will be a
> > lack of RX WQEs, and the hardware will start dropping packets, which is
> > an absolutely regular situation. Your arguments above (*) are valid.
> >
> > >> A few more thoughts I just had: mlx5e shares the same NAPI instance to serve
> > >> all queues in a channel, that includes the XSK RQ and the regular RQ. The
> > >> regular and XSK traffic can be configured to be isolated to different
> > >> channels, or they may co-exist on the same channel. If they co-exist, and
> > >> XSK asks to pause NAPI, the regular traffic will still run NAPI and drop 1
> > >> XSK packet per NAPI cycle, unless point 3 is fixed. It can also be
> > >
> > > XSK does not pause the whole NAPI cycle; your HW XSK RQ just stops
> > > doing redirects to the AF_XDP XSK RQ. Your regular RQ is not affected
> > > in any way. Finally, point 3 needs to be fixed.
> > >
> > > FWIW we also support having a channel/queue vector carrying more than one
> > > RQ that is associated with a particular NAPI instance.
> > >
> > >> reproduced if NAPI is woken up by XSK TX. Besides, (correct me if I'm wrong)
> > >> your current implementation introduces extra latency to XSK TX if XSK RX
> > >> asked to pause NAPI, because NAPI will be restarted anyway (by TX wakeup),
> > >> and it could have been rescheduled by the kernel.
> > >
> > > Again, we don't pause NAPI. Tx and Rx processing are separate.
> > >
> > > As for Intel drivers, Tx is processed first. So even if we break at Rx due
> > > to -ENOBUFS from xdp_do_redirect(), Tx work has already been done.
> > >
> > > To reiterate, we agreed on fixing point 1 and 3 from your original mail.
> > > Valid and good points.
> >
> > Great that we agreed on 1 and 3! Point 4 can be dropped. For point 2, I
> > wrote my thoughts above.
> >
> > >>
> > >>> Am I missing something? Maybe I have just misunderstood you.
> > >>>
> > >>>>
> > >>>>> One might ask why it is still relevant even after having proper busy
> > >>>>> poll support in place - here is the justification.
> > >>>>>
> > >>>>> For xdpsock that was:
> > >>>>> - run for l2fwd scenario,
> > >>>>> - app/driver processing took place on the same core in busy poll
> > >>>>>      with 2048 budget,
> > >>>>> - HW ring sizes Tx 256, Rx 2048,
> > >>>>>
> > >>>>> this work improved throughput by 78% and reduced Rx queue full statistic
> > >>>>> bump by 99%.
> > >>>>>
> > >>>>> For testing ice, make sure that you have [1] present on your side.
> > >>>>>
> > >>>>> This set, besides the work described above, also carries
> > >>>>> improvements around return codes in various XSK paths and lastly a minor
> > >>>>> optimization for xskq_cons_has_entries(), a helper that might be used
> > >>>>> if XSK Rx batching makes it to the kernel.
> > >>>>
> > >>>> Regarding error codes, I would like them to be consistent across all
> > >>>> drivers, otherwise all the debuggability improvements are not useful enough.
> > >>>> Your series only changed Intel drivers. The backward compatibility
> > >>>> concern also applies here: the same error codes that were in use have been
> > >>>> repurposed, which may confuse some existing applications.
> > >>>
> > >>> I'll double check if ZC drivers are doing something unusual with return
> > >>> values from xdp_do_redirect(). Regarding backward comp, I suppose you
> > >>> refer only to changes in ndo_xsk_wakeup() callbacks as others are not
> > >>> exposed to user space? They're not crucial to me, but it improved my
> > >>> debugging experience.
> > >>
> > >> Sorry if I wasn't clear enough. Yes, I meant the wakeup error codes. We
> > >> aren't doing anything unusual with xdp_do_redirect codes (can't say for
> > >> other drivers, though).
> > >>
> > >> Last time I wanted to improve error codes returned from some BPF helpers
> > >> (make the errors more distinguishable), my patch was blocked because of
> > >> backward compatibility concerns. To be on the safe side (i.e. to avoid
> > >> further bug reports from someone who actually relied on specific codes), you
> > >> might want to use a new error code, rather than repurposing the existing
> > >> ones.
> > >>
> > >> I personally don't have objections about changing the error codes the way
> > >> you did if you keep them consistent across all drivers, not only Intel ones.
> > >
> > > Got your point. So I'll either drop the patches or look through
> > > ndo_xsk_wakeup() implementations and try to standardize the ret codes.
> > > Hope this sounds fine.
> >
> > Yes, either way sounds absolutely fine to me.
> >
> > >
> > > MF
> >

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread (last message: 2022-04-13 15:26 UTC)

Thread overview: 30+ messages
2022-04-05 11:06 [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 01/10] xsk: improve xdp_do_redirect() error codes Maciej Fijalkowski
2022-04-05 12:18   ` Jesper Dangaard Brouer
2022-04-05 11:06 ` [PATCH bpf-next 02/10] xsk: diversify return codes in xsk_rcv_check() Maciej Fijalkowski
2022-04-05 13:00   ` Jesper Dangaard Brouer
2022-04-05 13:35     ` Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 03/10] ice: xsk: terminate NAPI when XSK Rx queue gets full Maciej Fijalkowski
2022-04-05 11:34   ` Alexander Lobakin
2022-04-05 12:02     ` Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 04/10] i40e: " Maciej Fijalkowski
2022-04-05 13:04   ` Jesper Dangaard Brouer
2022-04-06 16:04     ` Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 05/10] ixgbe: " Maciej Fijalkowski
2022-04-05 12:36   ` Jesper Dangaard Brouer
2022-04-05 13:52     ` Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 06/10] ice: xsk: diversify return values from xsk_wakeup call paths Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 07/10] i40e: " Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 08/10] ixgbe: " Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 09/10] ice: xsk: avoid refilling single Rx descriptors Maciej Fijalkowski
2022-04-05 11:06 ` [PATCH bpf-next 10/10] xsk: drop ternary operator from xskq_cons_has_entries Maciej Fijalkowski
2022-04-07 10:49 ` [PATCH bpf-next 00/10] xsk: stop softirq processing on full XSK Rx queue Maxim Mikityanskiy
2022-04-08  9:08   ` Maciej Fijalkowski
2022-04-08 12:48     ` Maxim Mikityanskiy
2022-04-08 18:17       ` Jakub Kicinski
2022-04-11 15:46         ` Maciej Fijalkowski
2022-04-11 17:07           ` Jakub Kicinski
2022-04-11 15:35       ` Maciej Fijalkowski
2022-04-13 10:40         ` Maxim Mikityanskiy
2022-04-13 15:12           ` Magnus Karlsson
2022-04-13 15:26             ` Maciej Fijalkowski
