netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12
@ 2021-03-12 17:47 Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 1/5] ice: fix napi work done reporting in xsk path Tony Nguyen
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Tony Nguyen @ 2021-03-12 17:47 UTC (permalink / raw)
  To: davem, kuba
  Cc: Tony Nguyen, netdev, sassmann, bjorn.topel, magnus.karlsson,
	maciej.fijalkowski

This series contains updates to ice, i40e, ixgbe and igb drivers.

Magnus adjusts the return value for xsk allocation for ice. This fixes
reporting of napi work done and matches the behavior of other Intel NIC
drivers for xsk allocations.

Maciej moves storing of the rx_offset value to after the build_skb flag
is set as this flag affects the offset value for ice, i40e, and ixgbe.

Li RongQing resolves an issue where an Rx buffer can be reused
prematurely with XDP redirect for igb.

The following are changes since commit 7a1468ba0e02eee24ae1353e8933793a27198e20:
  net: phy: broadcom: Add power down exit reset state delay
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue 100GbE

Li RongQing (1):
  igb: avoid premature Rx buffer reuse

Maciej Fijalkowski (3):
  i40e: move headroom initialization to i40e_configure_rx_ring
  ice: move headroom initialization to ice_setup_rx_ctx
  ixgbe: move headroom initialization to ixgbe_configure_rx_ring

Magnus Karlsson (1):
  ice: fix napi work done reporting in xsk path

 drivers/net/ethernet/intel/i40e/i40e_main.c   | 13 ++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 12 ----------
 drivers/net/ethernet/intel/ice/ice_base.c     | 24 +++++++++++++++++--
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 17 -------------
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 10 ++++----
 drivers/net/ethernet/intel/igb/igb_main.c     | 22 +++++++++++------
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  3 ++-
 7 files changed, 57 insertions(+), 44 deletions(-)

-- 
2.26.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net 1/5] ice: fix napi work done reporting in xsk path
  2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
@ 2021-03-12 17:47 ` Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 2/5] i40e: move headroom initialization to i40e_configure_rx_ring Tony Nguyen
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tony Nguyen @ 2021-03-12 17:47 UTC (permalink / raw)
  To: davem, kuba
  Cc: Magnus Karlsson, netdev, sassmann, anthony.l.nguyen, bjorn.topel,
	maciej.fijalkowski, Kiran Bhandare

From: Magnus Karlsson <magnus.karlsson@intel.com>

Fix the wrong napi work done reporting in the xsk path of the ice
driver. The code in the main Rx processing loop was written to assume
that the buffer allocation code returns true if all allocations where
successful and false if not. In contrast with all other Intel NIC xsk
drivers, the ice_alloc_rx_bufs_zc() has the inverted logic messing up
the work done reporting in the napi loop.

This can be fixed either by inverting the return value from
ice_alloc_rx_bufs_zc() in the function that uses this in an incorrect
way, or by changing the return value of ice_alloc_rx_bufs_zc(). We
chose the latter as it makes all the xsk allocation functions for
Intel NICs behave in the same way. My guess is that it was this
unexpected discrepancy that gave rise to this bug in the first place.

Fixes: 5bb0c4b5eb61 ("ice, xsk: Move Rx allocation out of while-loop")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c |  6 ++++--
 drivers/net/ethernet/intel/ice/ice_xsk.c  | 10 +++++-----
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 3124a3bf519a..952e41a1e001 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -418,6 +418,8 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 	writel(0, ring->tail);
 
 	if (ring->xsk_pool) {
+		bool ok;
+
 		if (!xsk_buff_can_alloc(ring->xsk_pool, num_bufs)) {
 			dev_warn(dev, "XSK buffer pool does not provide enough addresses to fill %d buffers on Rx ring %d\n",
 				 num_bufs, ring->q_index);
@@ -426,8 +428,8 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 			return 0;
 		}
 
-		err = ice_alloc_rx_bufs_zc(ring, num_bufs);
-		if (err)
+		ok = ice_alloc_rx_bufs_zc(ring, num_bufs);
+		if (!ok)
 			dev_info(dev, "Failed to allocate some buffers on XSK buffer pool enabled Rx ring %d (pf_q %d)\n",
 				 ring->q_index, pf_q);
 		return 0;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 83f3c9574ed1..9f94d9159acd 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -358,18 +358,18 @@ int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid)
  * This function allocates a number of Rx buffers from the fill ring
  * or the internal recycle mechanism and places them on the Rx ring.
  *
- * Returns false if all allocations were successful, true if any fail.
+ * Returns true if all allocations were successful, false if any fail.
  */
 bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 {
 	union ice_32b_rx_flex_desc *rx_desc;
 	u16 ntu = rx_ring->next_to_use;
 	struct ice_rx_buf *rx_buf;
-	bool ret = false;
+	bool ok = true;
 	dma_addr_t dma;
 
 	if (!count)
-		return false;
+		return true;
 
 	rx_desc = ICE_RX_DESC(rx_ring, ntu);
 	rx_buf = &rx_ring->rx_buf[ntu];
@@ -377,7 +377,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 	do {
 		rx_buf->xdp = xsk_buff_alloc(rx_ring->xsk_pool);
 		if (!rx_buf->xdp) {
-			ret = true;
+			ok = false;
 			break;
 		}
 
@@ -402,7 +402,7 @@ bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count)
 		ice_release_rx_desc(rx_ring, ntu);
 	}
 
-	return ret;
+	return ok;
 }
 
 /**
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 2/5] i40e: move headroom initialization to i40e_configure_rx_ring
  2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 1/5] ice: fix napi work done reporting in xsk path Tony Nguyen
@ 2021-03-12 17:47 ` Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 3/5] ice: move headroom initialization to ice_setup_rx_ctx Tony Nguyen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tony Nguyen @ 2021-03-12 17:47 UTC (permalink / raw)
  To: davem, kuba
  Cc: Maciej Fijalkowski, netdev, sassmann, anthony.l.nguyen,
	bjorn.topel, magnus.karlsson, Jesper Dangaard Brouer,
	Kiran Bhandare

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.

Currently, the callsite of mentioned function is placed incorrectly
within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.

For the record, below is the call graph:

i40e_vsi_open
 i40e_vsi_setup_rx_resources
  i40e_setup_rx_descriptors
   i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below

 i40e_vsi_configure_rx
  i40e_configure_rx_ring
   set_ring_build_skb_enabled(ring) <-- set build_skb flag

Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.

Fixes: f7bb0d71d658 ("i40e: store the result of i40e_rx_offset() onto i40e_ring")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 13 +++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 12 ------------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 353deae139f9..17f3b800640e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3258,6 +3258,17 @@ static int i40e_configure_tx_ring(struct i40e_ring *ring)
 	return 0;
 }
 
+/**
+ * i40e_rx_offset - Return expected offset into page to access data
+ * @rx_ring: Ring we are requesting offset of
+ *
+ * Returns the offset value for ring into the data buffer.
+ */
+static unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
+{
+	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
+}
+
 /**
  * i40e_configure_rx_ring - Configure a receive ring context
  * @ring: The Rx ring to configure
@@ -3369,6 +3380,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	else
 		set_ring_build_skb_enabled(ring);
 
+	ring->rx_offset = i40e_rx_offset(ring);
+
 	/* cache tail for quicker writes, and clear the reg before use */
 	ring->tail = hw->hw_addr + I40E_QRX_TAIL(pf_q);
 	writel(0, ring->tail);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 627794b31e33..5747a99122fb 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1569,17 +1569,6 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring)
 	}
 }
 
-/**
- * i40e_rx_offset - Return expected offset into page to access data
- * @rx_ring: Ring we are requesting offset of
- *
- * Returns the offset value for ring into the data buffer.
- */
-static unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
-{
-	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
-}
-
 /**
  * i40e_setup_rx_descriptors - Allocate Rx descriptors
  * @rx_ring: Rx descriptor ring (for a specific queue) to setup
@@ -1608,7 +1597,6 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
-	rx_ring->rx_offset = i40e_rx_offset(rx_ring);
 
 	/* XDP RX-queue info only needed for RX rings exposed to XDP */
 	if (rx_ring->vsi->type == I40E_VSI_MAIN) {
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 3/5] ice: move headroom initialization to ice_setup_rx_ctx
  2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 1/5] ice: fix napi work done reporting in xsk path Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 2/5] i40e: move headroom initialization to i40e_configure_rx_ring Tony Nguyen
@ 2021-03-12 17:47 ` Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 4/5] ixgbe: move headroom initialization to ixgbe_configure_rx_ring Tony Nguyen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Tony Nguyen @ 2021-03-12 17:47 UTC (permalink / raw)
  To: davem, kuba
  Cc: Maciej Fijalkowski, netdev, sassmann, anthony.l.nguyen,
	bjorn.topel, magnus.karlsson, Kiran Bhandare

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.

Currently, the callsite of mentioned function is placed incorrectly
within ice_setup_rx_ring() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.

Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
setting.

Fixes: f1b1f409bf79 ("ice: store the result of ice_rx_offset() onto ice_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_base.c | 18 ++++++++++++++++++
 drivers/net/ethernet/intel/ice/ice_txrx.c | 17 -----------------
 2 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index 952e41a1e001..1148d768f8ed 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -274,6 +274,22 @@ ice_setup_tx_ctx(struct ice_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf_q)
 	tlan_ctx->legacy_int = ICE_TX_LEGACY;
 }
 
+/**
+ * ice_rx_offset - Return expected offset into page to access data
+ * @rx_ring: Ring we are requesting offset of
+ *
+ * Returns the offset value for ring into the data buffer.
+ */
+static unsigned int ice_rx_offset(struct ice_ring *rx_ring)
+{
+	if (ice_ring_uses_build_skb(rx_ring))
+		return ICE_SKB_PAD;
+	else if (ice_is_xdp_ena_vsi(rx_ring->vsi))
+		return XDP_PACKET_HEADROOM;
+
+	return 0;
+}
+
 /**
  * ice_setup_rx_ctx - Configure a receive ring context
  * @ring: The Rx ring to configure
@@ -413,6 +429,8 @@ int ice_setup_rx_ctx(struct ice_ring *ring)
 	else
 		ice_set_ring_build_skb_ena(ring);
 
+	ring->rx_offset = ice_rx_offset(ring);
+
 	/* init queue specific tail register */
 	ring->tail = hw->hw_addr + QRX_TAIL(pf_q);
 	writel(0, ring->tail);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index b7dc25da1202..b91dcfd12727 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -443,22 +443,6 @@ void ice_free_rx_ring(struct ice_ring *rx_ring)
 	}
 }
 
-/**
- * ice_rx_offset - Return expected offset into page to access data
- * @rx_ring: Ring we are requesting offset of
- *
- * Returns the offset value for ring into the data buffer.
- */
-static unsigned int ice_rx_offset(struct ice_ring *rx_ring)
-{
-	if (ice_ring_uses_build_skb(rx_ring))
-		return ICE_SKB_PAD;
-	else if (ice_is_xdp_ena_vsi(rx_ring->vsi))
-		return XDP_PACKET_HEADROOM;
-
-	return 0;
-}
-
 /**
  * ice_setup_rx_ring - Allocate the Rx descriptors
  * @rx_ring: the Rx ring to set up
@@ -493,7 +477,6 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
 
 	rx_ring->next_to_use = 0;
 	rx_ring->next_to_clean = 0;
-	rx_ring->rx_offset = ice_rx_offset(rx_ring);
 
 	if (ice_is_xdp_ena_vsi(rx_ring->vsi))
 		WRITE_ONCE(rx_ring->xdp_prog, rx_ring->vsi->xdp_prog);
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 4/5] ixgbe: move headroom initialization to ixgbe_configure_rx_ring
  2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
                   ` (2 preceding siblings ...)
  2021-03-12 17:47 ` [PATCH net 3/5] ice: move headroom initialization to ice_setup_rx_ctx Tony Nguyen
@ 2021-03-12 17:47 ` Tony Nguyen
  2021-03-12 17:47 ` [PATCH net 5/5] igb: avoid premature Rx buffer reuse Tony Nguyen
  2021-03-13  2:10 ` [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 patchwork-bot+netdevbpf
  5 siblings, 0 replies; 7+ messages in thread
From: Tony Nguyen @ 2021-03-12 17:47 UTC (permalink / raw)
  To: davem, kuba
  Cc: Maciej Fijalkowski, netdev, sassmann, anthony.l.nguyen,
	bjorn.topel, magnus.karlsson, Vishakha Jambekar

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>

ixgbe_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on __IXGBE_RX_BUILD_SKB_ENABLED flag.

Currently, the callsite of mentioned function is placed incorrectly
within ixgbe_setup_rx_resources() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.

Fix this by moving ixgbe_rx_offset() to ixgbe_configure_rx_ring() after
the flag setting, which happens to be set in ixgbe_set_rx_buffer_len.

Fixes: c0d4e9d223c5 ("ixgbe: store the result of ixgbe_rx_offset() onto ixgbe_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 9f3f12e2ccf2..03d9aad516d4 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -4118,6 +4118,8 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 #endif
 	}
 
+	ring->rx_offset = ixgbe_rx_offset(ring);
+
 	if (ring->xsk_pool && hw->mac.type != ixgbe_mac_82599EB) {
 		u32 xsk_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool);
 
@@ -6578,7 +6580,6 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
-	rx_ring->rx_offset = ixgbe_rx_offset(rx_ring);
 
 	/* XDP RX-queue info */
 	if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, adapter->netdev,
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net 5/5] igb: avoid premature Rx buffer reuse
  2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
                   ` (3 preceding siblings ...)
  2021-03-12 17:47 ` [PATCH net 4/5] ixgbe: move headroom initialization to ixgbe_configure_rx_ring Tony Nguyen
@ 2021-03-12 17:47 ` Tony Nguyen
  2021-03-13  2:10 ` [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 patchwork-bot+netdevbpf
  5 siblings, 0 replies; 7+ messages in thread
From: Tony Nguyen @ 2021-03-12 17:47 UTC (permalink / raw)
  To: davem, kuba
  Cc: Li RongQing, netdev, sassmann, anthony.l.nguyen, bjorn.topel,
	magnus.karlsson, maciej.fijalkowski, Alexander Duyck,
	Vishakha Jambekar

From: Li RongQing <lirongqing@baidu.com>

Igb needs a similar fix as commit 75aab4e10ae6a ("i40e: avoid
premature Rx buffer reuse")

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is
still using it.

The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
  page count  == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior calling
xdp_do_redirect().

Fixes: 9cbc948b5a20 ("igb: add XDP support")
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Tested-by: Vishakha Jambekar <vishakha.jambekar@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 878b31d534ec..d3b37761f637 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -8214,7 +8214,8 @@ static void igb_reuse_rx_page(struct igb_ring *rx_ring,
 	new_buff->pagecnt_bias	= old_buff->pagecnt_bias;
 }
 
-static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer)
+static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer,
+				  int rx_buf_pgcnt)
 {
 	unsigned int pagecnt_bias = rx_buffer->pagecnt_bias;
 	struct page *page = rx_buffer->page;
@@ -8225,7 +8226,7 @@ static bool igb_can_reuse_rx_page(struct igb_rx_buffer *rx_buffer)
 
 #if (PAGE_SIZE < 8192)
 	/* if we are only owner of page we can reuse it */
-	if (unlikely((page_ref_count(page) - pagecnt_bias) > 1))
+	if (unlikely((rx_buf_pgcnt - pagecnt_bias) > 1))
 		return false;
 #else
 #define IGB_LAST_OFFSET \
@@ -8614,11 +8615,17 @@ static unsigned int igb_rx_offset(struct igb_ring *rx_ring)
 }
 
 static struct igb_rx_buffer *igb_get_rx_buffer(struct igb_ring *rx_ring,
-					       const unsigned int size)
+					       const unsigned int size, int *rx_buf_pgcnt)
 {
 	struct igb_rx_buffer *rx_buffer;
 
 	rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+	*rx_buf_pgcnt =
+#if (PAGE_SIZE < 8192)
+		page_count(rx_buffer->page);
+#else
+		0;
+#endif
 	prefetchw(rx_buffer->page);
 
 	/* we are reusing so sync this buffer for CPU use */
@@ -8634,9 +8641,9 @@ static struct igb_rx_buffer *igb_get_rx_buffer(struct igb_ring *rx_ring,
 }
 
 static void igb_put_rx_buffer(struct igb_ring *rx_ring,
-			      struct igb_rx_buffer *rx_buffer)
+			      struct igb_rx_buffer *rx_buffer, int rx_buf_pgcnt)
 {
-	if (igb_can_reuse_rx_page(rx_buffer)) {
+	if (igb_can_reuse_rx_page(rx_buffer, rx_buf_pgcnt)) {
 		/* hand second half of page back to the ring */
 		igb_reuse_rx_page(rx_ring, rx_buffer);
 	} else {
@@ -8664,6 +8671,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 	unsigned int xdp_xmit = 0;
 	struct xdp_buff xdp;
 	u32 frame_sz = 0;
+	int rx_buf_pgcnt;
 
 	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
 #if (PAGE_SIZE < 8192)
@@ -8693,7 +8701,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 		 */
 		dma_rmb();
 
-		rx_buffer = igb_get_rx_buffer(rx_ring, size);
+		rx_buffer = igb_get_rx_buffer(rx_ring, size, &rx_buf_pgcnt);
 
 		/* retrieve a buffer from the ring */
 		if (!skb) {
@@ -8736,7 +8744,7 @@ static int igb_clean_rx_irq(struct igb_q_vector *q_vector, const int budget)
 			break;
 		}
 
-		igb_put_rx_buffer(rx_ring, rx_buffer);
+		igb_put_rx_buffer(rx_ring, rx_buffer, rx_buf_pgcnt);
 		cleaned_count++;
 
 		/* fetch next buffer in frame if non-eop */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12
  2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
                   ` (4 preceding siblings ...)
  2021-03-12 17:47 ` [PATCH net 5/5] igb: avoid premature Rx buffer reuse Tony Nguyen
@ 2021-03-13  2:10 ` patchwork-bot+netdevbpf
  5 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-03-13  2:10 UTC (permalink / raw)
  To: Tony Nguyen
  Cc: davem, kuba, netdev, sassmann, bjorn.topel, magnus.karlsson,
	maciej.fijalkowski

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Fri, 12 Mar 2021 09:47:50 -0800 you wrote:
> This series contains updates to ice, i40e, ixgbe and igb drivers.
> 
> Magnus adjusts the return value for xsk allocation for ice. This fixes
> reporting of napi work done and matches the behavior of other Intel NIC
> drivers for xsk allocations.
> 
> Maciej moves storing of the rx_offset value to after the build_skb flag
> is set as this flag affects the offset value for ice, i40e, and ixgbe.
> 
> [...]

Here is the summary with links:
  - [net,1/5] ice: fix napi work done reporting in xsk path
    https://git.kernel.org/netdev/net/c/ed0907e3bdcf
  - [net,2/5] i40e: move headroom initialization to i40e_configure_rx_ring
    https://git.kernel.org/netdev/net/c/a86606268ec0
  - [net,3/5] ice: move headroom initialization to ice_setup_rx_ctx
    https://git.kernel.org/netdev/net/c/89861c485c6a
  - [net,4/5] ixgbe: move headroom initialization to ixgbe_configure_rx_ring
    https://git.kernel.org/netdev/net/c/76064573b121
  - [net,5/5] igb: avoid premature Rx buffer reuse
    https://git.kernel.org/netdev/net/c/98dfb02aa222

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-03-13  2:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-12 17:47 [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 Tony Nguyen
2021-03-12 17:47 ` [PATCH net 1/5] ice: fix napi work done reporting in xsk path Tony Nguyen
2021-03-12 17:47 ` [PATCH net 2/5] i40e: move headroom initialization to i40e_configure_rx_ring Tony Nguyen
2021-03-12 17:47 ` [PATCH net 3/5] ice: move headroom initialization to ice_setup_rx_ctx Tony Nguyen
2021-03-12 17:47 ` [PATCH net 4/5] ixgbe: move headroom initialization to ixgbe_configure_rx_ring Tony Nguyen
2021-03-12 17:47 ` [PATCH net 5/5] igb: avoid premature Rx buffer reuse Tony Nguyen
2021-03-13  2:10 ` [PATCH net 0/5][pull request] Intel Wired LAN Driver Updates 2021-03-12 patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).