* [PATCH net-next 0/4] i40e driver performance tweaks for AF_XDP
@ 2020-07-02 15:37 Björn Töpel
  2020-07-02 15:37 ` [PATCH net-next 1/4] i40e, xsk: remove HW descriptor prefetch in AF_XDP path Björn Töpel
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Björn Töpel @ 2020-07-02 15:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Björn Töpel, magnus.karlsson, netdev, bpf

This series contains four driver tweaks for the i40e AF_XDP Rx path
that together improve Rx performance (rx_drop, 64B packets, measured
in packets per second) by 17%.

Please refer to the individual commits for more details.


Cheers,
Björn


Björn Töpel (4):
  i40e, xsk: remove HW descriptor prefetch in AF_XDP path
  i40e: use 16B HW descriptors instead of 32B
  i40e, xsk: increase budget for AF_XDP path
  i40e, xsk: move buffer allocation out of the Rx processing loop

 drivers/net/ethernet/intel/i40e/i40e.h        |  2 +-
 .../net/ethernet/intel/i40e/i40e_debugfs.c    | 10 +++----
 drivers/net/ethernet/intel/i40e/i40e_main.c   |  4 +--
 drivers/net/ethernet/intel/i40e/i40e_trace.h  |  6 ++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 19 +++++++++++--
 drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  2 +-
 .../ethernet/intel/i40e/i40e_txrx_common.h    | 13 ---------
 drivers/net/ethernet/intel/i40e/i40e_type.h   |  5 +++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c    | 28 +++++++++++++------
 9 files changed, 50 insertions(+), 39 deletions(-)


base-commit: 23212a70077311396cda2823d627317c25e6e5d1
-- 
2.25.1



* [PATCH net-next 1/4] i40e, xsk: remove HW descriptor prefetch in AF_XDP path
  2020-07-02 15:37 [PATCH net-next 0/4] i40e driver performance tweaks for AF_XDP Björn Töpel
@ 2020-07-02 15:37 ` Björn Töpel
  2020-07-08 20:43   ` [Intel-wired-lan] " Bowers, AndrewX
  2020-07-02 15:37 ` [PATCH net-next 2/4] i40e: use 16B HW descriptors instead of 32B Björn Töpel
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Björn Töpel @ 2020-07-02 15:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Björn Töpel, magnus.karlsson, netdev, bpf

From: Björn Töpel <bjorn.topel@intel.com>

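Software prefetching of HW descriptors has a negative impact on
performance in the AF_XDP path, so it is removed there.

i40e_inc_ntc() is moved out of the shared header: the regular Rx path
keeps a copy that still prefetches, and the AF_XDP path gets a copy
without the prefetch.

Performance for the rx_drop benchmark increased by 2%.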
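
To illustrate the index update with and without the prefetch hint,
here is a minimal userspace sketch. It is not driver code: struct
toy_ring and both toy_* functions are hypothetical stand-ins, and
__builtin_prefetch() (a GCC/Clang builtin) is assumed as a substitute
for the kernel's prefetch().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the driver's Rx ring. */
struct toy_ring {
	uint32_t count;         /* number of descriptors in the ring */
	uint32_t next_to_clean; /* index of the next descriptor to process */
	uint64_t *desc;         /* descriptor memory, one word per slot here */
};

/* Advance next_to_clean with wrap-around, then prefetch the next slot. */
static void toy_inc_ntc_prefetch(struct toy_ring *ring)
{
	uint32_t ntc = ring->next_to_clean + 1;

	assert(ring->count > 0);
	ntc = (ntc < ring->count) ? ntc : 0;
	ring->next_to_clean = ntc;
	/* Assumed stand-in for prefetch(); it is only a hint to the CPU. */
	__builtin_prefetch(&ring->desc[ntc]);
}

/* Same index update, but without the prefetch hint. */
static void toy_inc_ntc(struct toy_ring *ring)
{
	uint32_t ntc = ring->next_to_clean + 1;

	assert(ring->count > 0);
	ntc = (ntc < ring->count) ? ntc : 0;
	ring->next_to_clean = ntc;
}
```

Both variants leave next_to_clean in the same state; the only
difference is whether the next descriptor is hinted into the cache.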

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 13 +++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx_common.h | 13 -------------
 drivers/net/ethernet/intel/i40e/i40e_xsk.c         | 12 ++++++++++++
 3 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 3e5c566ceb01..e1a76fc05b8d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2299,6 +2299,19 @@ void i40e_finalize_xdp_rx(struct i40e_ring *rx_ring, unsigned int xdp_res)
 	}
 }
 
+/**
+ * i40e_inc_ntc: Advance the next_to_clean index
+ * @rx_ring: Rx ring
+ **/
+static void i40e_inc_ntc(struct i40e_ring *rx_ring)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+	prefetch(I40E_RX_DESC(rx_ring, ntc));
+}
+
 /**
  * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: rx descriptor ring to transact packets on
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 667c4dc4b39f..1397dd3c1c57 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -99,19 +99,6 @@ static inline bool i40e_rx_is_programming_status(u64 qword1)
 	return qword1 & I40E_RXD_QW1_LENGTH_SPH_MASK;
 }
 
-/**
- * i40e_inc_ntc: Advance the next_to_clean index
- * @rx_ring: Rx ring
- **/
-static inline void i40e_inc_ntc(struct i40e_ring *rx_ring)
-{
-	u32 ntc = rx_ring->next_to_clean + 1;
-
-	ntc = (ntc < rx_ring->count) ? ntc : 0;
-	rx_ring->next_to_clean = ntc;
-	prefetch(I40E_RX_DESC(rx_ring, ntc));
-}
-
 void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
 bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 8ce57b507a21..1f2dd591dbf1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -253,6 +253,18 @@ static struct sk_buff *i40e_construct_skb_zc(struct i40e_ring *rx_ring,
 	return skb;
 }
 
+/**
+ * i40e_inc_ntc: Advance the next_to_clean index
+ * @rx_ring: Rx ring
+ **/
+static void i40e_inc_ntc(struct i40e_ring *rx_ring)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+}
+
 /**
  * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
  * @rx_ring: Rx ring
-- 
2.25.1



* [PATCH net-next 2/4] i40e: use 16B HW descriptors instead of 32B
  2020-07-02 15:37 [PATCH net-next 0/4] i40e driver performance tweaks for AF_XDP Björn Töpel
  2020-07-02 15:37 ` [PATCH net-next 1/4] i40e, xsk: remove HW descriptor prefetch in AF_XDP path Björn Töpel
@ 2020-07-02 15:37 ` Björn Töpel
  2020-07-08 20:43   ` [Intel-wired-lan] " Bowers, AndrewX
  2020-07-02 15:37 ` [PATCH net-next 3/4] i40e, xsk: increase budget for AF_XDP path Björn Töpel
  2020-07-02 15:37 ` [PATCH net-next 4/4] i40e, xsk: move buffer allocation out of the Rx processing loop Björn Töpel
  3 siblings, 1 reply; 9+ messages in thread
From: Björn Töpel @ 2020-07-02 15:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Björn Töpel, magnus.karlsson, netdev, bpf

From: Björn Töpel <bjorn.topel@intel.com>

The i40e NIC supports two flavors of HW descriptors: 16 and 32
bytes. The latter has room for more offloading information. However,
the only fields of the 32B HW descriptor that the driver uses are
also available in the 16B descriptor.

In other words, reading and writing 32 bytes instead of 16 bytes is a
waste of bus bandwidth.

This commit switches the driver from 32 byte descriptors to 16 byte
descriptors.

For AF_XDP the rx_drop benchmark was improved by 2%.
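
As a rough illustration of the footprint difference, the sketch below
computes the descriptor memory for a ring with both descriptor sizes,
rounded up to 4K as in i40e_setup_rx_descriptors(). The toy_* unions
and helpers are hypothetical; only their total sizes (16 and 32 bytes)
mirror the real descriptors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins: only the total sizes match the HW descriptors. */
union toy_16byte_rx_desc { uint64_t qword[2]; };
union toy_32byte_rx_desc { uint64_t qword[4]; };

/* Round x up to a multiple of a; a must be a power of two. */
static size_t toy_align(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Bytes of descriptor memory for a ring, rounded up to the nearest 4K. */
static size_t toy_ring_bytes(size_t count, size_t desc_size)
{
	assert(desc_size == 16 || desc_size == 32);
	return toy_align(count * desc_size, 4096);
}
```

For a 512-entry ring, toy_ring_bytes(512, 16) is 8192 bytes while
toy_ring_bytes(512, 32) is 16384 bytes, so each descriptor fetch and
writeback moves half as much data with the 16 byte layout.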

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 10 ++++------
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  4 ++--
 drivers/net/ethernet/intel/i40e/i40e_trace.h   |  6 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  6 +++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h    |  5 ++++-
 7 files changed, 18 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index a7e212d1caa2..ada0e93c38f0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -90,7 +90,7 @@
 #define I40E_OEM_RELEASE_MASK		0x0000ffff
 
 #define I40E_RX_DESC(R, i)	\
-	(&(((union i40e_32byte_rx_desc *)((R)->desc))[i]))
+	(&(((union i40e_rx_desc *)((R)->desc))[i]))
 #define I40E_TX_DESC(R, i)	\
 	(&(((struct i40e_tx_desc *)((R)->desc))[i]))
 #define I40E_TX_CTXTDESC(R, i)	\
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index d3ad2e3aa838..d7c13ca9be7d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -604,10 +604,9 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
 			} else {
 				rxd = I40E_RX_DESC(ring, i);
 				dev_info(&pf->pdev->dev,
-					 "   d[%03x] = 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
+					 "   d[%03x] = 0x%016llx 0x%016llx\n",
 					 i, rxd->read.pkt_addr,
-					 rxd->read.hdr_addr,
-					 rxd->read.rsvd1, rxd->read.rsvd2);
+					 rxd->read.hdr_addr);
 			}
 		}
 	} else if (cnt == 3) {
@@ -625,10 +624,9 @@ static void i40e_dbg_dump_desc(int cnt, int vsi_seid, int ring_id, int desc_n,
 		} else {
 			rxd = I40E_RX_DESC(ring, desc_n);
 			dev_info(&pf->pdev->dev,
-				 "vsi = %02i rx ring = %02i d[%03x] = 0x%016llx 0x%016llx 0x%016llx 0x%016llx\n",
+				 "vsi = %02i rx ring = %02i d[%03x] = 0x%016llx 0x%016llx\n",
 				 vsi_seid, ring_id, desc_n,
-				 rxd->read.pkt_addr, rxd->read.hdr_addr,
-				 rxd->read.rsvd1, rxd->read.rsvd2);
+				 rxd->read.pkt_addr, rxd->read.hdr_addr);
 		}
 	} else {
 		dev_info(&pf->pdev->dev, "dump desc rx/tx/xdp <vsi_seid> <ring_id> [<desc_n>]\n");
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index dadbfb3d2a2b..265be020f07f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3320,8 +3320,8 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
 	rx_ctx.base = (ring->dma / 128);
 	rx_ctx.qlen = ring->count;
 
-	/* use 32 byte descriptors */
-	rx_ctx.dsize = 1;
+	/* use 16 byte descriptors */
+	rx_ctx.dsize = 0;
 
 	/* descriptor type is always zero
 	 * rx_ctx.dtype = 0;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
index 424f02077e2e..983f8b98b275 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
@@ -112,7 +112,7 @@ DECLARE_EVENT_CLASS(
 	i40e_rx_template,
 
 	TP_PROTO(struct i40e_ring *ring,
-		 union i40e_32byte_rx_desc *desc,
+		 union i40e_16byte_rx_desc *desc,
 		 struct sk_buff *skb),
 
 	TP_ARGS(ring, desc, skb),
@@ -140,7 +140,7 @@ DECLARE_EVENT_CLASS(
 DEFINE_EVENT(
 	i40e_rx_template, i40e_clean_rx_irq,
 	TP_PROTO(struct i40e_ring *ring,
-		 union i40e_32byte_rx_desc *desc,
+		 union i40e_16byte_rx_desc *desc,
 		 struct sk_buff *skb),
 
 	TP_ARGS(ring, desc, skb));
@@ -148,7 +148,7 @@ DEFINE_EVENT(
 DEFINE_EVENT(
 	i40e_rx_template, i40e_clean_rx_irq_rx,
 	TP_PROTO(struct i40e_ring *ring,
-		 union i40e_32byte_rx_desc *desc,
+		 union i40e_16byte_rx_desc *desc,
 		 struct sk_buff *skb),
 
 	TP_ARGS(ring, desc, skb));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index e1a76fc05b8d..0db656ef5b9a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -533,11 +533,11 @@ static void i40e_fd_handle_status(struct i40e_ring *rx_ring, u64 qword0_raw,
 {
 	struct i40e_pf *pf = rx_ring->vsi->back;
 	struct pci_dev *pdev = pf->pdev;
-	struct i40e_32b_rx_wb_qw0 *qw0;
+	struct i40e_16b_rx_wb_qw0 *qw0;
 	u32 fcnt_prog, fcnt_avail;
 	u32 error;
 
-	qw0 = (struct i40e_32b_rx_wb_qw0 *)&qword0_raw;
+	qw0 = (struct i40e_16b_rx_wb_qw0 *)&qword0_raw;
 	error = (qword1 & I40E_RX_PROG_STATUS_DESC_QW1_ERROR_MASK) >>
 		I40E_RX_PROG_STATUS_DESC_QW1_ERROR_SHIFT;
 
@@ -1418,7 +1418,7 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	u64_stats_init(&rx_ring->syncp);
 
 	/* Round up to nearest 4K */
-	rx_ring->size = rx_ring->count * sizeof(union i40e_32byte_rx_desc);
+	rx_ring->size = rx_ring->count * sizeof(union i40e_rx_desc);
 	rx_ring->size = ALIGN(rx_ring->size, 4096);
 	rx_ring->desc = dma_alloc_coherent(dev, rx_ring->size,
 					   &rx_ring->dma, GFP_KERNEL);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 4036893d6825..0eacd5f21e9d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -110,7 +110,7 @@ enum i40e_dyn_idx_t {
  */
 #define I40E_RX_HDR_SIZE I40E_RXBUFFER_256
 #define I40E_PACKET_HDR_PAD (ETH_HLEN + ETH_FCS_LEN + (VLAN_HLEN * 2))
-#define i40e_rx_desc i40e_32byte_rx_desc
+#define i40e_rx_desc i40e_16byte_rx_desc
 
 #define I40E_RX_DMA_ATTR \
 	(DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index 52410d609ba1..97d29df65f9e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -628,7 +628,7 @@ union i40e_16byte_rx_desc {
 		__le64 hdr_addr; /* Header buffer address */
 	} read;
 	struct {
-		struct {
+		struct i40e_16b_rx_wb_qw0 {
 			struct {
 				union {
 					__le16 mirroring_status;
@@ -647,6 +647,9 @@ union i40e_16byte_rx_desc {
 			__le64 status_error_len;
 		} qword1;
 	} wb;  /* writeback */
+	struct {
+		u64 qword[2];
+	} raw;
 };
 
 union i40e_32byte_rx_desc {
-- 
2.25.1



* [PATCH net-next 3/4] i40e, xsk: increase budget for AF_XDP path
  2020-07-02 15:37 [PATCH net-next 0/4] i40e driver performance tweaks for AF_XDP Björn Töpel
  2020-07-02 15:37 ` [PATCH net-next 1/4] i40e, xsk: remove HW descriptor prefetch in AF_XDP path Björn Töpel
  2020-07-02 15:37 ` [PATCH net-next 2/4] i40e: use 16B HW descriptors instead of 32B Björn Töpel
@ 2020-07-02 15:37 ` Björn Töpel
  2020-07-08 20:44   ` [Intel-wired-lan] " Bowers, AndrewX
  2020-07-02 15:37 ` [PATCH net-next 4/4] i40e, xsk: move buffer allocation out of the Rx processing loop Björn Töpel
  3 siblings, 1 reply; 9+ messages in thread
From: Björn Töpel @ 2020-07-02 15:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Björn Töpel, magnus.karlsson, netdev, bpf

From: Björn Töpel <bjorn.topel@intel.com>

The napi_budget, meaning the number of received packets that may be
processed in each napi invocation, assumes that each received packet
is destined for the kernel networking stack.

That is not the case for the AF_XDP receive path, where the cost of
each packet is significantly lower. Therefore, this commit disregards
the napi budget in the AF_XDP path and uses a fixed budget of 256
instead. Processing 256 packets targeted for AF_XDP is still less
work than 64 (napi budget) packets going to the kernel networking
stack.

The performance for the rx_drop scenario is up 7%.
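
The effect of the larger budget on a single poll can be sketched as
below. This is a toy model, not the driver loop: toy_clean_rx() and
its "ready" parameter (the number of completed descriptors waiting in
the ring) are hypothetical; the 256 value is the one introduced by
this patch.

```c
#include <assert.h>

#define TOY_XSK_CLEAN_RX_BUDGET 256U /* value taken from the patch */

/* Count how many of `ready` completed descriptors one poll consumes. */
static unsigned int toy_clean_rx(unsigned int ready, unsigned int budget)
{
	unsigned int total_rx_packets = 0;

	assert(budget > 0);
	while (total_rx_packets < budget) {
		if (total_rx_packets == ready)
			break; /* no more completed descriptors */
		total_rx_packets++;
	}
	return total_rx_packets;
}
```

With 300 descriptors ready, a budget of 64 consumes 64 packets per
poll, while TOY_XSK_CLEAN_RX_BUDGET consumes 256, so fewer napi
invocations are needed to drain the same backlog.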

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 1f2dd591dbf1..99f4afdc403d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -265,6 +265,8 @@ static void i40e_inc_ntc(struct i40e_ring *rx_ring)
 	rx_ring->next_to_clean = ntc;
 }
 
+#define I40E_XSK_CLEAN_RX_BUDGET 256U
+
 /**
  * i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
  * @rx_ring: Rx ring
@@ -280,7 +282,7 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	bool failure = false;
 	struct sk_buff *skb;
 
-	while (likely(total_rx_packets < (unsigned int)budget)) {
+	while (likely(total_rx_packets < I40E_XSK_CLEAN_RX_BUDGET)) {
 		union i40e_rx_desc *rx_desc;
 		struct xdp_buff **bi;
 		unsigned int size;
-- 
2.25.1



* [PATCH net-next 4/4] i40e, xsk: move buffer allocation out of the Rx processing loop
  2020-07-02 15:37 [PATCH net-next 0/4] i40e driver performance tweaks for AF_XDP Björn Töpel
                   ` (2 preceding siblings ...)
  2020-07-02 15:37 ` [PATCH net-next 3/4] i40e, xsk: increase budget for AF_XDP path Björn Töpel
@ 2020-07-02 15:37 ` Björn Töpel
  2020-07-08 20:45   ` [Intel-wired-lan] " Bowers, AndrewX
  3 siblings, 1 reply; 9+ messages in thread
From: Björn Töpel @ 2020-07-02 15:37 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: Björn Töpel, magnus.karlsson, netdev, bpf

From: Björn Töpel <bjorn.topel@intel.com>

Instead of checking whether Rx buffers need to be allocated in each
iteration of the Rx packet processing loop, move the allocation out
of the loop and do it once for each napi activation.

For AF_XDP the rx_drop benchmark was improved by 6%.
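
The difference in how often the allocator is called can be sketched
with a small model. Everything here is hypothetical: the toy_* names,
the assumption that the cleaned count starts at zero, and the
threshold of 32 used as a stand-in for I40E_RX_BUFFER_WRITE.

```c
#include <assert.h>

#define TOY_RX_BUFFER_WRITE 32U /* assumed threshold, for illustration */

struct toy_refill_stats {
	unsigned int calls;    /* number of allocator invocations */
	unsigned int buffers;  /* buffers requested across all calls */
	unsigned int leftover; /* cleaned slots not yet refilled */
};

/* Old scheme: check the threshold at the top of every loop iteration. */
static struct toy_refill_stats toy_refill_in_loop(unsigned int packets)
{
	struct toy_refill_stats s = { 0, 0, 0 };
	unsigned int cleaned = 0;

	for (unsigned int i = 0; i < packets; i++) {
		if (cleaned >= TOY_RX_BUFFER_WRITE) {
			s.calls++;
			s.buffers += cleaned;
			cleaned = 0;
		}
		cleaned++; /* one descriptor consumed */
	}
	s.leftover = cleaned;
	return s;
}

/* New scheme: process everything first, then refill once. */
static struct toy_refill_stats toy_refill_after_loop(unsigned int packets)
{
	struct toy_refill_stats s = { 0, 0, 0 };
	unsigned int cleaned = packets;

	if (cleaned >= TOY_RX_BUFFER_WRITE) {
		s.calls = 1;
		s.buffers = cleaned;
		cleaned = 0;
	}
	s.leftover = cleaned;
	return s;
}
```

For 256 packets, the in-loop variant makes 7 allocator calls and
leaves 32 slots for the next poll, while the after-loop variant makes
a single call for all 256 and evaluates the threshold once instead of
256 times.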

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 99f4afdc403d..91aee16fbe72 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -279,8 +279,8 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
 	unsigned int xdp_res, xdp_xmit = 0;
-	bool failure = false;
 	struct sk_buff *skb;
+	bool failure = false;
 
 	while (likely(total_rx_packets < I40E_XSK_CLEAN_RX_BUDGET)) {
 		union i40e_rx_desc *rx_desc;
@@ -288,13 +288,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 		unsigned int size;
 		u64 qword;
 
-		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
-			failure = failure ||
-				  !i40e_alloc_rx_buffers_zc(rx_ring,
-							    cleaned_count);
-			cleaned_count = 0;
-		}
-
 		rx_desc = I40E_RX_DESC(rx_ring, rx_ring->next_to_clean);
 		qword = le64_to_cpu(rx_desc->wb.qword1.status_error_len);
 
@@ -369,6 +362,9 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 		napi_gro_receive(&rx_ring->q_vector->napi, skb);
 	}
 
+	if (cleaned_count >= I40E_RX_BUFFER_WRITE)
+		failure = !i40e_alloc_rx_buffers_zc(rx_ring, cleaned_count);
+
 	i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
 	i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
 
-- 
2.25.1



* RE: [Intel-wired-lan] [PATCH net-next 1/4] i40e, xsk: remove HW descriptor prefetch in AF_XDP path
  2020-07-02 15:37 ` [PATCH net-next 1/4] i40e, xsk: remove HW descriptor prefetch in AF_XDP path Björn Töpel
@ 2020-07-08 20:43   ` Bowers, AndrewX
  0 siblings, 0 replies; 9+ messages in thread
From: Bowers, AndrewX @ 2020-07-08 20:43 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, bpf

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Björn Töpel
> Sent: Thursday, July 2, 2020 8:37 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; bpf@vger.kernel.org; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>
> Subject: [Intel-wired-lan] [PATCH net-next 1/4] i40e, xsk: remove HW
> descriptor prefetch in AF_XDP path
> 
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The software prefetching of HW descriptors has a negative impact on the
> performance. Therefore, it is now removed.
> 
> Performance for the rx_drop benchmark increased with 2%.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c        | 13 +++++++++++++
>  drivers/net/ethernet/intel/i40e/i40e_txrx_common.h | 13 -------------
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c         | 12 ++++++++++++
>  3 files changed, 25 insertions(+), 13 deletions(-)


Tested-by: Andrew Bowers <andrewx.bowers@intel.com>




* RE: [Intel-wired-lan] [PATCH net-next 2/4] i40e: use 16B HW descriptors instead of 32B
  2020-07-02 15:37 ` [PATCH net-next 2/4] i40e: use 16B HW descriptors instead of 32B Björn Töpel
@ 2020-07-08 20:43   ` Bowers, AndrewX
  0 siblings, 0 replies; 9+ messages in thread
From: Bowers, AndrewX @ 2020-07-08 20:43 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, bpf

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Björn Töpel
> Sent: Thursday, July 2, 2020 8:37 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; bpf@vger.kernel.org; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>
> Subject: [Intel-wired-lan] [PATCH net-next 2/4] i40e: use 16B HW descriptors
> instead of 32B
> 
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The i40e NIC supports two flavors of HW descriptors, 16 and 32 byte. The
> latter has, obviously, room for more offloading information. However, the
> only fields of the 32B HW descriptor that is being used by the driver, is also
> available in the 16B descriptor.
> 
> In other words; Reading and writing 32 bytes instead of 16 byte is a waste of
> bus bandwidth.
> 
> This commit starts using 16 byte descriptors instead of 32 byte descriptors.
> 
> For AF_XDP the rx_drop benchmark was improved by 2%.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e.h         |  2 +-
>  drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 10 ++++------
>  drivers/net/ethernet/intel/i40e/i40e_main.c    |  4 ++--
>  drivers/net/ethernet/intel/i40e/i40e_trace.h   |  6 +++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c    |  6 +++---
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  2 +-
>  drivers/net/ethernet/intel/i40e/i40e_type.h    |  5 ++++-
>  7 files changed, 18 insertions(+), 17 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>




* RE: [Intel-wired-lan] [PATCH net-next 3/4] i40e, xsk: increase budget for AF_XDP path
  2020-07-02 15:37 ` [PATCH net-next 3/4] i40e, xsk: increase budget for AF_XDP path Björn Töpel
@ 2020-07-08 20:44   ` Bowers, AndrewX
  0 siblings, 0 replies; 9+ messages in thread
From: Bowers, AndrewX @ 2020-07-08 20:44 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, bpf

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Björn Töpel
> Sent: Thursday, July 2, 2020 8:37 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; bpf@vger.kernel.org; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>
> Subject: [Intel-wired-lan] [PATCH net-next 3/4] i40e, xsk: increase budget for
> AF_XDP path
> 
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> The napi_budget, meaning the number of received packets that are allowed
> to be processed for each napi invocation, takes into consideration that each
> received packet is aimed for the kernel networking stack.
> 
> That is not the case for the AF_XDP receive path, where the cost of each
> packet is significantly less. Therefore, this commit disregards the napi budget
> and increases it to 256. Processing 256 packets targeted for AF_XDP is still less
> work than 64 (napi budget) packets going to the kernel networking stack.
> 
> The performance for the rx_drop scenario is up 7%.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>




* RE: [Intel-wired-lan] [PATCH net-next 4/4] i40e, xsk: move buffer allocation out of the Rx processing loop
  2020-07-02 15:37 ` [PATCH net-next 4/4] i40e, xsk: move buffer allocation out of the Rx processing loop Björn Töpel
@ 2020-07-08 20:45   ` Bowers, AndrewX
  0 siblings, 0 replies; 9+ messages in thread
From: Bowers, AndrewX @ 2020-07-08 20:45 UTC (permalink / raw)
  To: intel-wired-lan; +Cc: netdev, bpf

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Björn Töpel
> Sent: Thursday, July 2, 2020 8:38 AM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; bpf@vger.kernel.org; Topel, Bjorn
> <bjorn.topel@intel.com>; Karlsson, Magnus <magnus.karlsson@intel.com>
> Subject: [Intel-wired-lan] [PATCH net-next 4/4] i40e, xsk: move buffer
> allocation out of the Rx processing loop
> 
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Instead of checking in each iteration of the Rx packet processing loop, move
> the allocation out of the loop and do it once for each napi activation.
> 
> For AF_XDP the rx_drop benchmark was improved by 6%.
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_xsk.c | 12 ++++--------
>  1 file changed, 4 insertions(+), 8 deletions(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>


