netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Introducing ixgbe AF_XDP ZC support
@ 2018-09-24 16:35 Björn Töpel
  2018-09-24 16:35 ` [PATCH 1/5] ixgbe: added Rx/Tx ring disable/enable functions Björn Töpel
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Björn Töpel @ 2018-09-24 16:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc

From: Björn Töpel <bjorn.topel@intel.com>

This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
driver.

The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
analogous to the i40e ZC support. Again, as in i40e, code paths have
been copied from the XDP path to the zero-copy path. Going forward we
will try to generalize more code between the AF_XDP ZC drivers, and
also reduce the heavy C&P.

We have run some benchmarks on a dual socket system with two Broadwell
E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
cores which gives a total of 28, but only two cores are used in these
experiments. One for TR/RX and one for the user space application. The
memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
8192MB and with 8 of those DIMMs in the system we have 64 GB of total
memory. The compiler used is GCC 7.3.0. The NIC is Intel
82599ES/X520-2 10Gbit/s using the ixgbe driver.

Below are the results in Mpps of the 82599ES/X520-2 NIC benchmark runs
for 64B and 1500B packets, generated by a commercial packet generator
HW blasting packets at full 10Gbit/s line rate. The results are with
retpoline and all other spectre and meltdown fixes.

AF_XDP performance 64B packets:
Benchmark   XDP_DRV with zerocopy
rxdrop        14.7
txpush        14.6
l2fwd         11.1

AF_XDP performance 1500B packets:
Benchmark   XDP_DRV with zerocopy
rxdrop        0.8
l2fwd         0.8

XDP performance on our system as a base line.

64B packets:
XDP stats       CPU     Mpps       issue-pps
XDP-RX CPU      16      14.7       0

1500B packets:
XDP stats       CPU     Mpps       issue-pps
XDP-RX CPU      16      0.8        0

The structure of the patch set is as follows:

Patch 1: Introduce Rx/Tx ring enable/disable functionality
Patch 2: Preparatory patche to ixgbe driver code for RX
Patch 3: ixgbe zero-copy support for RX
Patch 4: Preparatory patch to ixgbe driver code for TX
Patch 5: ixgbe zero-copy support for TX

Cheers!
Björn

Björn Töpel (5):
  ixgbe: added Rx/Tx ring disable/enable functions
  ixgbe: move common Rx functions to ixgbe_txrx_common.h
  ixgbe: add AF_XDP zero-copy Rx support
  ixgbe: move common Tx functions to ixgbe_txrx_common.h
  ixgbe: add AF_XDP zero-copy Tx support

 drivers/net/ethernet/intel/ixgbe/Makefile     |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  27 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 290 ++++++-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  50 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 814 ++++++++++++++++++
 5 files changed, 1139 insertions(+), 45 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/5] ixgbe: added Rx/Tx ring disable/enable functions
  2018-09-24 16:35 [PATCH 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
@ 2018-09-24 16:35 ` Björn Töpel
  2018-09-24 16:35 ` [PATCH 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h Björn Töpel
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2018-09-24 16:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc

From: Björn Töpel <bjorn.topel@intel.com>

Add functions for Rx/Tx ring enable/disable. Instead of resetting the
whole device, only the affected ring is disabled or enabled.

This plumbing is used in later commits, when zero-copy AF_XDP support
is introduced.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 158 ++++++++++++++++++
 2 files changed, 159 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5c6fd42e90ed..265db172042a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -271,6 +271,7 @@ enum ixgbe_ring_state_t {
 	__IXGBE_TX_DETECT_HANG,
 	__IXGBE_HANG_CHECK_ARMED,
 	__IXGBE_TX_XDP_RING,
+	__IXGBE_TX_DISABLED,
 };
 
 #define ring_uses_build_skb(ring) \
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index d4d47707662f..3b68e20c067d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8688,6 +8688,8 @@ static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb,
 		return NETDEV_TX_OK;
 
 	tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
+	if (unlikely(test_bit(__IXGBE_TX_DISABLED, &tx_ring->state)))
+		return NETDEV_TX_BUSY;
 
 	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
@@ -10256,6 +10258,9 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
 	if (unlikely(!ring))
 		return -ENXIO;
 
+	if (unlikely(test_bit(__IXGBE_TX_DISABLED, &ring->state)))
+		return -ENXIO;
+
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
 		int err;
@@ -10322,6 +10327,159 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_xdp_xmit		= ixgbe_xdp_xmit,
 };
 
+static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
+				 struct ixgbe_ring *tx_ring)
+{
+	unsigned long wait_delay, delay_interval;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u8 reg_idx = tx_ring->reg_idx;
+	int wait_loop;
+	u32 txdctl;
+
+	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
+
+	/* delay mechanism from ixgbe_disable_tx */
+	delay_interval = ixgbe_get_completion_timeout(adapter) / 100;
+
+	wait_loop = IXGBE_MAX_RX_DESC_POLL;
+	wait_delay = delay_interval;
+
+	while (wait_loop--) {
+		usleep_range(wait_delay, wait_delay + 10);
+		wait_delay += delay_interval * 2;
+		txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(reg_idx));
+
+		if (!(txdctl & IXGBE_TXDCTL_ENABLE))
+			return;
+	}
+
+	e_err(drv, "TXDCTL.ENABLE not cleared within the polling period\n");
+}
+
+static void ixgbe_disable_txr(struct ixgbe_adapter *adapter,
+			      struct ixgbe_ring *tx_ring)
+{
+	set_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
+	ixgbe_disable_txr_hw(adapter, tx_ring);
+}
+
+static void ixgbe_disable_rxr_hw(struct ixgbe_adapter *adapter,
+				 struct ixgbe_ring *rx_ring)
+{
+	unsigned long wait_delay, delay_interval;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u8 reg_idx = rx_ring->reg_idx;
+	int wait_loop;
+	u32 rxdctl;
+
+	rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+	rxdctl &= ~IXGBE_RXDCTL_ENABLE;
+	rxdctl |= IXGBE_RXDCTL_SWFLSH;
+
+	/* write value back with RXDCTL.ENABLE bit cleared */
+	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
+
+	/* RXDCTL.EN may not change on 82598 if link is down, so skip it */
+	if (hw->mac.type == ixgbe_mac_82598EB &&
+	    !(IXGBE_READ_REG(hw, IXGBE_LINKS) & IXGBE_LINKS_UP))
+		return;
+
+	/* delay mechanism from ixgbe_disable_rx */
+	delay_interval = ixgbe_get_completion_timeout(adapter) / 100;
+
+	wait_loop = IXGBE_MAX_RX_DESC_POLL;
+	wait_delay = delay_interval;
+
+	while (wait_loop--) {
+		usleep_range(wait_delay, wait_delay + 10);
+		wait_delay += delay_interval * 2;
+		rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+
+		if (!(rxdctl & IXGBE_RXDCTL_ENABLE))
+			return;
+	}
+
+	e_err(drv, "RXDCTL.ENABLE not cleared within the polling period\n");
+}
+
+static void ixgbe_reset_txr_stats(struct ixgbe_ring *tx_ring)
+{
+	memset(&tx_ring->stats, 0, sizeof(tx_ring->stats));
+	memset(&tx_ring->tx_stats, 0, sizeof(tx_ring->tx_stats));
+}
+
+static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
+{
+	memset(&rx_ring->stats, 0, sizeof(rx_ring->stats));
+	memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
+}
+
+/**
+ * ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
+ * @adapter: adapter structure
+ * @ring: ring index
+ *
+ * This function disables a certain Rx/Tx/XDP Tx ring. The function
+ * assumes that the netdev is running.
+ **/
+void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
+{
+	struct ixgbe_ring *rx_ring, *tx_ring, *xdp_ring;
+
+	rx_ring = adapter->rx_ring[ring];
+	tx_ring = adapter->tx_ring[ring];
+	xdp_ring = adapter->xdp_ring[ring];
+
+	ixgbe_disable_txr(adapter, tx_ring);
+	if (xdp_ring)
+		ixgbe_disable_txr(adapter, xdp_ring);
+	ixgbe_disable_rxr_hw(adapter, rx_ring);
+
+	if (xdp_ring)
+		synchronize_sched();
+
+	/* Rx/Tx/XDP Tx share the same napi context. */
+	napi_disable(&rx_ring->q_vector->napi);
+
+	ixgbe_clean_tx_ring(tx_ring);
+	if (xdp_ring)
+		ixgbe_clean_tx_ring(xdp_ring);
+	ixgbe_clean_rx_ring(rx_ring);
+
+	ixgbe_reset_txr_stats(tx_ring);
+	if (xdp_ring)
+		ixgbe_reset_txr_stats(xdp_ring);
+	ixgbe_reset_rxr_stats(rx_ring);
+}
+
+/**
+ * ixgbe_txrx_ring_enable - Enable Rx/Tx/XDP Tx rings
+ * @adapter: adapter structure
+ * @ring: ring index
+ *
+ * This function enables a certain Rx/Tx/XDP Tx ring. The function
+ * assumes that the netdev is running.
+ **/
+void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
+{
+	struct ixgbe_ring *rx_ring, *tx_ring, *xdp_ring;
+
+	rx_ring = adapter->rx_ring[ring];
+	tx_ring = adapter->tx_ring[ring];
+	xdp_ring = adapter->xdp_ring[ring];
+
+	/* Rx/Tx/XDP Tx share the same napi context. */
+	napi_enable(&rx_ring->q_vector->napi);
+
+	ixgbe_configure_tx_ring(adapter, tx_ring);
+	if (xdp_ring)
+		ixgbe_configure_tx_ring(adapter, xdp_ring);
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+
+	clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
+	clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+}
+
 /**
  * ixgbe_enumerate_functions - Get the number of ports this device has
  * @adapter: adapter structure
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h
  2018-09-24 16:35 [PATCH 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
  2018-09-24 16:35 ` [PATCH 1/5] ixgbe: added Rx/Tx ring disable/enable functions Björn Töpel
@ 2018-09-24 16:35 ` Björn Töpel
  2018-09-24 16:35 ` [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support Björn Töpel
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2018-09-24 16:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc

From: Björn Töpel <bjorn.topel@intel.com>

This patch prepares for the upcoming zero-copy Rx functionality, by
moving/changing linkage of common functions, used both by the regular
path and zero-copy path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 29 +++++++------------
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  | 26 +++++++++++++++++
 2 files changed, 37 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3b68e20c067d..3e2e6fb2215a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -40,6 +40,7 @@
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
 #include "ixgbe_model.h"
+#include "ixgbe_txrx_common.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -1673,9 +1674,9 @@ static void ixgbe_update_rsc_stats(struct ixgbe_ring *rx_ring,
  * order to populate the hash, checksum, VLAN, timestamp, protocol, and
  * other fields within the skb.
  **/
-static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
-				     union ixgbe_adv_rx_desc *rx_desc,
-				     struct sk_buff *skb)
+void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
+			      union ixgbe_adv_rx_desc *rx_desc,
+			      struct sk_buff *skb)
 {
 	struct net_device *dev = rx_ring->netdev;
 	u32 flags = rx_ring->q_vector->adapter->flags;
@@ -1708,8 +1709,8 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 	skb->protocol = eth_type_trans(skb, dev);
 }
 
-static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
-			 struct sk_buff *skb)
+void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
+		  struct sk_buff *skb)
 {
 	napi_gro_receive(&q_vector->napi, skb);
 }
@@ -1868,9 +1869,9 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
  *
  * Returns true if an error was encountered and skb was freed.
  **/
-static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
-				  union ixgbe_adv_rx_desc *rx_desc,
-				  struct sk_buff *skb)
+bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
+			   union ixgbe_adv_rx_desc *rx_desc,
+			   struct sk_buff *skb)
 {
 	struct net_device *netdev = rx_ring->netdev;
 
@@ -2186,14 +2187,6 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
-#define IXGBE_XDP_PASS		0
-#define IXGBE_XDP_CONSUMED	BIT(0)
-#define IXGBE_XDP_TX		BIT(1)
-#define IXGBE_XDP_REDIR		BIT(2)
-
-static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
-			       struct xdp_frame *xdpf);
-
 static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 				     struct ixgbe_ring *rx_ring,
 				     struct xdp_buff *xdp)
@@ -8465,8 +8458,8 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 }
 
 #endif
-static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
-			       struct xdp_frame *xdpf)
+int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+			struct xdp_frame *xdpf)
 {
 	struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 	struct ixgbe_tx_buffer *tx_buffer;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
new file mode 100644
index 000000000000..3780d315b991
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2018 Intel Corporation. */
+
+#ifndef _IXGBE_TXRX_COMMON_H_
+#define _IXGBE_TXRX_COMMON_H_
+
+#define IXGBE_XDP_PASS		0
+#define IXGBE_XDP_CONSUMED	BIT(0)
+#define IXGBE_XDP_TX		BIT(1)
+#define IXGBE_XDP_REDIR		BIT(2)
+
+int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+			struct xdp_frame *xdpf);
+bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
+			   union ixgbe_adv_rx_desc *rx_desc,
+			   struct sk_buff *skb);
+void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
+			      union ixgbe_adv_rx_desc *rx_desc,
+			      struct sk_buff *skb);
+void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
+		  struct sk_buff *skb);
+
+void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
+void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
+
+#endif /* #define _IXGBE_TXRX_COMMON_H_ */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support
  2018-09-24 16:35 [PATCH 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
  2018-09-24 16:35 ` [PATCH 1/5] ixgbe: added Rx/Tx ring disable/enable functions Björn Töpel
  2018-09-24 16:35 ` [PATCH 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h Björn Töpel
@ 2018-09-24 16:35 ` Björn Töpel
  2018-09-25 14:57   ` Jakub Kicinski
  2018-09-24 16:35 ` [PATCH 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h Björn Töpel
  2018-09-24 16:35 ` [PATCH 5/5] ixgbe: add AF_XDP zero-copy Tx support Björn Töpel
  4 siblings, 1 reply; 8+ messages in thread
From: Björn Töpel @ 2018-09-24 16:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc

From: Björn Töpel <bjorn.topel@intel.com>

This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.

All AF_XDP specific functions are added to a new file, ixgbe_xsk.c.

Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior passing
it to the kernel stack.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/Makefile     |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  26 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  78 ++-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  15 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 639 ++++++++++++++++++
 5 files changed, 741 insertions(+), 20 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index 5414685189ce..ca6b0c458e4a 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
               ixgbe_82599.o ixgbe_82598.o ixgbe_phy.o ixgbe_sriov.o \
-              ixgbe_mbx.o ixgbe_x540.o ixgbe_x550.o ixgbe_lib.o ixgbe_ptp.o
+              ixgbe_mbx.o ixgbe_x540.o ixgbe_x550.o ixgbe_lib.o ixgbe_ptp.o \
+              ixgbe_xsk.o
 
 ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
                               ixgbe_dcb_82599.o ixgbe_dcb_nl.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 265db172042a..421fdac3a76d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -228,13 +228,17 @@ struct ixgbe_tx_buffer {
 struct ixgbe_rx_buffer {
 	struct sk_buff *skb;
 	dma_addr_t dma;
-	struct page *page;
-#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
-	__u32 page_offset;
-#else
-	__u16 page_offset;
-#endif
-	__u16 pagecnt_bias;
+	union {
+		struct {
+			struct page *page;
+			__u32 page_offset;
+			__u16 pagecnt_bias;
+		};
+		struct {
+			void *addr;
+			u64 handle;
+		};
+	};
 };
 
 struct ixgbe_queue_stats {
@@ -348,6 +352,9 @@ struct ixgbe_ring {
 		struct ixgbe_rx_queue_stats rx_stats;
 	};
 	struct xdp_rxq_info xdp_rxq;
+	struct xdp_umem *xsk_umem;
+	struct zero_copy_allocator zca; /* ZC allocator anchor */
+	u16 rx_buf_len;
 } ____cacheline_internodealigned_in_smp;
 
 enum ixgbe_ring_f_enum {
@@ -765,6 +772,11 @@ struct ixgbe_adapter {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct ixgbe_ipsec *ipsec;
 #endif /* CONFIG_XFRM_OFFLOAD */
+
+	/* AF_XDP zero-copy */
+	struct xdp_umem **xsk_umems;
+	u16 num_xsk_umems_used;
+	u16 num_xsk_umems;
 };
 
 static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 3e2e6fb2215a..4e6726c623a8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -34,6 +34,7 @@
 #include <net/tc_act/tc_mirred.h>
 #include <net/vxlan.h>
 #include <net/mpls.h>
+#include <net/xdp_sock.h>
 
 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -3176,7 +3177,10 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
-		int cleaned = ixgbe_clean_rx_irq(q_vector, ring,
+		int cleaned = ring->xsk_umem ?
+			      ixgbe_clean_rx_irq_zc(q_vector, ring,
+						    per_ring_budget) :
+			      ixgbe_clean_rx_irq(q_vector, ring,
 						 per_ring_budget);
 
 		work_done += cleaned;
@@ -3706,10 +3710,27 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
 	srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;
 
 	/* configure the packet buffer length */
-	if (test_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state))
+	if (rx_ring->xsk_umem) {
+		u32 xsk_buf_len = rx_ring->xsk_umem->chunk_size_nohr -
+				  XDP_PACKET_HEADROOM;
+
+		/* If the MAC support setting RXDCTL.RLPML, the
+		 * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
+		 * RXDCTL.RLPML is set to the actual UMEM buffer
+		 * size. If not, then we are stuck with a 1k buffer
+		 * size resolution. In this case frames larger than
+		 * the UMEM buffer size viewed in a 1k resolution will
+		 * be dropped.
+		 */
+		if (hw->mac.type != ixgbe_mac_82599EB)
+			srrctl |= PAGE_SIZE >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+		else
+			srrctl |= xsk_buf_len >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+	} else if (test_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state)) {
 		srrctl |= IXGBE_RXBUFFER_3K >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
-	else
+	} else {
 		srrctl |= IXGBE_RXBUFFER_2K >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+	}
 
 	/* configure descriptor type */
 	srrctl |= IXGBE_SRRCTL_DESCTYPE_ADV_ONEBUF;
@@ -4032,6 +4053,19 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	u32 rxdctl;
 	u8 reg_idx = ring->reg_idx;
 
+	xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
+	ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+	if (ring->xsk_umem) {
+		ring->zca.free = ixgbe_zca_free;
+		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+						   MEM_TYPE_ZERO_COPY,
+						   &ring->zca));
+
+	} else {
+		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+						   MEM_TYPE_PAGE_SHARED, NULL));
+	}
+
 	/* disable queue to avoid use of these values while updating state */
 	rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
 	rxdctl &= ~IXGBE_RXDCTL_ENABLE;
@@ -4081,6 +4115,17 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 #endif
 	}
 
+	if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) {
+		u32 xsk_buf_len = ring->xsk_umem->chunk_size_nohr -
+				  XDP_PACKET_HEADROOM;
+
+		rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
+			    IXGBE_RXDCTL_RLPML_EN);
+		rxdctl |= xsk_buf_len | IXGBE_RXDCTL_RLPML_EN;
+
+		ring->rx_buf_len = xsk_buf_len;
+	}
+
 	/* initialize rx_buffer_info */
 	memset(ring->rx_buffer_info, 0,
 	       sizeof(struct ixgbe_rx_buffer) * ring->count);
@@ -4094,7 +4139,10 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
 
 	ixgbe_rx_desc_queue_enable(adapter, ring);
-	ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
+	if (ring->xsk_umem)
+		ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring));
+	else
+		ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
 }
 
 static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
@@ -5202,6 +5250,11 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 	u16 i = rx_ring->next_to_clean;
 	struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i];
 
+	if (rx_ring->xsk_umem) {
+		ixgbe_xsk_clean_rx_ring(rx_ring);
+		goto skip_free;
+	}
+
 	/* Free all the Rx ring sk_buffs */
 	while (i != rx_ring->next_to_alloc) {
 		if (rx_buffer->skb) {
@@ -5240,6 +5293,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 		}
 	}
 
+skip_free:
 	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
@@ -6435,7 +6489,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 	struct device *dev = rx_ring->dev;
 	int orig_node = dev_to_node(dev);
 	int ring_node = -1;
-	int size, err;
+	int size;
 
 	size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
 
@@ -6472,13 +6526,6 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 			     rx_ring->queue_index) < 0)
 		goto err;
 
-	err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
-					 MEM_TYPE_PAGE_SHARED, NULL);
-	if (err) {
-		xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
-		goto err;
-	}
-
 	rx_ring->xdp_prog = adapter->xdp_prog;
 
 	return 0;
@@ -10216,6 +10263,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 		xdp->prog_id = adapter->xdp_prog ?
 			adapter->xdp_prog->aux->id : 0;
 		return 0;
+	case XDP_QUERY_XSK_UMEM:
+		return ixgbe_xsk_umem_query(adapter, &xdp->xsk.umem,
+					    xdp->xsk.queue_id);
+	case XDP_SETUP_XSK_UMEM:
+		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
+					    xdp->xsk.queue_id);
+
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 3780d315b991..cf219f4e009d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -23,4 +23,19 @@ void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
 
+struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
+				struct ixgbe_ring *ring);
+int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
+			 u16 qid);
+int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+			 u16 qid);
+
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+
+void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count);
+int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
+			  struct ixgbe_ring *rx_ring,
+			  const int budget);
+void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
new file mode 100644
index 000000000000..253ce3cfbcf1
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -0,0 +1,639 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Intel Corporation. */
+
+#include <linux/bpf_trace.h>
+#include <net/xdp_sock.h>
+#include <net/xdp.h>
+
+#include "ixgbe.h"
+#include "ixgbe_txrx_common.h"
+
+struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
+				struct ixgbe_ring *ring)
+{
+	bool xdp_on = READ_ONCE(adapter->xdp_prog);
+	int qid = ring->queue_index;
+
+	if (!adapter->xsk_umems || !adapter->xsk_umems[qid] || !xdp_on)
+		return NULL;
+
+	return adapter->xsk_umems[qid];
+}
+
+static int ixgbe_alloc_xsk_umems(struct ixgbe_adapter *adapter)
+{
+	if (adapter->xsk_umems)
+		return 0;
+
+	adapter->num_xsk_umems_used = 0;
+	adapter->num_xsk_umems = adapter->num_rx_queues;
+	adapter->xsk_umems = kcalloc(adapter->num_xsk_umems,
+				     sizeof(*adapter->xsk_umems),
+				     GFP_KERNEL);
+	if (!adapter->xsk_umems) {
+		adapter->num_xsk_umems = 0;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int ixgbe_add_xsk_umem(struct ixgbe_adapter *adapter,
+			      struct xdp_umem *umem,
+			      u16 qid)
+{
+	int err;
+
+	err = ixgbe_alloc_xsk_umems(adapter);
+	if (err)
+		return err;
+
+	adapter->xsk_umems[qid] = umem;
+	adapter->num_xsk_umems_used++;
+
+	return 0;
+}
+
+static void ixgbe_remove_xsk_umem(struct ixgbe_adapter *adapter, u16 qid)
+{
+	adapter->xsk_umems[qid] = NULL;
+	adapter->num_xsk_umems_used--;
+
+	if (adapter->num_xsk_umems == 0) {
+		kfree(adapter->xsk_umems);
+		adapter->xsk_umems = NULL;
+		adapter->num_xsk_umems = 0;
+	}
+}
+
+static int ixgbe_xsk_umem_dma_map(struct ixgbe_adapter *adapter,
+				  struct xdp_umem *umem)
+{
+	struct device *dev = &adapter->pdev->dev;
+	unsigned int i, j;
+	dma_addr_t dma;
+
+	for (i = 0; i < umem->npgs; i++) {
+		dma = dma_map_page_attrs(dev, umem->pgs[i], 0, PAGE_SIZE,
+					 DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+		if (dma_mapping_error(dev, dma))
+			goto out_unmap;
+
+		umem->pages[i].dma = dma;
+	}
+
+	return 0;
+
+out_unmap:
+	for (j = 0; j < i; j++) {
+		dma_unmap_page_attrs(dev, umem->pages[i].dma, PAGE_SIZE,
+				     DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+		umem->pages[i].dma = 0;
+	}
+
+	return -1;
+}
+
+static void ixgbe_xsk_umem_dma_unmap(struct ixgbe_adapter *adapter,
+				     struct xdp_umem *umem)
+{
+	struct device *dev = &adapter->pdev->dev;
+	unsigned int i;
+
+	for (i = 0; i < umem->npgs; i++) {
+		dma_unmap_page_attrs(dev, umem->pages[i].dma, PAGE_SIZE,
+				     DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+
+		umem->pages[i].dma = 0;
+	}
+}
+
+static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
+				 struct xdp_umem *umem,
+				 u16 qid)
+{
+	struct xdp_umem_fq_reuse *reuseq;
+	bool if_running;
+	int err;
+
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		return -EINVAL;
+
+	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
+		return -EINVAL;
+
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (adapter->xsk_umems) {
+		if (qid >= adapter->num_xsk_umems)
+			return -EINVAL;
+		if (adapter->xsk_umems[qid])
+			return -EBUSY;
+	}
+
+	reuseq = xsk_reuseq_prepare(adapter->rx_ring[0]->count);
+	if (!reuseq)
+		return -ENOMEM;
+
+	xsk_reuseq_free(xsk_reuseq_swap(umem, reuseq));
+
+	err = ixgbe_xsk_umem_dma_map(adapter, umem);
+	if (err)
+		return err;
+
+	if_running = netif_running(adapter->netdev) &&
+		     READ_ONCE(adapter->xdp_prog);
+
+	if (if_running)
+		ixgbe_txrx_ring_disable(adapter, qid);
+
+	err = ixgbe_add_xsk_umem(adapter, umem, qid);
+
+	if (if_running)
+		ixgbe_txrx_ring_enable(adapter, qid);
+
+	return err;
+}
+
+static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
+{
+	bool if_running;
+
+	if (!adapter->xsk_umems || qid >= adapter->num_xsk_umems ||
+	    !adapter->xsk_umems[qid])
+		return -EINVAL;
+
+	if_running = netif_running(adapter->netdev) &&
+		     READ_ONCE(adapter->xdp_prog);
+
+	if (if_running)
+		ixgbe_txrx_ring_disable(adapter, qid);
+
+	ixgbe_xsk_umem_dma_unmap(adapter, adapter->xsk_umems[qid]);
+	ixgbe_remove_xsk_umem(adapter, qid);
+
+	if (if_running)
+		ixgbe_txrx_ring_enable(adapter, qid);
+
+	return 0;
+}
+
+int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
+			 u16 qid)
+{
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		return -EINVAL;
+
+	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
+		return -EINVAL;
+
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (adapter->xsk_umems) {
+		if (qid >= adapter->num_xsk_umems)
+			return -EINVAL;
+		*umem = adapter->xsk_umems[qid];
+		return 0;
+	}
+
+	*umem = NULL;
+	return 0;
+}
+
+int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+			 u16 qid)
+{
+	return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) :
+		ixgbe_xsk_umem_disable(adapter, qid);
+}
+
+static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
+			    struct ixgbe_ring *rx_ring,
+			    struct xdp_buff *xdp)
+{
+	int err, result = IXGBE_XDP_PASS;
+	struct bpf_prog *xdp_prog;
+	struct xdp_frame *xdpf;
+	u32 act;
+
+	rcu_read_lock();
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	xdp->handle += xdp->data - xdp->data_hard_start;
+	switch (act) {
+	case XDP_PASS:
+		break;
+	case XDP_TX:
+		xdpf = convert_to_xdp_frame(xdp);
+		if (unlikely(!xdpf)) {
+			result = IXGBE_XDP_CONSUMED;
+			break;
+		}
+		result = ixgbe_xmit_xdp_ring(adapter, xdpf);
+		break;
+	case XDP_REDIRECT:
+		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+		result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		/* fallthrough */
+	case XDP_ABORTED:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		/* fallthrough -- handle aborts by dropping packet */
+	case XDP_DROP:
+		result = IXGBE_XDP_CONSUMED;
+		break;
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+static struct ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(
+	struct ixgbe_ring *rx_ring,
+	unsigned int size)
+{
+	struct ixgbe_rx_buffer *bi;
+
+	bi = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+
+	/* we are reusing so sync this buffer for CPU use */
+	dma_sync_single_range_for_cpu(rx_ring->dev,
+				      bi->dma, 0,
+				      size,
+				      DMA_BIDIRECTIONAL);
+
+	return bi;
+}
+
+static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
+				     struct ixgbe_rx_buffer *obi)
+{
+	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
+	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
+	u16 nta = rx_ring->next_to_alloc;
+	struct ixgbe_rx_buffer *nbi;
+
+	nbi = &rx_ring->rx_buffer_info[rx_ring->next_to_alloc];
+	/* update, and store next to alloc */
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	/* transfer page from old buffer to new buffer */
+	nbi->dma = obi->dma & mask;
+	nbi->dma += hr;
+
+	nbi->addr = (void *)((unsigned long)obi->addr & mask);
+	nbi->addr += hr;
+
+	nbi->handle = obi->handle & mask;
+	nbi->handle += rx_ring->xsk_umem->headroom;
+
+	obi->addr = NULL;
+	obi->skb = NULL;
+}
+
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+{
+	struct ixgbe_rx_buffer *bi;
+	struct ixgbe_ring *rx_ring;
+	u64 hr, mask;
+	u16 nta;
+
+	rx_ring = container_of(alloc, struct ixgbe_ring, zca);
+	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
+	mask = rx_ring->xsk_umem->chunk_mask;
+
+	nta = rx_ring->next_to_alloc;
+	bi = rx_ring->rx_buffer_info;
+
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	handle &= mask;
+
+	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
+	bi->addr += hr;
+
+	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+}
+
+static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
+				  struct ixgbe_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	void *addr = bi->addr;
+	u64 handle, hr;
+
+	if (addr)
+		return true;
+
+	if (!xsk_umem_peek_addr(umem, &handle)) {
+		rx_ring->rx_stats.alloc_rx_page_failed++;
+		return false;
+	}
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr(umem);
+	return true;
+}
+
+static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
+				       struct ixgbe_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	u64 handle, hr;
+
+	if (!xsk_umem_peek_addr_rq(umem, &handle)) {
+		rx_ring->rx_stats.alloc_rx_page_failed++;
+		return false;
+	}
+
+	handle &= rx_ring->xsk_umem->chunk_mask;
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr_rq(umem);
+	return true;
+}
+
+static __always_inline bool __ixgbe_alloc_rx_buffers_zc(
+	struct ixgbe_ring *rx_ring,
+	u16 cleaned_count,
+	bool alloc(struct ixgbe_ring *rx_ring,
+		   struct ixgbe_rx_buffer *bi))
+{
+	union ixgbe_adv_rx_desc *rx_desc;
+	struct ixgbe_rx_buffer *bi;
+	u16 i = rx_ring->next_to_use;
+	bool ok = true;
+
+	/* nothing to do */
+	if (!cleaned_count)
+		return true;
+
+	rx_desc = IXGBE_RX_DESC(rx_ring, i);
+	bi = &rx_ring->rx_buffer_info[i];
+	i -= rx_ring->count;
+
+	do {
+		if (!alloc(rx_ring, bi)) {
+			ok = false;
+			break;
+		}
+
+		/* sync the buffer for use by the device */
+		dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+						 bi->page_offset,
+						 rx_ring->rx_buf_len,
+						 DMA_BIDIRECTIONAL);
+
+		/* Refresh the desc even if buffer_addrs didn't change
+		 * because each write-back erases this info.
+		 */
+		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma);
+
+		rx_desc++;
+		bi++;
+		i++;
+		if (unlikely(!i)) {
+			rx_desc = IXGBE_RX_DESC(rx_ring, 0);
+			bi = rx_ring->rx_buffer_info;
+			i -= rx_ring->count;
+		}
+
+		/* clear the length for the next_to_use descriptor */
+		rx_desc->wb.upper.length = 0;
+
+		cleaned_count--;
+	} while (cleaned_count);
+
+	i += rx_ring->count;
+
+	if (rx_ring->next_to_use != i) {
+		rx_ring->next_to_use = i;
+
+		/* update next to alloc since we have filled the ring */
+		rx_ring->next_to_alloc = i;
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.  (Only
+		 * applicable for weak-ordered memory model archs,
+		 * such as IA-64).
+		 */
+		wmb();
+		writel(i, rx_ring->tail);
+	}
+
+	return ok;
+}
+
+void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
+{
+	__ixgbe_alloc_rx_buffers_zc(rx_ring, count,
+				    ixgbe_alloc_buffer_slow_zc);
+}
+
+static bool ixgbe_alloc_rx_buffers_fast_zc(struct ixgbe_ring *rx_ring,
+					   u16 count)
+{
+	return __ixgbe_alloc_rx_buffers_zc(rx_ring, count,
+					   ixgbe_alloc_buffer_zc);
+}
+
+static struct sk_buff *ixgbe_construct_skb_zc(struct ixgbe_ring *rx_ring,
+					      struct ixgbe_rx_buffer *bi,
+					      struct xdp_buff *xdp)
+{
+	unsigned int metasize = xdp->data - xdp->data_meta;
+	unsigned int datasize = xdp->data_end - xdp->data;
+	struct sk_buff *skb;
+
+	/* allocate a skb to store the frags */
+	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
+			       xdp->data_end - xdp->data_hard_start,
+			       GFP_ATOMIC | __GFP_NOWARN);
+	if (unlikely(!skb))
+		return NULL;
+
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+	return skb;
+}
+
+static void ixgbe_inc_ntc(struct ixgbe_ring *rx_ring)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+	prefetch(IXGBE_RX_DESC(rx_ring, ntc));
+}
+
+int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
+			  struct ixgbe_ring *rx_ring,
+			  const int budget)
+{
+	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	unsigned int xdp_res, xdp_xmit = 0;
+	bool failure = false;
+	struct sk_buff *skb;
+	struct xdp_buff xdp;
+
+	xdp.rxq = &rx_ring->xdp_rxq;
+
+	while (likely(total_rx_packets < budget)) {
+		union ixgbe_adv_rx_desc *rx_desc;
+		struct ixgbe_rx_buffer *bi;
+		unsigned int size;
+
+		/* return some buffers to hardware, one at a time is too slow */
+		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
+			failure = failure ||
+				  !ixgbe_alloc_rx_buffers_fast_zc(
+					  rx_ring,
+					  cleaned_count);
+			cleaned_count = 0;
+		}
+
+		rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
+		size = le16_to_cpu(rx_desc->wb.upper.length);
+		if (!size)
+			break;
+
+		/* This memory barrier is needed to keep us from reading
+		 * any other fields out of the rx_desc until we know the
+		 * descriptor has been written back
+		 */
+		dma_rmb();
+
+		bi = ixgbe_get_rx_buffer_zc(rx_ring, size);
+
+		if (unlikely(!ixgbe_test_staterr(rx_desc,
+						 IXGBE_RXD_STAT_EOP))) {
+			struct ixgbe_rx_buffer *next_bi;
+
+			ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			ixgbe_inc_ntc(rx_ring);
+			next_bi = &rx_ring->rx_buffer_info[
+				rx_ring->next_to_clean];
+			next_bi->skb = ERR_PTR(-EINVAL);
+			continue;
+		}
+
+		if (unlikely(bi->skb)) {
+			ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
+
+		xdp.data = bi->addr;
+		xdp.data_meta = xdp.data;
+		xdp.data_hard_start = xdp.data - XDP_PACKET_HEADROOM;
+		xdp.data_end = xdp.data + size;
+		xdp.handle = bi->handle;
+
+		xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, &xdp);
+
+		if (xdp_res) {
+			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
+				bi->addr = NULL;
+				bi->skb = NULL;
+			} else {
+				ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			}
+			total_rx_packets++;
+			total_rx_bytes += size;
+
+			cleaned_count++;
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
+
+		/* XDP_PASS path */
+		skb = ixgbe_construct_skb_zc(rx_ring, bi, &xdp);
+		if (!skb) {
+			rx_ring->rx_stats.alloc_rx_buff_failed++;
+			break;
+		}
+
+		cleaned_count++;
+		ixgbe_inc_ntc(rx_ring);
+
+		if (eth_skb_pad(skb))
+			continue;
+
+		total_rx_bytes += skb->len;
+		total_rx_packets++;
+
+		ixgbe_process_skb_fields(rx_ring, rx_desc, skb);
+		ixgbe_rx_skb(q_vector, skb);
+	}
+
+	if (xdp_xmit & IXGBE_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & IXGBE_XDP_TX) {
+		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.
+		 */
+		wmb();
+		writel(ring->next_to_use, ring->tail);
+	}
+
+	u64_stats_update_begin(&rx_ring->syncp);
+	rx_ring->stats.packets += total_rx_packets;
+	rx_ring->stats.bytes += total_rx_bytes;
+	u64_stats_update_end(&rx_ring->syncp);
+	q_vector->rx.total_packets += total_rx_packets;
+	q_vector->rx.total_bytes += total_rx_bytes;
+
+	return failure ? budget : (int)total_rx_packets;
+}
+
+void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
+{
+	u16 i = rx_ring->next_to_clean;
+	struct ixgbe_rx_buffer *bi = &rx_ring->rx_buffer_info[i];
+
+	while (i != rx_ring->next_to_alloc) {
+		xsk_umem_fq_reuse(rx_ring->xsk_umem, bi->handle);
+		i++;
+		bi++;
+		if (i == rx_ring->count) {
+			i = 0;
+			bi = rx_ring->rx_buffer_info;
+		}
+	}
+}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h
  2018-09-24 16:35 [PATCH 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
                   ` (2 preceding siblings ...)
  2018-09-24 16:35 ` [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support Björn Töpel
@ 2018-09-24 16:35 ` Björn Töpel
  2018-09-24 16:35 ` [PATCH 5/5] ixgbe: add AF_XDP zero-copy Tx support Björn Töpel
  4 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2018-09-24 16:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc

From: Björn Töpel <bjorn.topel@intel.com>

This patch prepares for the upcoming zero-copy Tx functionality by
moving common functions used both by the regular path and zero-copy
path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c        | 9 +++------
 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 5 +++++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4e6726c623a8..4e9b28894a5b 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -895,8 +895,8 @@ static void ixgbe_set_ivar(struct ixgbe_adapter *adapter, s8 direction,
 	}
 }
 
-static inline void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
-					  u64 qmask)
+void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
+			    u64 qmask)
 {
 	u32 mask;
 
@@ -8150,9 +8150,6 @@ static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 	return __ixgbe_maybe_stop_tx(tx_ring, size);
 }
 
-#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
-		       IXGBE_TXD_CMD_RS)
-
 static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 			struct ixgbe_tx_buffer *first,
 			const u8 hdr_len)
@@ -10275,7 +10272,7 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
-static void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
+void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
 {
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index cf219f4e009d..56afb685c648 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -9,6 +9,9 @@
 #define IXGBE_XDP_TX		BIT(1)
 #define IXGBE_XDP_REDIR		BIT(2)
 
+#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
+		       IXGBE_TXD_CMD_RS)
+
 int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 			struct xdp_frame *xdpf);
 bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
@@ -19,6 +22,8 @@ void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 			      struct sk_buff *skb);
 void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 		  struct sk_buff *skb);
+void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring);
+void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask);
 
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 5/5] ixgbe: add AF_XDP zero-copy Tx support
  2018-09-24 16:35 [PATCH 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
                   ` (3 preceding siblings ...)
  2018-09-24 16:35 ` [PATCH 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h Björn Töpel
@ 2018-09-24 16:35 ` Björn Töpel
  4 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2018-09-24 16:35 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc

From: Björn Töpel <bjorn.topel@intel.com>

This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.

The regular XDP Tx ring is used for AF_XDP as well. This rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
ixgbe_xsk.c.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  16 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |   4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 175 ++++++++++++++++++
 3 files changed, 194 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 4e9b28894a5b..64b19346396d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3161,7 +3161,11 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
 	ixgbe_for_each_ring(ring, q_vector->tx) {
-		if (!ixgbe_clean_tx_irq(q_vector, ring, budget))
+		bool wd = ring->xsk_umem ?
+			  ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
+			  ixgbe_clean_tx_irq(q_vector, ring, budget);
+
+		if (!wd)
 			clean_complete = false;
 	}
 
@@ -3472,6 +3476,9 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;
 
+	if (ring_is_xdp(ring))
+		ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+
 	/* disable queue to avoid issues while updating state */
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
 	IXGBE_WRITE_FLUSH(hw);
@@ -5938,6 +5945,11 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	u16 i = tx_ring->next_to_clean;
 	struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
 
+	if (tx_ring->xsk_umem) {
+		ixgbe_xsk_clean_tx_ring(tx_ring);
+		goto out;
+	}
+
 	while (i != tx_ring->next_to_use) {
 		union ixgbe_adv_tx_desc *eop_desc, *tx_desc;
 
@@ -5989,6 +6001,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	if (!ring_is_xdp(tx_ring))
 		netdev_tx_reset_queue(txring_txq(tx_ring));
 
+out:
 	/* reset next_to_use and next_to_clean */
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;
@@ -10369,6 +10382,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_features_check	= ixgbe_features_check,
 	.ndo_bpf		= ixgbe_xdp,
 	.ndo_xdp_xmit		= ixgbe_xdp_xmit,
+	.ndo_xsk_async_xmit	= ixgbe_xsk_async_xmit,
 };
 
 static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 56afb685c648..53d4089f5644 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -42,5 +42,9 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 			  struct ixgbe_ring *rx_ring,
 			  const int budget);
 void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+			    struct ixgbe_ring *tx_ring, int napi_budget);
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 queue_id);
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring);
 
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 253ce3cfbcf1..e998ed880460 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -637,3 +637,178 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
 		}
 	}
 }
+
+static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
+{
+	union ixgbe_adv_tx_desc *tx_desc = NULL;
+	struct ixgbe_tx_buffer *tx_bi;
+	bool work_done = true;
+	u32 len, cmd_type;
+	dma_addr_t dma;
+
+	while (budget-- > 0) {
+		if (unlikely(!ixgbe_desc_unused(xdp_ring))) {
+			work_done = false;
+			break;
+		}
+
+		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
+			break;
+
+		dma_sync_single_for_device(xdp_ring->dev, dma, len,
+					   DMA_BIDIRECTIONAL);
+
+		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
+		tx_bi->bytecount = len;
+		tx_bi->xdpf = NULL;
+
+		tx_desc = IXGBE_TX_DESC(xdp_ring, xdp_ring->next_to_use);
+		tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+		/* put descriptor type bits */
+		cmd_type = IXGBE_ADVTXD_DTYP_DATA |
+			   IXGBE_ADVTXD_DCMD_DEXT |
+			   IXGBE_ADVTXD_DCMD_IFCS;
+		cmd_type |= len | IXGBE_TXD_CMD;
+		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+		tx_desc->read.olinfo_status =
+			cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
+		xdp_ring->next_to_use++;
+		if (xdp_ring->next_to_use == xdp_ring->count)
+			xdp_ring->next_to_use = 0;
+	}
+
+	if (tx_desc) {
+		ixgbe_xdp_ring_update_tail(xdp_ring);
+		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+	}
+
+	return !!budget && work_done;
+}
+
+static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
+				      struct ixgbe_tx_buffer *tx_bi)
+{
+	xdp_return_frame(tx_bi->xdpf);
+	dma_unmap_single(tx_ring->dev,
+			 dma_unmap_addr(tx_bi, dma),
+			 dma_unmap_len(tx_bi, len), DMA_TO_DEVICE);
+	dma_unmap_len_set(tx_bi, len, 0);
+}
+
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+			    struct ixgbe_ring *tx_ring, int napi_budget)
+{
+	unsigned int total_packets = 0, total_bytes = 0;
+	u32 i = tx_ring->next_to_clean, xsk_frames = 0;
+	unsigned int budget = q_vector->tx.work_limit;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	union ixgbe_adv_tx_desc *tx_desc;
+	struct ixgbe_tx_buffer *tx_bi;
+	bool xmit_done;
+
+	tx_bi = &tx_ring->tx_buffer_info[i];
+	tx_desc = IXGBE_TX_DESC(tx_ring, i);
+	i -= tx_ring->count;
+
+	do {
+		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
+			break;
+
+		total_bytes += tx_bi->bytecount;
+		total_packets += tx_bi->gso_segs;
+
+		if (tx_bi->xdpf)
+			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+		total_bytes += tx_bi->bytecount;
+
+		tx_bi++;
+		tx_desc++;
+		i++;
+		if (unlikely(!i)) {
+			i -= tx_ring->count;
+			tx_bi = tx_ring->tx_buffer_info;
+			tx_desc = IXGBE_TX_DESC(tx_ring, 0);
+		}
+
+		/* issue prefetch for next Tx descriptor */
+		prefetch(tx_desc);
+
+		/* update budget accounting */
+		budget--;
+	} while (likely(budget));
+
+	i += tx_ring->count;
+	tx_ring->next_to_clean = i;
+
+	u64_stats_update_begin(&tx_ring->syncp);
+	tx_ring->stats.bytes += total_bytes;
+	tx_ring->stats.packets += total_packets;
+	u64_stats_update_end(&tx_ring->syncp);
+	q_vector->tx.total_bytes += total_bytes;
+	q_vector->tx.total_packets += total_packets;
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+
+	xmit_done = ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
+	return budget > 0 && xmit_done;
+}
+
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 queue_id)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct ixgbe_ring *ring;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return -ENETDOWN;
+
+	if (!READ_ONCE(adapter->xdp_prog))
+		return -ENXIO;
+
+	if (queue_id >= adapter->num_rx_queues)
+		return -ENXIO;
+
+	if (!adapter->xsk_umems[queue_id])
+		return -ENXIO;
+
+	ring = adapter->xdp_ring[queue_id];
+	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
+		u64 eics = BIT_ULL(ring->q_vector->v_idx);
+
+		ixgbe_irq_rearm_queues(adapter, eics);
+	}
+
+	return 0;
+}
+
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
+{
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct ixgbe_tx_buffer *tx_bi;
+	u32 xsk_frames = 0;
+
+	while (ntc != ntu) {
+		tx_bi = &tx_ring->tx_buffer_info[ntc];
+
+		if (tx_bi->xdpf)
+			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		ntc++;
+		if (ntc == tx_ring->count)
+			ntc = 0;
+	}
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+}
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support
  2018-09-24 16:35 ` [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support Björn Töpel
@ 2018-09-25 14:57   ` Jakub Kicinski
  2018-09-26 14:29     ` Björn Töpel
  0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2018-09-25 14:57 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	magnus.karlsson, magnus.karlsson, ast, daniel, netdev, brouer,
	u9012063, tuc

On Mon, 24 Sep 2018 18:35:55 +0200, Björn Töpel wrote:
> +	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
> +		return -EINVAL;
> +
> +	if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
> +		return -EINVAL;

Hm, should you add UMEM checks to all the places these may get
enabled?  Like fabf1bce103a ("ixgbe: Prevent unsupported configurations
with XDP") did?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support
  2018-09-25 14:57   ` Jakub Kicinski
@ 2018-09-26 14:29     ` Björn Töpel
  0 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2018-09-26 14:29 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Jeff Kirsher, intel-wired-lan, Björn Töpel, Karlsson,
	Magnus, Magnus Karlsson, ast, Daniel Borkmann, Netdev,
	Jesper Dangaard Brouer, William Tu, tuc

Den tis 25 sep. 2018 kl 16:57 skrev Jakub Kicinski
<jakub.kicinski@netronome.com>:
>
> On Mon, 24 Sep 2018 18:35:55 +0200, Björn Töpel wrote:
> > +     if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
> > +             return -EINVAL;
> > +
> > +     if (adapter->flags & IXGBE_FLAG_DCB_ENABLED)
> > +             return -EINVAL;
>
> Hm, should you add UMEM checks to all the places these may get
> enabled?  Like fabf1bce103a ("ixgbe: Prevent unsupported configurations
> with XDP") did?

Actually, I can remove these checks, since it's already checked by the
XDP path. AF_XDP ZC is enabled only when XDP is enabled. So there's no
need to have the XDP checks in the AF_XDP path.

Björn

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-09-26 20:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-24 16:35 [PATCH 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
2018-09-24 16:35 ` [PATCH 1/5] ixgbe: added Rx/Tx ring disable/enable functions Björn Töpel
2018-09-24 16:35 ` [PATCH 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h Björn Töpel
2018-09-24 16:35 ` [PATCH 3/5] ixgbe: add AF_XDP zero-copy Rx support Björn Töpel
2018-09-25 14:57   ` Jakub Kicinski
2018-09-26 14:29     ` Björn Töpel
2018-09-24 16:35 ` [PATCH 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h Björn Töpel
2018-09-24 16:35 ` [PATCH 5/5] ixgbe: add AF_XDP zero-copy Tx support Björn Töpel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).