All of lore.kernel.org
* [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
@ 2018-10-02  8:00 ` Björn Töpel
  0 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc, jakub.kicinski

From: Björn Töpel <bjorn.topel@intel.com>

Jeff: Please remove the v1 patches from your dev-queue!

This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
driver.

The ixgbe zero-copy code is located in its own file, ixgbe_xsk.[ch],
analogous to the i40e ZC support. As in i40e, code paths have been
copied from the XDP path to the zero-copy path. Going forward we will
try to generalize more code between the AF_XDP ZC drivers, and also
reduce the heavy copy-and-paste duplication.

We have run some benchmarks on a dual-socket system with two Broadwell
E5 2660 CPUs @ 2.0 GHz, with hyperthreading turned off. Each socket
has 14 cores, 28 in total, but only two cores are used in these
experiments: one for Tx/Rx and one for the user-space application. The
memory is DDR4 @ 2133 MT/s (1067 MHz); each DIMM is 8192 MB, and with
8 of those DIMMs in the system we have 64 GB of total memory. The
compiler used is GCC 7.3.0. The NIC is an Intel 82599ES/X520-2
10 Gbit/s adapter, driven by ixgbe.

Below are the results, in Mpps, of the 82599ES/X520-2 NIC benchmark
runs for 64B and 1500B packets, generated by a commercial packet
generator blasting packets at the full 10 Gbit/s line rate. The
results were measured with retpoline and all the other Spectre and
Meltdown mitigations in place.

AF_XDP performance 64B packets:
Benchmark   XDP_DRV with zerocopy
rxdrop        14.7
txpush        14.6
l2fwd         11.1

AF_XDP performance 1500B packets:
Benchmark   XDP_DRV with zerocopy
rxdrop        0.8
l2fwd         0.8

XDP performance on our system, as a baseline:

64B packets:
XDP stats       CPU     Mpps       issue-pps
XDP-RX CPU      16      14.7       0

1500B packets:
XDP stats       CPU     Mpps       issue-pps
XDP-RX CPU      16      0.8        0

The structure of the patch set is as follows:

Patch 1: Introduce Rx/Tx ring enable/disable functionality
Patch 2: Preparatory patch to ixgbe driver code for RX
Patch 3: ixgbe zero-copy support for RX
Patch 4: Preparatory patch to ixgbe driver code for TX
Patch 5: ixgbe zero-copy support for TX

Changes since v1:

* Removed redundant AF_XDP precondition checks, pointed out by
  Jakub. Now, the preconditions are only checked at XDP enable time.
* Fixed a crash in the egress path, due to incorrect usage of
  ixgbe_ring queue_index member. In v2 a ring_idx back reference is
  introduced, and used in favor of queue_index. William reported the
  crash, and helped me smoke out the issue. Kudos!
* In ixgbe_xsk_async_xmit, validate qid against num_xdp_queues,
  instead of num_rx_queues.

Cheers!
Björn

Björn Töpel (5):
  ixgbe: added Rx/Tx ring disable/enable functions
  ixgbe: move common Rx functions to ixgbe_txrx_common.h
  ixgbe: add AF_XDP zero-copy Rx support
  ixgbe: move common Tx functions to ixgbe_txrx_common.h
  ixgbe: add AF_XDP zero-copy Tx support

 drivers/net/ethernet/intel/ixgbe/Makefile     |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  28 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 291 ++++++-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  50 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 803 ++++++++++++++++++
 6 files changed, 1146 insertions(+), 46 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

-- 
2.17.1

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/5] ixgbe: added Rx/Tx ring disable/enable functions
  2018-10-02  8:00 ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02  8:00   ` Björn Töpel
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc, jakub.kicinski

From: Björn Töpel <bjorn.topel@intel.com>

Add functions for Rx/Tx ring enable/disable. Instead of resetting the
whole device, only the affected ring is disabled or enabled.

This plumbing is used in later commits, when zero-copy AF_XDP support
is introduced.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |   1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 158 ++++++++++++++++++
 2 files changed, 159 insertions(+)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 5c6fd42e90ed..265db172042a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -271,6 +271,7 @@ enum ixgbe_ring_state_t {
 	__IXGBE_TX_DETECT_HANG,
 	__IXGBE_HANG_CHECK_ARMED,
 	__IXGBE_TX_XDP_RING,
+	__IXGBE_TX_DISABLED,
 };
 
 #define ring_uses_build_skb(ring) \
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 187b78f950b5..6ff886498882 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -8694,6 +8694,8 @@ static netdev_tx_t __ixgbe_xmit_frame(struct sk_buff *skb,
 		return NETDEV_TX_OK;
 
 	tx_ring = ring ? ring : adapter->tx_ring[skb->queue_mapping];
+	if (unlikely(test_bit(__IXGBE_TX_DISABLED, &tx_ring->state)))
+		return NETDEV_TX_BUSY;
 
 	return ixgbe_xmit_frame_ring(skb, adapter, tx_ring);
 }
@@ -10240,6 +10242,9 @@ static int ixgbe_xdp_xmit(struct net_device *dev, int n,
 	if (unlikely(!ring))
 		return -ENXIO;
 
+	if (unlikely(test_bit(__IXGBE_TX_DISABLED, &ring->state)))
+		return -ENXIO;
+
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
 		int err;
@@ -10303,6 +10308,159 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_xdp_xmit		= ixgbe_xdp_xmit,
 };
 
+static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
+				 struct ixgbe_ring *tx_ring)
+{
+	unsigned long wait_delay, delay_interval;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u8 reg_idx = tx_ring->reg_idx;
+	int wait_loop;
+	u32 txdctl;
+
+	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
+
+	/* delay mechanism from ixgbe_disable_tx */
+	delay_interval = ixgbe_get_completion_timeout(adapter) / 100;
+
+	wait_loop = IXGBE_MAX_RX_DESC_POLL;
+	wait_delay = delay_interval;
+
+	while (wait_loop--) {
+		usleep_range(wait_delay, wait_delay + 10);
+		wait_delay += delay_interval * 2;
+		txdctl = IXGBE_READ_REG(hw, IXGBE_TXDCTL(reg_idx));
+
+		if (!(txdctl & IXGBE_TXDCTL_ENABLE))
+			return;
+	}
+
+	e_err(drv, "TXDCTL.ENABLE not cleared within the polling period\n");
+}
+
+static void ixgbe_disable_txr(struct ixgbe_adapter *adapter,
+			      struct ixgbe_ring *tx_ring)
+{
+	set_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
+	ixgbe_disable_txr_hw(adapter, tx_ring);
+}
+
+static void ixgbe_disable_rxr_hw(struct ixgbe_adapter *adapter,
+				 struct ixgbe_ring *rx_ring)
+{
+	unsigned long wait_delay, delay_interval;
+	struct ixgbe_hw *hw = &adapter->hw;
+	u8 reg_idx = rx_ring->reg_idx;
+	int wait_loop;
+	u32 rxdctl;
+
+	rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+	rxdctl &= ~IXGBE_RXDCTL_ENABLE;
+	rxdctl |= IXGBE_RXDCTL_SWFLSH;
+
+	/* write value back with RXDCTL.ENABLE bit cleared */
+	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
+
+	/* RXDCTL.EN may not change on 82598 if link is down, so skip it */
+	if (hw->mac.type == ixgbe_mac_82598EB &&
+	    !(IXGBE_READ_REG(hw, IXGBE_LINKS) & IXGBE_LINKS_UP))
+		return;
+
+	/* delay mechanism from ixgbe_disable_rx */
+	delay_interval = ixgbe_get_completion_timeout(adapter) / 100;
+
+	wait_loop = IXGBE_MAX_RX_DESC_POLL;
+	wait_delay = delay_interval;
+
+	while (wait_loop--) {
+		usleep_range(wait_delay, wait_delay + 10);
+		wait_delay += delay_interval * 2;
+		rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
+
+		if (!(rxdctl & IXGBE_RXDCTL_ENABLE))
+			return;
+	}
+
+	e_err(drv, "RXDCTL.ENABLE not cleared within the polling period\n");
+}
+
+static void ixgbe_reset_txr_stats(struct ixgbe_ring *tx_ring)
+{
+	memset(&tx_ring->stats, 0, sizeof(tx_ring->stats));
+	memset(&tx_ring->tx_stats, 0, sizeof(tx_ring->tx_stats));
+}
+
+static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
+{
+	memset(&rx_ring->stats, 0, sizeof(rx_ring->stats));
+	memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
+}
+
+/**
+ * ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
+ * @adapter: adapter structure
+ * @ring: ring index
+ *
+ * This function disables a certain Rx/Tx/XDP Tx ring. The function
+ * assumes that the netdev is running.
+ **/
+void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
+{
+	struct ixgbe_ring *rx_ring, *tx_ring, *xdp_ring;
+
+	rx_ring = adapter->rx_ring[ring];
+	tx_ring = adapter->tx_ring[ring];
+	xdp_ring = adapter->xdp_ring[ring];
+
+	ixgbe_disable_txr(adapter, tx_ring);
+	if (xdp_ring)
+		ixgbe_disable_txr(adapter, xdp_ring);
+	ixgbe_disable_rxr_hw(adapter, rx_ring);
+
+	if (xdp_ring)
+		synchronize_sched();
+
+	/* Rx/Tx/XDP Tx share the same napi context. */
+	napi_disable(&rx_ring->q_vector->napi);
+
+	ixgbe_clean_tx_ring(tx_ring);
+	if (xdp_ring)
+		ixgbe_clean_tx_ring(xdp_ring);
+	ixgbe_clean_rx_ring(rx_ring);
+
+	ixgbe_reset_txr_stats(tx_ring);
+	if (xdp_ring)
+		ixgbe_reset_txr_stats(xdp_ring);
+	ixgbe_reset_rxr_stats(rx_ring);
+}
+
+/**
+ * ixgbe_txrx_ring_enable - Enable Rx/Tx/XDP Tx rings
+ * @adapter: adapter structure
+ * @ring: ring index
+ *
+ * This function enables a certain Rx/Tx/XDP Tx ring. The function
+ * assumes that the netdev is running.
+ **/
+void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
+{
+	struct ixgbe_ring *rx_ring, *tx_ring, *xdp_ring;
+
+	rx_ring = adapter->rx_ring[ring];
+	tx_ring = adapter->tx_ring[ring];
+	xdp_ring = adapter->xdp_ring[ring];
+
+	/* Rx/Tx/XDP Tx share the same napi context. */
+	napi_enable(&rx_ring->q_vector->napi);
+
+	ixgbe_configure_tx_ring(adapter, tx_ring);
+	if (xdp_ring)
+		ixgbe_configure_tx_ring(adapter, xdp_ring);
+	ixgbe_configure_rx_ring(adapter, rx_ring);
+
+	clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
+	clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+}
+
 /**
  * ixgbe_enumerate_functions - Get the number of ports this device has
  * @adapter: adapter structure
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h
  2018-10-02  8:00 ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02  8:00   ` Björn Töpel
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc, jakub.kicinski

From: Björn Töpel <bjorn.topel@intel.com>

This patch prepares for the upcoming zero-copy Rx functionality, by
moving/changing linkage of common functions, used both by the regular
path and zero-copy path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 29 +++++++------------
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  | 26 +++++++++++++++++
 2 files changed, 37 insertions(+), 18 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 6ff886498882..cc655c4e24fd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -40,6 +40,7 @@
 #include "ixgbe_dcb_82599.h"
 #include "ixgbe_sriov.h"
 #include "ixgbe_model.h"
+#include "ixgbe_txrx_common.h"
 
 char ixgbe_driver_name[] = "ixgbe";
 static const char ixgbe_driver_string[] =
@@ -1673,9 +1674,9 @@ static void ixgbe_update_rsc_stats(struct ixgbe_ring *rx_ring,
  * order to populate the hash, checksum, VLAN, timestamp, protocol, and
  * other fields within the skb.
  **/
-static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
-				     union ixgbe_adv_rx_desc *rx_desc,
-				     struct sk_buff *skb)
+void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
+			      union ixgbe_adv_rx_desc *rx_desc,
+			      struct sk_buff *skb)
 {
 	struct net_device *dev = rx_ring->netdev;
 	u32 flags = rx_ring->q_vector->adapter->flags;
@@ -1708,8 +1709,8 @@ static void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 	skb->protocol = eth_type_trans(skb, dev);
 }
 
-static void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
-			 struct sk_buff *skb)
+void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
+		  struct sk_buff *skb)
 {
 	napi_gro_receive(&q_vector->napi, skb);
 }
@@ -1868,9 +1869,9 @@ static void ixgbe_dma_sync_frag(struct ixgbe_ring *rx_ring,
  *
  * Returns true if an error was encountered and skb was freed.
  **/
-static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
-				  union ixgbe_adv_rx_desc *rx_desc,
-				  struct sk_buff *skb)
+bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
+			   union ixgbe_adv_rx_desc *rx_desc,
+			   struct sk_buff *skb)
 {
 	struct net_device *netdev = rx_ring->netdev;
 
@@ -2186,14 +2187,6 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
-#define IXGBE_XDP_PASS		0
-#define IXGBE_XDP_CONSUMED	BIT(0)
-#define IXGBE_XDP_TX		BIT(1)
-#define IXGBE_XDP_REDIR		BIT(2)
-
-static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
-			       struct xdp_frame *xdpf);
-
 static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 				     struct ixgbe_ring *rx_ring,
 				     struct xdp_buff *xdp)
@@ -8471,8 +8464,8 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 }
 
 #endif
-static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
-			       struct xdp_frame *xdpf)
+int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+			struct xdp_frame *xdpf)
 {
 	struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
 	struct ixgbe_tx_buffer *tx_buffer;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
new file mode 100644
index 000000000000..3780d315b991
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright(c) 2018 Intel Corporation. */
+
+#ifndef _IXGBE_TXRX_COMMON_H_
+#define _IXGBE_TXRX_COMMON_H_
+
+#define IXGBE_XDP_PASS		0
+#define IXGBE_XDP_CONSUMED	BIT(0)
+#define IXGBE_XDP_TX		BIT(1)
+#define IXGBE_XDP_REDIR		BIT(2)
+
+int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
+			struct xdp_frame *xdpf);
+bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
+			   union ixgbe_adv_rx_desc *rx_desc,
+			   struct sk_buff *skb);
+void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
+			      union ixgbe_adv_rx_desc *rx_desc,
+			      struct sk_buff *skb);
+void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
+		  struct sk_buff *skb);
+
+void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
+void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
+
+#endif /* #define _IXGBE_TXRX_COMMON_H_ */
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 3/5] ixgbe: add AF_XDP zero-copy Rx support
  2018-10-02  8:00 ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02  8:00   ` Björn Töpel
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc, jakub.kicinski

From: Björn Töpel <bjorn.topel@intel.com>

This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.

All AF_XDP specific functions are added to a new file, ixgbe_xsk.c.

Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior passing
it to the kernel stack.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/Makefile     |   3 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  27 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  17 +-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  78 ++-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  15 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 628 ++++++++++++++++++
 6 files changed, 747 insertions(+), 21 deletions(-)
 create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c

diff --git a/drivers/net/ethernet/intel/ixgbe/Makefile b/drivers/net/ethernet/intel/ixgbe/Makefile
index 5414685189ce..ca6b0c458e4a 100644
--- a/drivers/net/ethernet/intel/ixgbe/Makefile
+++ b/drivers/net/ethernet/intel/ixgbe/Makefile
@@ -8,7 +8,8 @@ obj-$(CONFIG_IXGBE) += ixgbe.o
 
 ixgbe-objs := ixgbe_main.o ixgbe_common.o ixgbe_ethtool.o \
               ixgbe_82599.o ixgbe_82598.o ixgbe_phy.o ixgbe_sriov.o \
-              ixgbe_mbx.o ixgbe_x540.o ixgbe_x550.o ixgbe_lib.o ixgbe_ptp.o
+              ixgbe_mbx.o ixgbe_x540.o ixgbe_x550.o ixgbe_lib.o ixgbe_ptp.o \
+              ixgbe_xsk.o
 
 ixgbe-$(CONFIG_IXGBE_DCB) +=  ixgbe_dcb.o ixgbe_dcb_82598.o \
                               ixgbe_dcb_82599.o ixgbe_dcb_nl.o
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 265db172042a..7a7679e7be84 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -228,13 +228,17 @@ struct ixgbe_tx_buffer {
 struct ixgbe_rx_buffer {
 	struct sk_buff *skb;
 	dma_addr_t dma;
-	struct page *page;
-#if (BITS_PER_LONG > 32) || (PAGE_SIZE >= 65536)
-	__u32 page_offset;
-#else
-	__u16 page_offset;
-#endif
-	__u16 pagecnt_bias;
+	union {
+		struct {
+			struct page *page;
+			__u32 page_offset;
+			__u16 pagecnt_bias;
+		};
+		struct {
+			void *addr;
+			u64 handle;
+		};
+	};
 };
 
 struct ixgbe_queue_stats {
@@ -348,6 +352,10 @@ struct ixgbe_ring {
 		struct ixgbe_rx_queue_stats rx_stats;
 	};
 	struct xdp_rxq_info xdp_rxq;
+	struct xdp_umem *xsk_umem;
+	struct zero_copy_allocator zca; /* ZC allocator anchor */
+	u16 ring_idx;		/* {rx,tx,xdp}_ring back reference idx */
+	u16 rx_buf_len;
 } ____cacheline_internodealigned_in_smp;
 
 enum ixgbe_ring_f_enum {
@@ -765,6 +773,11 @@ struct ixgbe_adapter {
 #ifdef CONFIG_XFRM_OFFLOAD
 	struct ixgbe_ipsec *ipsec;
 #endif /* CONFIG_XFRM_OFFLOAD */
+
+	/* AF_XDP zero-copy */
+	struct xdp_umem **xsk_umems;
+	u16 num_xsk_umems_used;
+	u16 num_xsk_umems;
 };
 
 static inline u8 ixgbe_max_rss_indices(struct ixgbe_adapter *adapter)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index d361f570ca37..62e6499e4146 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -1055,7 +1055,7 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 	int txr_remaining = adapter->num_tx_queues;
 	int xdp_remaining = adapter->num_xdp_queues;
 	int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
-	int err;
+	int err, i;
 
 	/* only one q_vector if MSI-X is disabled. */
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
@@ -1097,6 +1097,21 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 		xdp_idx += xqpv;
 	}
 
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		if (adapter->rx_ring[i])
+			adapter->rx_ring[i]->ring_idx = i;
+	}
+
+	for (i = 0; i < adapter->num_tx_queues; i++) {
+		if (adapter->tx_ring[i])
+			adapter->tx_ring[i]->ring_idx = i;
+	}
+
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		if (adapter->xdp_ring[i])
+			adapter->xdp_ring[i]->ring_idx = i;
+	}
+
 	return 0;
 
 err_out:
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index cc655c4e24fd..547092b8fe54 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -34,6 +34,7 @@
 #include <net/tc_act/tc_mirred.h>
 #include <net/vxlan.h>
 #include <net/mpls.h>
+#include <net/xdp_sock.h>
 
 #include "ixgbe.h"
 #include "ixgbe_common.h"
@@ -3176,7 +3177,10 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 		per_ring_budget = budget;
 
 	ixgbe_for_each_ring(ring, q_vector->rx) {
-		int cleaned = ixgbe_clean_rx_irq(q_vector, ring,
+		int cleaned = ring->xsk_umem ?
+			      ixgbe_clean_rx_irq_zc(q_vector, ring,
+						    per_ring_budget) :
+			      ixgbe_clean_rx_irq(q_vector, ring,
 						 per_ring_budget);
 
 		work_done += cleaned;
@@ -3706,10 +3710,27 @@ static void ixgbe_configure_srrctl(struct ixgbe_adapter *adapter,
 	srrctl = IXGBE_RX_HDR_SIZE << IXGBE_SRRCTL_BSIZEHDRSIZE_SHIFT;
 
 	/* configure the packet buffer length */
-	if (test_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state))
+	if (rx_ring->xsk_umem) {
+		u32 xsk_buf_len = rx_ring->xsk_umem->chunk_size_nohr -
+				  XDP_PACKET_HEADROOM;
+
+		/* If the MAC supports setting RXDCTL.RLPML, the
+		 * SRRCTL[n].BSIZEPKT is set to PAGE_SIZE and
+		 * RXDCTL.RLPML is set to the actual UMEM buffer
+		 * size. If not, then we are stuck with a 1k buffer
+		 * size resolution. In this case frames larger than
+		 * the UMEM buffer size viewed in a 1k resolution will
+		 * be dropped.
+		 */
+		if (hw->mac.type != ixgbe_mac_82599EB)
+			srrctl |= PAGE_SIZE >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+		else
+			srrctl |= xsk_buf_len >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+	} else if (test_bit(__IXGBE_RX_3K_BUFFER, &rx_ring->state)) {
 		srrctl |= IXGBE_RXBUFFER_3K >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
-	else
+	} else {
 		srrctl |= IXGBE_RXBUFFER_2K >> IXGBE_SRRCTL_BSIZEPKT_SHIFT;
+	}
 
 	/* configure descriptor type */
 	srrctl |= IXGBE_SRRCTL_DESCTYPE_ADV_ONEBUF;
@@ -4032,6 +4053,19 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	u32 rxdctl;
 	u8 reg_idx = ring->reg_idx;
 
+	xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq);
+	ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+	if (ring->xsk_umem) {
+		ring->zca.free = ixgbe_zca_free;
+		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+						   MEM_TYPE_ZERO_COPY,
+						   &ring->zca));
+
+	} else {
+		WARN_ON(xdp_rxq_info_reg_mem_model(&ring->xdp_rxq,
+						   MEM_TYPE_PAGE_SHARED, NULL));
+	}
+
 	/* disable queue to avoid use of these values while updating state */
 	rxdctl = IXGBE_READ_REG(hw, IXGBE_RXDCTL(reg_idx));
 	rxdctl &= ~IXGBE_RXDCTL_ENABLE;
@@ -4081,6 +4115,17 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 #endif
 	}
 
+	if (ring->xsk_umem && hw->mac.type != ixgbe_mac_82599EB) {
+		u32 xsk_buf_len = ring->xsk_umem->chunk_size_nohr -
+				  XDP_PACKET_HEADROOM;
+
+		rxdctl &= ~(IXGBE_RXDCTL_RLPMLMASK |
+			    IXGBE_RXDCTL_RLPML_EN);
+		rxdctl |= xsk_buf_len | IXGBE_RXDCTL_RLPML_EN;
+
+		ring->rx_buf_len = xsk_buf_len;
+	}
+
 	/* initialize rx_buffer_info */
 	memset(ring->rx_buffer_info, 0,
 	       sizeof(struct ixgbe_rx_buffer) * ring->count);
@@ -4094,7 +4139,10 @@ void ixgbe_configure_rx_ring(struct ixgbe_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_RXDCTL(reg_idx), rxdctl);
 
 	ixgbe_rx_desc_queue_enable(adapter, ring);
-	ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
+	if (ring->xsk_umem)
+		ixgbe_alloc_rx_buffers_zc(ring, ixgbe_desc_unused(ring));
+	else
+		ixgbe_alloc_rx_buffers(ring, ixgbe_desc_unused(ring));
 }
 
 static void ixgbe_setup_psrtype(struct ixgbe_adapter *adapter)
@@ -5208,6 +5256,11 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 	u16 i = rx_ring->next_to_clean;
 	struct ixgbe_rx_buffer *rx_buffer = &rx_ring->rx_buffer_info[i];
 
+	if (rx_ring->xsk_umem) {
+		ixgbe_xsk_clean_rx_ring(rx_ring);
+		goto skip_free;
+	}
+
 	/* Free all the Rx ring sk_buffs */
 	while (i != rx_ring->next_to_alloc) {
 		if (rx_buffer->skb) {
@@ -5246,6 +5299,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 		}
 	}
 
+skip_free:
 	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
@@ -6441,7 +6495,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 	struct device *dev = rx_ring->dev;
 	int orig_node = dev_to_node(dev);
 	int ring_node = -1;
-	int size, err;
+	int size;
 
 	size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
 
@@ -6478,13 +6532,6 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 			     rx_ring->queue_index) < 0)
 		goto err;
 
-	err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
-					 MEM_TYPE_PAGE_SHARED, NULL);
-	if (err) {
-		xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
-		goto err;
-	}
-
 	rx_ring->xdp_prog = adapter->xdp_prog;
 
 	return 0;
@@ -10200,6 +10247,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 		xdp->prog_id = adapter->xdp_prog ?
 			adapter->xdp_prog->aux->id : 0;
 		return 0;
+	case XDP_QUERY_XSK_UMEM:
+		return ixgbe_xsk_umem_query(adapter, &xdp->xsk.umem,
+					    xdp->xsk.queue_id);
+	case XDP_SETUP_XSK_UMEM:
+		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
+					    xdp->xsk.queue_id);
+
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 3780d315b991..cf219f4e009d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -23,4 +23,19 @@ void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
 
+struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
+				struct ixgbe_ring *ring);
+int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
+			 u16 qid);
+int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+			 u16 qid);
+
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+
+void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count);
+int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
+			  struct ixgbe_ring *rx_ring,
+			  const int budget);
+void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
new file mode 100644
index 000000000000..61259036ff4b
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -0,0 +1,628 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Intel Corporation. */
+
+#include <linux/bpf_trace.h>
+#include <net/xdp_sock.h>
+#include <net/xdp.h>
+
+#include "ixgbe.h"
+#include "ixgbe_txrx_common.h"
+
+struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
+				struct ixgbe_ring *ring)
+{
+	bool xdp_on = READ_ONCE(adapter->xdp_prog);
+	int qid = ring->ring_idx;
+
+	if (!xdp_on || !adapter->xsk_umems ||
+	    qid >= adapter->num_xsk_umems || !adapter->xsk_umems[qid])
+		return NULL;
+
+	return adapter->xsk_umems[qid];
+}
+
+static int ixgbe_alloc_xsk_umems(struct ixgbe_adapter *adapter)
+{
+	if (adapter->xsk_umems)
+		return 0;
+
+	adapter->num_xsk_umems_used = 0;
+	adapter->num_xsk_umems = adapter->num_rx_queues;
+	adapter->xsk_umems = kcalloc(adapter->num_xsk_umems,
+				     sizeof(*adapter->xsk_umems),
+				     GFP_KERNEL);
+	if (!adapter->xsk_umems) {
+		adapter->num_xsk_umems = 0;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int ixgbe_add_xsk_umem(struct ixgbe_adapter *adapter,
+			      struct xdp_umem *umem,
+			      u16 qid)
+{
+	int err;
+
+	err = ixgbe_alloc_xsk_umems(adapter);
+	if (err)
+		return err;
+
+	adapter->xsk_umems[qid] = umem;
+	adapter->num_xsk_umems_used++;
+
+	return 0;
+}
+
+static void ixgbe_remove_xsk_umem(struct ixgbe_adapter *adapter, u16 qid)
+{
+	adapter->xsk_umems[qid] = NULL;
+	adapter->num_xsk_umems_used--;
+
+	if (adapter->num_xsk_umems == 0) {
+		kfree(adapter->xsk_umems);
+		adapter->xsk_umems = NULL;
+		adapter->num_xsk_umems = 0;
+	}
+}
+
+static int ixgbe_xsk_umem_dma_map(struct ixgbe_adapter *adapter,
+				  struct xdp_umem *umem)
+{
+	struct device *dev = &adapter->pdev->dev;
+	unsigned int i, j;
+	dma_addr_t dma;
+
+	for (i = 0; i < umem->npgs; i++) {
+		dma = dma_map_page_attrs(dev, umem->pgs[i], 0, PAGE_SIZE,
+					 DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+		if (dma_mapping_error(dev, dma))
+			goto out_unmap;
+
+		umem->pages[i].dma = dma;
+	}
+
+	return 0;
+
+out_unmap:
+	for (j = 0; j < i; j++) {
+		dma_unmap_page_attrs(dev, umem->pages[j].dma, PAGE_SIZE,
+				     DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+		umem->pages[j].dma = 0;
+	}
+
+	return -1;
+}
+
+static void ixgbe_xsk_umem_dma_unmap(struct ixgbe_adapter *adapter,
+				     struct xdp_umem *umem)
+{
+	struct device *dev = &adapter->pdev->dev;
+	unsigned int i;
+
+	for (i = 0; i < umem->npgs; i++) {
+		dma_unmap_page_attrs(dev, umem->pages[i].dma, PAGE_SIZE,
+				     DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+
+		umem->pages[i].dma = 0;
+	}
+}
+
+static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
+				 struct xdp_umem *umem,
+				 u16 qid)
+{
+	struct xdp_umem_fq_reuse *reuseq;
+	bool if_running;
+	int err;
+
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (adapter->xsk_umems) {
+		if (qid >= adapter->num_xsk_umems)
+			return -EINVAL;
+		if (adapter->xsk_umems[qid])
+			return -EBUSY;
+	}
+
+	reuseq = xsk_reuseq_prepare(adapter->rx_ring[0]->count);
+	if (!reuseq)
+		return -ENOMEM;
+
+	xsk_reuseq_free(xsk_reuseq_swap(umem, reuseq));
+
+	err = ixgbe_xsk_umem_dma_map(adapter, umem);
+	if (err)
+		return err;
+
+	if_running = netif_running(adapter->netdev) &&
+		     READ_ONCE(adapter->xdp_prog);
+
+	if (if_running)
+		ixgbe_txrx_ring_disable(adapter, qid);
+
+	err = ixgbe_add_xsk_umem(adapter, umem, qid);
+
+	if (if_running)
+		ixgbe_txrx_ring_enable(adapter, qid);
+
+	return err;
+}
+
+static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
+{
+	bool if_running;
+
+	if (!adapter->xsk_umems || qid >= adapter->num_xsk_umems ||
+	    !adapter->xsk_umems[qid])
+		return -EINVAL;
+
+	if_running = netif_running(adapter->netdev) &&
+		     READ_ONCE(adapter->xdp_prog);
+
+	if (if_running)
+		ixgbe_txrx_ring_disable(adapter, qid);
+
+	ixgbe_xsk_umem_dma_unmap(adapter, adapter->xsk_umems[qid]);
+	ixgbe_remove_xsk_umem(adapter, qid);
+
+	if (if_running)
+		ixgbe_txrx_ring_enable(adapter, qid);
+
+	return 0;
+}
+
+int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
+			 u16 qid)
+{
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (adapter->xsk_umems) {
+		if (qid >= adapter->num_xsk_umems)
+			return -EINVAL;
+		*umem = adapter->xsk_umems[qid];
+		return 0;
+	}
+
+	*umem = NULL;
+	return 0;
+}
+
+int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+			 u16 qid)
+{
+	return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) :
+		ixgbe_xsk_umem_disable(adapter, qid);
+}
+
+static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
+			    struct ixgbe_ring *rx_ring,
+			    struct xdp_buff *xdp)
+{
+	int err, result = IXGBE_XDP_PASS;
+	struct bpf_prog *xdp_prog;
+	struct xdp_frame *xdpf;
+	u32 act;
+
+	rcu_read_lock();
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	xdp->handle += xdp->data - xdp->data_hard_start;
+	switch (act) {
+	case XDP_PASS:
+		break;
+	case XDP_TX:
+		xdpf = convert_to_xdp_frame(xdp);
+		if (unlikely(!xdpf)) {
+			result = IXGBE_XDP_CONSUMED;
+			break;
+		}
+		result = ixgbe_xmit_xdp_ring(adapter, xdpf);
+		break;
+	case XDP_REDIRECT:
+		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+		result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		/* fallthrough */
+	case XDP_ABORTED:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		/* fallthrough -- handle aborts by dropping packet */
+	case XDP_DROP:
+		result = IXGBE_XDP_CONSUMED;
+		break;
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+static struct ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(
+	struct ixgbe_ring *rx_ring,
+	unsigned int size)
+{
+	struct ixgbe_rx_buffer *bi;
+
+	bi = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+
+	/* we are reusing so sync this buffer for CPU use */
+	dma_sync_single_range_for_cpu(rx_ring->dev,
+				      bi->dma, 0,
+				      size,
+				      DMA_BIDIRECTIONAL);
+
+	return bi;
+}
+
+static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
+				     struct ixgbe_rx_buffer *obi)
+{
+	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
+	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
+	u16 nta = rx_ring->next_to_alloc;
+	struct ixgbe_rx_buffer *nbi;
+
+	nbi = &rx_ring->rx_buffer_info[rx_ring->next_to_alloc];
+	/* update, and store next to alloc */
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	/* transfer page from old buffer to new buffer */
+	nbi->dma = obi->dma & mask;
+	nbi->dma += hr;
+
+	nbi->addr = (void *)((unsigned long)obi->addr & mask);
+	nbi->addr += hr;
+
+	nbi->handle = obi->handle & mask;
+	nbi->handle += rx_ring->xsk_umem->headroom;
+
+	obi->addr = NULL;
+	obi->skb = NULL;
+}
+
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+{
+	struct ixgbe_rx_buffer *bi;
+	struct ixgbe_ring *rx_ring;
+	u64 hr, mask;
+	u16 nta;
+
+	rx_ring = container_of(alloc, struct ixgbe_ring, zca);
+	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
+	mask = rx_ring->xsk_umem->chunk_mask;
+
+	nta = rx_ring->next_to_alloc;
+	bi = rx_ring->rx_buffer_info;
+
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	handle &= mask;
+
+	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
+	bi->addr += hr;
+
+	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+}
+
+static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
+				  struct ixgbe_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	void *addr = bi->addr;
+	u64 handle, hr;
+
+	if (addr)
+		return true;
+
+	if (!xsk_umem_peek_addr(umem, &handle)) {
+		rx_ring->rx_stats.alloc_rx_page_failed++;
+		return false;
+	}
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr(umem);
+	return true;
+}
+
+static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
+				       struct ixgbe_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	u64 handle, hr;
+
+	if (!xsk_umem_peek_addr_rq(umem, &handle)) {
+		rx_ring->rx_stats.alloc_rx_page_failed++;
+		return false;
+	}
+
+	handle &= rx_ring->xsk_umem->chunk_mask;
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr_rq(umem);
+	return true;
+}
+
+static __always_inline bool __ixgbe_alloc_rx_buffers_zc(
+	struct ixgbe_ring *rx_ring,
+	u16 cleaned_count,
+	bool alloc(struct ixgbe_ring *rx_ring,
+		   struct ixgbe_rx_buffer *bi))
+{
+	union ixgbe_adv_rx_desc *rx_desc;
+	struct ixgbe_rx_buffer *bi;
+	u16 i = rx_ring->next_to_use;
+	bool ok = true;
+
+	/* nothing to do */
+	if (!cleaned_count)
+		return true;
+
+	rx_desc = IXGBE_RX_DESC(rx_ring, i);
+	bi = &rx_ring->rx_buffer_info[i];
+	i -= rx_ring->count;
+
+	do {
+		if (!alloc(rx_ring, bi)) {
+			ok = false;
+			break;
+		}
+
+		/* sync the buffer for use by the device */
+		dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+						 bi->page_offset,
+						 rx_ring->rx_buf_len,
+						 DMA_BIDIRECTIONAL);
+
+		/* Refresh the desc even if buffer_addrs didn't change
+		 * because each write-back erases this info.
+		 */
+		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma);
+
+		rx_desc++;
+		bi++;
+		i++;
+		if (unlikely(!i)) {
+			rx_desc = IXGBE_RX_DESC(rx_ring, 0);
+			bi = rx_ring->rx_buffer_info;
+			i -= rx_ring->count;
+		}
+
+		/* clear the length for the next_to_use descriptor */
+		rx_desc->wb.upper.length = 0;
+
+		cleaned_count--;
+	} while (cleaned_count);
+
+	i += rx_ring->count;
+
+	if (rx_ring->next_to_use != i) {
+		rx_ring->next_to_use = i;
+
+		/* update next to alloc since we have filled the ring */
+		rx_ring->next_to_alloc = i;
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.  (Only
+		 * applicable for weak-ordered memory model archs,
+		 * such as IA-64).
+		 */
+		wmb();
+		writel(i, rx_ring->tail);
+	}
+
+	return ok;
+}
+
+void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
+{
+	__ixgbe_alloc_rx_buffers_zc(rx_ring, count,
+				    ixgbe_alloc_buffer_slow_zc);
+}
+
+static bool ixgbe_alloc_rx_buffers_fast_zc(struct ixgbe_ring *rx_ring,
+					   u16 count)
+{
+	return __ixgbe_alloc_rx_buffers_zc(rx_ring, count,
+					   ixgbe_alloc_buffer_zc);
+}
+
+static struct sk_buff *ixgbe_construct_skb_zc(struct ixgbe_ring *rx_ring,
+					      struct ixgbe_rx_buffer *bi,
+					      struct xdp_buff *xdp)
+{
+	unsigned int metasize = xdp->data - xdp->data_meta;
+	unsigned int datasize = xdp->data_end - xdp->data;
+	struct sk_buff *skb;
+
+	/* allocate a skb to store the frags */
+	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
+			       xdp->data_end - xdp->data_hard_start,
+			       GFP_ATOMIC | __GFP_NOWARN);
+	if (unlikely(!skb))
+		return NULL;
+
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+	return skb;
+}
+
+static void ixgbe_inc_ntc(struct ixgbe_ring *rx_ring)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+	prefetch(IXGBE_RX_DESC(rx_ring, ntc));
+}
+
+int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
+			  struct ixgbe_ring *rx_ring,
+			  const int budget)
+{
+	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	unsigned int xdp_res, xdp_xmit = 0;
+	bool failure = false;
+	struct sk_buff *skb;
+	struct xdp_buff xdp;
+
+	xdp.rxq = &rx_ring->xdp_rxq;
+
+	while (likely(total_rx_packets < budget)) {
+		union ixgbe_adv_rx_desc *rx_desc;
+		struct ixgbe_rx_buffer *bi;
+		unsigned int size;
+
+		/* return some buffers to hardware, one at a time is too slow */
+		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
+			failure = failure ||
+				  !ixgbe_alloc_rx_buffers_fast_zc(
+					  rx_ring,
+					  cleaned_count);
+			cleaned_count = 0;
+		}
+
+		rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
+		size = le16_to_cpu(rx_desc->wb.upper.length);
+		if (!size)
+			break;
+
+		/* This memory barrier is needed to keep us from reading
+		 * any other fields out of the rx_desc until we know the
+		 * descriptor has been written back
+		 */
+		dma_rmb();
+
+		bi = ixgbe_get_rx_buffer_zc(rx_ring, size);
+
+		if (unlikely(!ixgbe_test_staterr(rx_desc,
+						 IXGBE_RXD_STAT_EOP))) {
+			struct ixgbe_rx_buffer *next_bi;
+
+			ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			ixgbe_inc_ntc(rx_ring);
+			next_bi = &rx_ring->rx_buffer_info[
+				rx_ring->next_to_clean];
+			next_bi->skb = ERR_PTR(-EINVAL);
+			continue;
+		}
+
+		if (unlikely(bi->skb)) {
+			ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
+
+		xdp.data = bi->addr;
+		xdp.data_meta = xdp.data;
+		xdp.data_hard_start = xdp.data - XDP_PACKET_HEADROOM;
+		xdp.data_end = xdp.data + size;
+		xdp.handle = bi->handle;
+
+		xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, &xdp);
+
+		if (xdp_res) {
+			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
+				bi->addr = NULL;
+				bi->skb = NULL;
+			} else {
+				ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			}
+			total_rx_packets++;
+			total_rx_bytes += size;
+
+			cleaned_count++;
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
+
+		/* XDP_PASS path */
+		skb = ixgbe_construct_skb_zc(rx_ring, bi, &xdp);
+		if (!skb) {
+			rx_ring->rx_stats.alloc_rx_buff_failed++;
+			break;
+		}
+
+		cleaned_count++;
+		ixgbe_inc_ntc(rx_ring);
+
+		if (eth_skb_pad(skb))
+			continue;
+
+		total_rx_bytes += skb->len;
+		total_rx_packets++;
+
+		ixgbe_process_skb_fields(rx_ring, rx_desc, skb);
+		ixgbe_rx_skb(q_vector, skb);
+	}
+
+	if (xdp_xmit & IXGBE_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & IXGBE_XDP_TX) {
+		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.
+		 */
+		wmb();
+		writel(ring->next_to_use, ring->tail);
+	}
+
+	u64_stats_update_begin(&rx_ring->syncp);
+	rx_ring->stats.packets += total_rx_packets;
+	rx_ring->stats.bytes += total_rx_bytes;
+	u64_stats_update_end(&rx_ring->syncp);
+	q_vector->rx.total_packets += total_rx_packets;
+	q_vector->rx.total_bytes += total_rx_bytes;
+
+	return failure ? budget : (int)total_rx_packets;
+}
+
+void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
+{
+	u16 i = rx_ring->next_to_clean;
+	struct ixgbe_rx_buffer *bi = &rx_ring->rx_buffer_info[i];
+
+	while (i != rx_ring->next_to_alloc) {
+		xsk_umem_fq_reuse(rx_ring->xsk_umem, bi->handle);
+		i++;
+		bi++;
+		if (i == rx_ring->count) {
+			i = 0;
+			bi = rx_ring->rx_buffer_info;
+		}
+	}
+}
-- 
2.17.1

+	}
+
 	/* Free all the Rx ring sk_buffs */
 	while (i != rx_ring->next_to_alloc) {
 		if (rx_buffer->skb) {
@@ -5246,6 +5299,7 @@ static void ixgbe_clean_rx_ring(struct ixgbe_ring *rx_ring)
 		}
 	}
 
+skip_free:
 	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
@@ -6441,7 +6495,7 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 	struct device *dev = rx_ring->dev;
 	int orig_node = dev_to_node(dev);
 	int ring_node = -1;
-	int size, err;
+	int size;
 
 	size = sizeof(struct ixgbe_rx_buffer) * rx_ring->count;
 
@@ -6478,13 +6532,6 @@ int ixgbe_setup_rx_resources(struct ixgbe_adapter *adapter,
 			     rx_ring->queue_index) < 0)
 		goto err;
 
-	err = xdp_rxq_info_reg_mem_model(&rx_ring->xdp_rxq,
-					 MEM_TYPE_PAGE_SHARED, NULL);
-	if (err) {
-		xdp_rxq_info_unreg(&rx_ring->xdp_rxq);
-		goto err;
-	}
-
 	rx_ring->xdp_prog = adapter->xdp_prog;
 
 	return 0;
@@ -10200,6 +10247,13 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 		xdp->prog_id = adapter->xdp_prog ?
 			adapter->xdp_prog->aux->id : 0;
 		return 0;
+	case XDP_QUERY_XSK_UMEM:
+		return ixgbe_xsk_umem_query(adapter, &xdp->xsk.umem,
+					    xdp->xsk.queue_id);
+	case XDP_SETUP_XSK_UMEM:
+		return ixgbe_xsk_umem_setup(adapter, xdp->xsk.umem,
+					    xdp->xsk.queue_id);
+
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 3780d315b991..cf219f4e009d 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -23,4 +23,19 @@ void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
 
+struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
+				struct ixgbe_ring *ring);
+int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
+			 u16 qid);
+int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+			 u16 qid);
+
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle);
+
+void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 cleaned_count);
+int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
+			  struct ixgbe_ring *rx_ring,
+			  const int budget);
+void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
new file mode 100644
index 000000000000..61259036ff4b
--- /dev/null
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -0,0 +1,628 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright(c) 2018 Intel Corporation. */
+
+#include <linux/bpf_trace.h>
+#include <net/xdp_sock.h>
+#include <net/xdp.h>
+
+#include "ixgbe.h"
+#include "ixgbe_txrx_common.h"
+
+struct xdp_umem *ixgbe_xsk_umem(struct ixgbe_adapter *adapter,
+				struct ixgbe_ring *ring)
+{
+	bool xdp_on = READ_ONCE(adapter->xdp_prog);
+	int qid = ring->ring_idx;
+
+	if (!xdp_on || !adapter->xsk_umems ||
+	    qid >= adapter->num_xsk_umems || !adapter->xsk_umems[qid])
+		return NULL;
+
+	return adapter->xsk_umems[qid];
+}
+
+static int ixgbe_alloc_xsk_umems(struct ixgbe_adapter *adapter)
+{
+	if (adapter->xsk_umems)
+		return 0;
+
+	adapter->num_xsk_umems_used = 0;
+	adapter->num_xsk_umems = adapter->num_rx_queues;
+	adapter->xsk_umems = kcalloc(adapter->num_xsk_umems,
+				     sizeof(*adapter->xsk_umems),
+				     GFP_KERNEL);
+	if (!adapter->xsk_umems) {
+		adapter->num_xsk_umems = 0;
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static int ixgbe_add_xsk_umem(struct ixgbe_adapter *adapter,
+			      struct xdp_umem *umem,
+			      u16 qid)
+{
+	int err;
+
+	err = ixgbe_alloc_xsk_umems(adapter);
+	if (err)
+		return err;
+
+	adapter->xsk_umems[qid] = umem;
+	adapter->num_xsk_umems_used++;
+
+	return 0;
+}
+
+static void ixgbe_remove_xsk_umem(struct ixgbe_adapter *adapter, u16 qid)
+{
+	adapter->xsk_umems[qid] = NULL;
+	adapter->num_xsk_umems_used--;
+
+	if (adapter->num_xsk_umems_used == 0) {
+		kfree(adapter->xsk_umems);
+		adapter->xsk_umems = NULL;
+		adapter->num_xsk_umems = 0;
+	}
+}
+
+static int ixgbe_xsk_umem_dma_map(struct ixgbe_adapter *adapter,
+				  struct xdp_umem *umem)
+{
+	struct device *dev = &adapter->pdev->dev;
+	unsigned int i, j;
+	dma_addr_t dma;
+
+	for (i = 0; i < umem->npgs; i++) {
+		dma = dma_map_page_attrs(dev, umem->pgs[i], 0, PAGE_SIZE,
+					 DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+		if (dma_mapping_error(dev, dma))
+			goto out_unmap;
+
+		umem->pages[i].dma = dma;
+	}
+
+	return 0;
+
+out_unmap:
+	for (j = 0; j < i; j++) {
+		dma_unmap_page_attrs(dev, umem->pages[j].dma, PAGE_SIZE,
+				     DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+		umem->pages[j].dma = 0;
+	}
+
+	return -1;
+}
+
+static void ixgbe_xsk_umem_dma_unmap(struct ixgbe_adapter *adapter,
+				     struct xdp_umem *umem)
+{
+	struct device *dev = &adapter->pdev->dev;
+	unsigned int i;
+
+	for (i = 0; i < umem->npgs; i++) {
+		dma_unmap_page_attrs(dev, umem->pages[i].dma, PAGE_SIZE,
+				     DMA_BIDIRECTIONAL, IXGBE_RX_DMA_ATTR);
+
+		umem->pages[i].dma = 0;
+	}
+}
+
+static int ixgbe_xsk_umem_enable(struct ixgbe_adapter *adapter,
+				 struct xdp_umem *umem,
+				 u16 qid)
+{
+	struct xdp_umem_fq_reuse *reuseq;
+	bool if_running;
+	int err;
+
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (adapter->xsk_umems) {
+		if (qid >= adapter->num_xsk_umems)
+			return -EINVAL;
+		if (adapter->xsk_umems[qid])
+			return -EBUSY;
+	}
+
+	reuseq = xsk_reuseq_prepare(adapter->rx_ring[0]->count);
+	if (!reuseq)
+		return -ENOMEM;
+
+	xsk_reuseq_free(xsk_reuseq_swap(umem, reuseq));
+
+	err = ixgbe_xsk_umem_dma_map(adapter, umem);
+	if (err)
+		return err;
+
+	if_running = netif_running(adapter->netdev) &&
+		     READ_ONCE(adapter->xdp_prog);
+
+	if (if_running)
+		ixgbe_txrx_ring_disable(adapter, qid);
+
+	err = ixgbe_add_xsk_umem(adapter, umem, qid);
+
+	if (if_running)
+		ixgbe_txrx_ring_enable(adapter, qid);
+
+	return err;
+}
+
+static int ixgbe_xsk_umem_disable(struct ixgbe_adapter *adapter, u16 qid)
+{
+	bool if_running;
+
+	if (!adapter->xsk_umems || qid >= adapter->num_xsk_umems ||
+	    !adapter->xsk_umems[qid])
+		return -EINVAL;
+
+	if_running = netif_running(adapter->netdev) &&
+		     READ_ONCE(adapter->xdp_prog);
+
+	if (if_running)
+		ixgbe_txrx_ring_disable(adapter, qid);
+
+	ixgbe_xsk_umem_dma_unmap(adapter, adapter->xsk_umems[qid]);
+	ixgbe_remove_xsk_umem(adapter, qid);
+
+	if (if_running)
+		ixgbe_txrx_ring_enable(adapter, qid);
+
+	return 0;
+}
+
+int ixgbe_xsk_umem_query(struct ixgbe_adapter *adapter, struct xdp_umem **umem,
+			 u16 qid)
+{
+	if (qid >= adapter->num_rx_queues)
+		return -EINVAL;
+
+	if (adapter->xsk_umems) {
+		if (qid >= adapter->num_xsk_umems)
+			return -EINVAL;
+		*umem = adapter->xsk_umems[qid];
+		return 0;
+	}
+
+	*umem = NULL;
+	return 0;
+}
+
+int ixgbe_xsk_umem_setup(struct ixgbe_adapter *adapter, struct xdp_umem *umem,
+			 u16 qid)
+{
+	return umem ? ixgbe_xsk_umem_enable(adapter, umem, qid) :
+		ixgbe_xsk_umem_disable(adapter, qid);
+}
+
+static int ixgbe_run_xdp_zc(struct ixgbe_adapter *adapter,
+			    struct ixgbe_ring *rx_ring,
+			    struct xdp_buff *xdp)
+{
+	int err, result = IXGBE_XDP_PASS;
+	struct bpf_prog *xdp_prog;
+	struct xdp_frame *xdpf;
+	u32 act;
+
+	rcu_read_lock();
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	xdp->handle += xdp->data - xdp->data_hard_start;
+	switch (act) {
+	case XDP_PASS:
+		break;
+	case XDP_TX:
+		xdpf = convert_to_xdp_frame(xdp);
+		if (unlikely(!xdpf)) {
+			result = IXGBE_XDP_CONSUMED;
+			break;
+		}
+		result = ixgbe_xmit_xdp_ring(adapter, xdpf);
+		break;
+	case XDP_REDIRECT:
+		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
+		result = !err ? IXGBE_XDP_REDIR : IXGBE_XDP_CONSUMED;
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		/* fallthrough */
+	case XDP_ABORTED:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		/* fallthrough -- handle aborts by dropping packet */
+	case XDP_DROP:
+		result = IXGBE_XDP_CONSUMED;
+		break;
+	}
+	rcu_read_unlock();
+	return result;
+}
+
+static struct ixgbe_rx_buffer *ixgbe_get_rx_buffer_zc(
+	struct ixgbe_ring *rx_ring,
+	unsigned int size)
+{
+	struct ixgbe_rx_buffer *bi;
+
+	bi = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
+
+	/* we are reusing so sync this buffer for CPU use */
+	dma_sync_single_range_for_cpu(rx_ring->dev,
+				      bi->dma, 0,
+				      size,
+				      DMA_BIDIRECTIONAL);
+
+	return bi;
+}
+
+static void ixgbe_reuse_rx_buffer_zc(struct ixgbe_ring *rx_ring,
+				     struct ixgbe_rx_buffer *obi)
+{
+	unsigned long mask = (unsigned long)rx_ring->xsk_umem->chunk_mask;
+	u64 hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
+	u16 nta = rx_ring->next_to_alloc;
+	struct ixgbe_rx_buffer *nbi;
+
+	nbi = &rx_ring->rx_buffer_info[rx_ring->next_to_alloc];
+	/* update, and store next to alloc */
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	/* transfer page from old buffer to new buffer */
+	nbi->dma = obi->dma & mask;
+	nbi->dma += hr;
+
+	nbi->addr = (void *)((unsigned long)obi->addr & mask);
+	nbi->addr += hr;
+
+	nbi->handle = obi->handle & mask;
+	nbi->handle += rx_ring->xsk_umem->headroom;
+
+	obi->addr = NULL;
+	obi->skb = NULL;
+}
+
+void ixgbe_zca_free(struct zero_copy_allocator *alloc, unsigned long handle)
+{
+	struct ixgbe_rx_buffer *bi;
+	struct ixgbe_ring *rx_ring;
+	u64 hr, mask;
+	u16 nta;
+
+	rx_ring = container_of(alloc, struct ixgbe_ring, zca);
+	hr = rx_ring->xsk_umem->headroom + XDP_PACKET_HEADROOM;
+	mask = rx_ring->xsk_umem->chunk_mask;
+
+	nta = rx_ring->next_to_alloc;
+	bi = rx_ring->rx_buffer_info;
+
+	nta++;
+	rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+	handle &= mask;
+
+	bi->dma = xdp_umem_get_dma(rx_ring->xsk_umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(rx_ring->xsk_umem, handle);
+	bi->addr += hr;
+
+	bi->handle = (u64)handle + rx_ring->xsk_umem->headroom;
+}
+
+static bool ixgbe_alloc_buffer_zc(struct ixgbe_ring *rx_ring,
+				  struct ixgbe_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	void *addr = bi->addr;
+	u64 handle, hr;
+
+	if (addr)
+		return true;
+
+	if (!xsk_umem_peek_addr(umem, &handle)) {
+		rx_ring->rx_stats.alloc_rx_page_failed++;
+		return false;
+	}
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr(umem);
+	return true;
+}
+
+static bool ixgbe_alloc_buffer_slow_zc(struct ixgbe_ring *rx_ring,
+				       struct ixgbe_rx_buffer *bi)
+{
+	struct xdp_umem *umem = rx_ring->xsk_umem;
+	u64 handle, hr;
+
+	if (!xsk_umem_peek_addr_rq(umem, &handle)) {
+		rx_ring->rx_stats.alloc_rx_page_failed++;
+		return false;
+	}
+
+	handle &= rx_ring->xsk_umem->chunk_mask;
+
+	hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+	bi->dma = xdp_umem_get_dma(umem, handle);
+	bi->dma += hr;
+
+	bi->addr = xdp_umem_get_data(umem, handle);
+	bi->addr += hr;
+
+	bi->handle = handle + umem->headroom;
+
+	xsk_umem_discard_addr_rq(umem);
+	return true;
+}
+
+static __always_inline bool __ixgbe_alloc_rx_buffers_zc(
+	struct ixgbe_ring *rx_ring,
+	u16 cleaned_count,
+	bool alloc(struct ixgbe_ring *rx_ring,
+		   struct ixgbe_rx_buffer *bi))
+{
+	union ixgbe_adv_rx_desc *rx_desc;
+	struct ixgbe_rx_buffer *bi;
+	u16 i = rx_ring->next_to_use;
+	bool ok = true;
+
+	/* nothing to do */
+	if (!cleaned_count)
+		return true;
+
+	rx_desc = IXGBE_RX_DESC(rx_ring, i);
+	bi = &rx_ring->rx_buffer_info[i];
+	i -= rx_ring->count;
+
+	do {
+		if (!alloc(rx_ring, bi)) {
+			ok = false;
+			break;
+		}
+
+		/* sync the buffer for use by the device */
+		dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
+						 bi->page_offset,
+						 rx_ring->rx_buf_len,
+						 DMA_BIDIRECTIONAL);
+
+		/* Refresh the desc even if buffer_addrs didn't change
+		 * because each write-back erases this info.
+		 */
+		rx_desc->read.pkt_addr = cpu_to_le64(bi->dma);
+
+		rx_desc++;
+		bi++;
+		i++;
+		if (unlikely(!i)) {
+			rx_desc = IXGBE_RX_DESC(rx_ring, 0);
+			bi = rx_ring->rx_buffer_info;
+			i -= rx_ring->count;
+		}
+
+		/* clear the length for the next_to_use descriptor */
+		rx_desc->wb.upper.length = 0;
+
+		cleaned_count--;
+	} while (cleaned_count);
+
+	i += rx_ring->count;
+
+	if (rx_ring->next_to_use != i) {
+		rx_ring->next_to_use = i;
+
+		/* update next to alloc since we have filled the ring */
+		rx_ring->next_to_alloc = i;
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.  (Only
+		 * applicable for weak-ordered memory model archs,
+		 * such as IA-64).
+		 */
+		wmb();
+		writel(i, rx_ring->tail);
+	}
+
+	return ok;
+}
+
+void ixgbe_alloc_rx_buffers_zc(struct ixgbe_ring *rx_ring, u16 count)
+{
+	__ixgbe_alloc_rx_buffers_zc(rx_ring, count,
+				    ixgbe_alloc_buffer_slow_zc);
+}
+
+static bool ixgbe_alloc_rx_buffers_fast_zc(struct ixgbe_ring *rx_ring,
+					   u16 count)
+{
+	return __ixgbe_alloc_rx_buffers_zc(rx_ring, count,
+					   ixgbe_alloc_buffer_zc);
+}
+
+static struct sk_buff *ixgbe_construct_skb_zc(struct ixgbe_ring *rx_ring,
+					      struct ixgbe_rx_buffer *bi,
+					      struct xdp_buff *xdp)
+{
+	unsigned int metasize = xdp->data - xdp->data_meta;
+	unsigned int datasize = xdp->data_end - xdp->data;
+	struct sk_buff *skb;
+
+	/* allocate a skb to store the frags */
+	skb = __napi_alloc_skb(&rx_ring->q_vector->napi,
+			       xdp->data_end - xdp->data_hard_start,
+			       GFP_ATOMIC | __GFP_NOWARN);
+	if (unlikely(!skb))
+		return NULL;
+
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	memcpy(__skb_put(skb, datasize), xdp->data, datasize);
+	if (metasize)
+		skb_metadata_set(skb, metasize);
+
+	ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+	return skb;
+}
+
+static void ixgbe_inc_ntc(struct ixgbe_ring *rx_ring)
+{
+	u32 ntc = rx_ring->next_to_clean + 1;
+
+	ntc = (ntc < rx_ring->count) ? ntc : 0;
+	rx_ring->next_to_clean = ntc;
+	prefetch(IXGBE_RX_DESC(rx_ring, ntc));
+}
+
+int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
+			  struct ixgbe_ring *rx_ring,
+			  const int budget)
+{
+	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
+	struct ixgbe_adapter *adapter = q_vector->adapter;
+	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	unsigned int xdp_res, xdp_xmit = 0;
+	bool failure = false;
+	struct sk_buff *skb;
+	struct xdp_buff xdp;
+
+	xdp.rxq = &rx_ring->xdp_rxq;
+
+	while (likely(total_rx_packets < budget)) {
+		union ixgbe_adv_rx_desc *rx_desc;
+		struct ixgbe_rx_buffer *bi;
+		unsigned int size;
+
+		/* return some buffers to hardware, one at a time is too slow */
+		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
+			failure = failure ||
+				  !ixgbe_alloc_rx_buffers_fast_zc(
+					  rx_ring,
+					  cleaned_count);
+			cleaned_count = 0;
+		}
+
+		rx_desc = IXGBE_RX_DESC(rx_ring, rx_ring->next_to_clean);
+		size = le16_to_cpu(rx_desc->wb.upper.length);
+		if (!size)
+			break;
+
+		/* This memory barrier is needed to keep us from reading
+		 * any other fields out of the rx_desc until we know the
+		 * descriptor has been written back
+		 */
+		dma_rmb();
+
+		bi = ixgbe_get_rx_buffer_zc(rx_ring, size);
+
+		if (unlikely(!ixgbe_test_staterr(rx_desc,
+						 IXGBE_RXD_STAT_EOP))) {
+			struct ixgbe_rx_buffer *next_bi;
+
+			ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			ixgbe_inc_ntc(rx_ring);
+			next_bi = &rx_ring->rx_buffer_info[
+				rx_ring->next_to_clean];
+			next_bi->skb = ERR_PTR(-EINVAL);
+			continue;
+		}
+
+		if (unlikely(bi->skb)) {
+			ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
+
+		xdp.data = bi->addr;
+		xdp.data_meta = xdp.data;
+		xdp.data_hard_start = xdp.data - XDP_PACKET_HEADROOM;
+		xdp.data_end = xdp.data + size;
+		xdp.handle = bi->handle;
+
+		xdp_res = ixgbe_run_xdp_zc(adapter, rx_ring, &xdp);
+
+		if (xdp_res) {
+			if (xdp_res & (IXGBE_XDP_TX | IXGBE_XDP_REDIR)) {
+				xdp_xmit |= xdp_res;
+				bi->addr = NULL;
+				bi->skb = NULL;
+			} else {
+				ixgbe_reuse_rx_buffer_zc(rx_ring, bi);
+			}
+			total_rx_packets++;
+			total_rx_bytes += size;
+
+			cleaned_count++;
+			ixgbe_inc_ntc(rx_ring);
+			continue;
+		}
+
+		/* XDP_PASS path */
+		skb = ixgbe_construct_skb_zc(rx_ring, bi, &xdp);
+		if (!skb) {
+			rx_ring->rx_stats.alloc_rx_buff_failed++;
+			break;
+		}
+
+		cleaned_count++;
+		ixgbe_inc_ntc(rx_ring);
+
+		if (eth_skb_pad(skb))
+			continue;
+
+		total_rx_bytes += skb->len;
+		total_rx_packets++;
+
+		ixgbe_process_skb_fields(rx_ring, rx_desc, skb);
+		ixgbe_rx_skb(q_vector, skb);
+	}
+
+	if (xdp_xmit & IXGBE_XDP_REDIR)
+		xdp_do_flush_map();
+
+	if (xdp_xmit & IXGBE_XDP_TX) {
+		struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.
+		 */
+		wmb();
+		writel(ring->next_to_use, ring->tail);
+	}
+
+	u64_stats_update_begin(&rx_ring->syncp);
+	rx_ring->stats.packets += total_rx_packets;
+	rx_ring->stats.bytes += total_rx_bytes;
+	u64_stats_update_end(&rx_ring->syncp);
+	q_vector->rx.total_packets += total_rx_packets;
+	q_vector->rx.total_bytes += total_rx_bytes;
+
+	return failure ? budget : (int)total_rx_packets;
+}
+
+void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
+{
+	u16 i = rx_ring->next_to_clean;
+	struct ixgbe_rx_buffer *bi = &rx_ring->rx_buffer_info[i];
+
+	while (i != rx_ring->next_to_alloc) {
+		xsk_umem_fq_reuse(rx_ring->xsk_umem, bi->handle);
+		i++;
+		bi++;
+		if (i == rx_ring->count) {
+			i = 0;
+			bi = rx_ring->rx_buffer_info;
+		}
+	}
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h
  2018-10-02  8:00 ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
@ 2018-10-02  8:00   ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc, jakub.kicinski

From: Björn Töpel <bjorn.topel@intel.com>

This patch prepares for the upcoming zero-copy Tx functionality by
moving the functions shared by the regular path and the zero-copy
path into ixgbe_txrx_common.h.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c        | 9 +++------
 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h | 5 +++++
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 547092b8fe54..b211032f8682 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -895,8 +895,8 @@ static void ixgbe_set_ivar(struct ixgbe_adapter *adapter, s8 direction,
 	}
 }
 
-static inline void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
-					  u64 qmask)
+void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter,
+			    u64 qmask)
 {
 	u32 mask;
 
@@ -8156,9 +8156,6 @@ static inline int ixgbe_maybe_stop_tx(struct ixgbe_ring *tx_ring, u16 size)
 	return __ixgbe_maybe_stop_tx(tx_ring, size);
 }
 
-#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
-		       IXGBE_TXD_CMD_RS)
-
 static int ixgbe_tx_map(struct ixgbe_ring *tx_ring,
 			struct ixgbe_tx_buffer *first,
 			const u8 hdr_len)
@@ -10259,7 +10256,7 @@ static int ixgbe_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 	}
 }
 
-static void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
+void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring)
 {
 	/* Force memory writes to complete before letting h/w know there
 	 * are new descriptors to fetch.
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index cf219f4e009d..56afb685c648 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -9,6 +9,9 @@
 #define IXGBE_XDP_TX		BIT(1)
 #define IXGBE_XDP_REDIR		BIT(2)
 
+#define IXGBE_TXD_CMD (IXGBE_TXD_CMD_EOP | \
+		       IXGBE_TXD_CMD_RS)
+
 int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
 			struct xdp_frame *xdpf);
 bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
@@ -19,6 +22,8 @@ void ixgbe_process_skb_fields(struct ixgbe_ring *rx_ring,
 			      struct sk_buff *skb);
 void ixgbe_rx_skb(struct ixgbe_q_vector *q_vector,
 		  struct sk_buff *skb);
+void ixgbe_xdp_ring_update_tail(struct ixgbe_ring *ring);
+void ixgbe_irq_rearm_queues(struct ixgbe_adapter *adapter, u64 qmask);
 
 void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring);
 void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 5/5] ixgbe: add AF_XDP zero-copy Tx support
  2018-10-02  8:00 ` [Intel-wired-lan] " =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
@ 2018-10-02  8:00   ` =?unknown-8bit?q?Bj=C3=B6rn_T=C3=B6pel?=
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan
  Cc: Björn Töpel, magnus.karlsson, magnus.karlsson, ast,
	daniel, netdev, brouer, u9012063, tuc, jakub.kicinski

From: Björn Töpel <bjorn.topel@intel.com>

This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev operation and performs all the Tx logic
from a NAPI context: pulling egress frames from the AF_XDP Tx ring,
placing them on the NIC HW descriptor ring, and completing sent
frames back to the application via the completion ring.
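The "complete sent frames back via the completion ring" step can be
illustrated with a minimal single-producer/single-consumer ring sketch.
Everything here (struct comp_ring, cq_post, cq_reap, CQ_SIZE) is a
hypothetical stand-in for illustration only; the real AF_XDP completion
ring lives in memory shared with user space and is driven through the
kernel's xsk_umem helpers, not these functions:

```c
#include <assert.h>
#include <stdint.h>

#define CQ_SIZE 8	/* must be a power of two for the mask trick */

/* Toy completion queue: the driver (producer) posts the UMEM
 * addresses of frames whose Tx descriptors have completed; the
 * application (consumer) reaps them to reuse the buffers. */
struct comp_ring {
	uint64_t addr[CQ_SIZE];
	unsigned int prod, cons;	/* free-running indices */
};

static int cq_post(struct comp_ring *cq, uint64_t addr)
{
	if (cq->prod - cq->cons == CQ_SIZE)
		return -1;		/* ring full */
	cq->addr[cq->prod++ & (CQ_SIZE - 1)] = addr;
	return 0;
}

static int cq_reap(struct comp_ring *cq, uint64_t *addr)
{
	if (cq->prod == cq->cons)
		return -1;		/* ring empty */
	*addr = cq->addr[cq->cons++ & (CQ_SIZE - 1)];
	return 0;
}
```

In the actual driver the producer side of this pattern is what
ixgbe_clean_xdp_tx_irq() performs when it counts xsk_frames for
completed descriptors.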

The regular XDP Tx ring is used for AF_XDP as well. The rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.
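The per-CPU ownership described above can be sketched as follows. This
is a simplified illustration, not driver code: NUM_CPUS, struct ring,
ring_for_cpu() and ring_advance() are hypothetical stand-ins for the
driver's adapter->xdp_ring[smp_processor_id()] pattern and its
next_to_use wraparound:

```c
#include <assert.h>

#define NUM_CPUS 4	/* hypothetical; the driver sizes this at probe */

struct ring {
	int next_to_use;
	int count;
};

/* One Tx ring per CPU: a NAPI context pinned to CPU n only ever
 * touches xdp_ring[n], so no lock is needed between contexts. */
static struct ring xdp_ring[NUM_CPUS];

static struct ring *ring_for_cpu(int cpu)
{
	return &xdp_ring[cpu];
}

/* Advance next_to_use with wraparound, as the driver does when it
 * fills a descriptor. */
static void ring_advance(struct ring *r)
{
	r->next_to_use++;
	if (r->next_to_use == r->count)
		r->next_to_use = 0;
}
```

Because both XDP_REDIRECT and the AF_XDP zero-copy Tx path run in the
same NAPI context, they index the same per-CPU ring and are serialized
by construction.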

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
ixgbe_xsk.c.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  17 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |   4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 175 ++++++++++++++++++
 3 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b211032f8682..ec31b32d6674 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3161,7 +3161,11 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
 	ixgbe_for_each_ring(ring, q_vector->tx) {
-		if (!ixgbe_clean_tx_irq(q_vector, ring, budget))
+		bool wd = ring->xsk_umem ?
+			  ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
+			  ixgbe_clean_tx_irq(q_vector, ring, budget);
+
+		if (!wd)
 			clean_complete = false;
 	}
 
@@ -3472,6 +3476,10 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;
 
+	ring->xsk_umem = NULL;
+	if (ring_is_xdp(ring))
+		ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+
 	/* disable queue to avoid issues while updating state */
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
 	IXGBE_WRITE_FLUSH(hw);
@@ -5944,6 +5952,11 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	u16 i = tx_ring->next_to_clean;
 	struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
 
+	if (tx_ring->xsk_umem) {
+		ixgbe_xsk_clean_tx_ring(tx_ring);
+		goto out;
+	}
+
 	while (i != tx_ring->next_to_use) {
 		union ixgbe_adv_tx_desc *eop_desc, *tx_desc;
 
@@ -5995,6 +6008,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	if (!ring_is_xdp(tx_ring))
 		netdev_tx_reset_queue(txring_txq(tx_ring));
 
+out:
 	/* reset next_to_use and next_to_clean */
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;
@@ -10350,6 +10364,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_features_check	= ixgbe_features_check,
 	.ndo_bpf		= ixgbe_xdp,
 	.ndo_xdp_xmit		= ixgbe_xdp_xmit,
+	.ndo_xsk_async_xmit	= ixgbe_xsk_async_xmit,
 };
 
 static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 56afb685c648..53d4089f5644 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -42,5 +42,9 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 			  struct ixgbe_ring *rx_ring,
 			  const int budget);
 void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+			    struct ixgbe_ring *tx_ring, int napi_budget);
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 queue_id);
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring);
 
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 61259036ff4b..cf1c6f2d97e5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -626,3 +626,178 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
 		}
 	}
 }
+
+static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
+{
+	union ixgbe_adv_tx_desc *tx_desc = NULL;
+	struct ixgbe_tx_buffer *tx_bi;
+	bool work_done = true;
+	u32 len, cmd_type;
+	dma_addr_t dma;
+
+	while (budget-- > 0) {
+		if (unlikely(!ixgbe_desc_unused(xdp_ring))) {
+			work_done = false;
+			break;
+		}
+
+		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
+			break;
+
+		dma_sync_single_for_device(xdp_ring->dev, dma, len,
+					   DMA_BIDIRECTIONAL);
+
+		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
+		tx_bi->bytecount = len;
+		tx_bi->xdpf = NULL;
+
+		tx_desc = IXGBE_TX_DESC(xdp_ring, xdp_ring->next_to_use);
+		tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+		/* put descriptor type bits */
+		cmd_type = IXGBE_ADVTXD_DTYP_DATA |
+			   IXGBE_ADVTXD_DCMD_DEXT |
+			   IXGBE_ADVTXD_DCMD_IFCS;
+		cmd_type |= len | IXGBE_TXD_CMD;
+		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+		tx_desc->read.olinfo_status =
+			cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
+		xdp_ring->next_to_use++;
+		if (xdp_ring->next_to_use == xdp_ring->count)
+			xdp_ring->next_to_use = 0;
+	}
+
+	if (tx_desc) {
+		ixgbe_xdp_ring_update_tail(xdp_ring);
+		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+	}
+
+	return !!budget && work_done;
+}
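For readers less familiar with the Intel ring layout, the budget/ring-full interplay in ixgbe_xmit_zc() above can be modelled in plain user-space C. This is an illustrative sketch, not driver code: the struct and helper names are made up, and the UMEM consumption (xsk_umem_consume_tx) is reduced to a frames_pending counter.

```c
#include <assert.h>

/* Hypothetical model of the ixgbe_xmit_zc() producer loop above. */
struct ring {
	unsigned int next_to_use;	/* producer index */
	unsigned int next_to_clean;	/* consumer index */
	unsigned int count;		/* ring size */
};

/* Free descriptors, mirroring the logic of ixgbe_desc_unused(). */
static unsigned int desc_unused(const struct ring *r)
{
	int n = (int)r->next_to_clean - (int)r->next_to_use - 1;

	return n < 0 ? (unsigned int)(n + (int)r->count) : (unsigned int)n;
}

/* Place up to 'budget' pending frames on the ring; stop early when the
 * ring is full.  Returns the number of frames actually placed. */
static unsigned int xmit_zc_model(struct ring *r, unsigned int budget,
				  unsigned int frames_pending)
{
	unsigned int sent = 0;

	while (budget-- > 0) {
		if (!desc_unused(r))
			break;			/* ring full: work not done */
		if (!frames_pending)
			break;			/* user-space Tx queue empty */
		frames_pending--;
		sent++;
		if (++r->next_to_use == r->count)
			r->next_to_use = 0;	/* wrap the producer index */
	}
	return sent;
}
```

Note that, as in the driver, one slot is always left empty so that a full ring and an empty ring can be told apart.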
+
+static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
+				      struct ixgbe_tx_buffer *tx_bi)
+{
+	xdp_return_frame(tx_bi->xdpf);
+	dma_unmap_single(tx_ring->dev,
+			 dma_unmap_addr(tx_bi, dma),
+			 dma_unmap_len(tx_bi, len), DMA_TO_DEVICE);
+	dma_unmap_len_set(tx_bi, len, 0);
+}
+
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+			    struct ixgbe_ring *tx_ring, int napi_budget)
+{
+	unsigned int total_packets = 0, total_bytes = 0;
+	u32 i = tx_ring->next_to_clean, xsk_frames = 0;
+	unsigned int budget = q_vector->tx.work_limit;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	union ixgbe_adv_tx_desc *tx_desc;
+	struct ixgbe_tx_buffer *tx_bi;
+	bool xmit_done;
+
+	tx_bi = &tx_ring->tx_buffer_info[i];
+	tx_desc = IXGBE_TX_DESC(tx_ring, i);
+	i -= tx_ring->count;
+
+	do {
+		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
+			break;
+
+		total_bytes += tx_bi->bytecount;
+		total_packets += tx_bi->gso_segs;
+
+		if (tx_bi->xdpf)
+			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		tx_bi++;
+		tx_desc++;
+		i++;
+		if (unlikely(!i)) {
+			i -= tx_ring->count;
+			tx_bi = tx_ring->tx_buffer_info;
+			tx_desc = IXGBE_TX_DESC(tx_ring, 0);
+		}
+
+		/* issue prefetch for next Tx descriptor */
+		prefetch(tx_desc);
+
+		/* update budget accounting */
+		budget--;
+	} while (likely(budget));
+
+	i += tx_ring->count;
+	tx_ring->next_to_clean = i;
+
+	u64_stats_update_begin(&tx_ring->syncp);
+	tx_ring->stats.bytes += total_bytes;
+	tx_ring->stats.packets += total_packets;
+	u64_stats_update_end(&tx_ring->syncp);
+	q_vector->tx.total_bytes += total_bytes;
+	q_vector->tx.total_packets += total_packets;
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+
+	xmit_done = ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
+	return budget > 0 && xmit_done;
+}
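The index handling above uses a common Intel-driver trick: `i` is biased by `-tx_ring->count` so the wraparound test becomes a cheap zero check (`if (unlikely(!i))`) rather than a compare against the ring size. A hypothetical stand-alone model of just that walk (names are illustrative, not driver API):

```c
#include <assert.h>

/* Model of the biased-index cleanup walk: 'i' starts at
 * next_to_clean - count, so wrapping is detected when i reaches 0. */
static unsigned int clean_index_model(unsigned int next_to_clean,
				      unsigned int count,
				      unsigned int ndesc_done)
{
	int i = (int)next_to_clean - (int)count;	/* biased index */

	while (ndesc_done--) {
		i++;
		if (!i)				/* walked off the end of the ring */
			i -= (int)count;
	}
	return (unsigned int)(i + (int)count);	/* unbias back to [0, count) */
}
```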
+
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 qid)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct ixgbe_ring *ring;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return -ENETDOWN;
+
+	if (!READ_ONCE(adapter->xdp_prog))
+		return -ENXIO;
+
+	if (qid >= adapter->num_xdp_queues)
+		return -ENXIO;
+
+	if (!adapter->xsk_umems || !adapter->xsk_umems[qid])
+		return -ENXIO;
+
+	ring = adapter->xdp_ring[qid];
+	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
+		u64 eics = BIT_ULL(ring->q_vector->v_idx);
+
+		ixgbe_irq_rearm_queues(adapter, eics);
+	}
+
+	return 0;
+}
+
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
+{
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct ixgbe_tx_buffer *tx_bi;
+	u32 xsk_frames = 0;
+
+	while (ntc != ntu) {
+		tx_bi = &tx_ring->tx_buffer_info[ntc];
+
+		if (tx_bi->xdpf)
+			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		ntc++;
+		if (ntc == tx_ring->count)
+			ntc = 0;
+	}
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+}
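The drain loop above can be modelled as a simple walk from next_to_clean (ntc) to next_to_use (ntu) with wraparound, counting the entries that did not carry an xdp_frame, i.e. the frames owned by the XSK UMEM. A hypothetical user-space sketch, where `has_xdpf` stands in for the `tx_bi->xdpf` test:

```c
#include <assert.h>
#include <stdbool.h>

/* Count the in-flight XSK frames between ntc and ntu; these are the
 * ones handed back to user space via the completion ring. */
static unsigned int drain_model(const bool *has_xdpf, unsigned int count,
				unsigned int ntc, unsigned int ntu)
{
	unsigned int xsk_frames = 0;

	while (ntc != ntu) {
		if (!has_xdpf[ntc])
			xsk_frames++;	/* completed via the UMEM cq */
		if (++ntc == count)
			ntc = 0;	/* wrap */
	}
	return xsk_frames;
}
```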
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Intel-wired-lan] [PATCH v2 5/5] ixgbe: add AF_XDP zero-copy Tx support
@ 2018-10-02  8:00   ` Björn Töpel
  0 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02  8:00 UTC (permalink / raw)
  To: intel-wired-lan

From: Björn Töpel <bjorn.topel@intel.com>

This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.

The regular XDP Tx ring is used for AF_XDP as well. The rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
ixgbe_xsk.c.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  17 +-
 .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |   4 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 175 ++++++++++++++++++
 3 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index b211032f8682..ec31b32d6674 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3161,7 +3161,11 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
 	ixgbe_for_each_ring(ring, q_vector->tx) {
-		if (!ixgbe_clean_tx_irq(q_vector, ring, budget))
+		bool wd = ring->xsk_umem ?
+			  ixgbe_clean_xdp_tx_irq(q_vector, ring, budget) :
+			  ixgbe_clean_tx_irq(q_vector, ring, budget);
+
+		if (!wd)
 			clean_complete = false;
 	}
 
@@ -3472,6 +3476,10 @@ void ixgbe_configure_tx_ring(struct ixgbe_adapter *adapter,
 	u32 txdctl = IXGBE_TXDCTL_ENABLE;
 	u8 reg_idx = ring->reg_idx;
 
+	ring->xsk_umem = NULL;
+	if (ring_is_xdp(ring))
+		ring->xsk_umem = ixgbe_xsk_umem(adapter, ring);
+
 	/* disable queue to avoid issues while updating state */
 	IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), 0);
 	IXGBE_WRITE_FLUSH(hw);
@@ -5944,6 +5952,11 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	u16 i = tx_ring->next_to_clean;
 	struct ixgbe_tx_buffer *tx_buffer = &tx_ring->tx_buffer_info[i];
 
+	if (tx_ring->xsk_umem) {
+		ixgbe_xsk_clean_tx_ring(tx_ring);
+		goto out;
+	}
+
 	while (i != tx_ring->next_to_use) {
 		union ixgbe_adv_tx_desc *eop_desc, *tx_desc;
 
@@ -5995,6 +6008,7 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	if (!ring_is_xdp(tx_ring))
 		netdev_tx_reset_queue(txring_txq(tx_ring));
 
+out:
 	/* reset next_to_use and next_to_clean */
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;
@@ -10350,6 +10364,7 @@ static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_features_check	= ixgbe_features_check,
 	.ndo_bpf		= ixgbe_xdp,
 	.ndo_xdp_xmit		= ixgbe_xdp_xmit,
+	.ndo_xsk_async_xmit	= ixgbe_xsk_async_xmit,
 };
 
 static void ixgbe_disable_txr_hw(struct ixgbe_adapter *adapter,
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
index 56afb685c648..53d4089f5644 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
@@ -42,5 +42,9 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 			  struct ixgbe_ring *rx_ring,
 			  const int budget);
 void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring);
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+			    struct ixgbe_ring *tx_ring, int napi_budget);
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 queue_id);
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring);
 
 #endif /* #define _IXGBE_TXRX_COMMON_H_ */
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 61259036ff4b..cf1c6f2d97e5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -626,3 +626,178 @@ void ixgbe_xsk_clean_rx_ring(struct ixgbe_ring *rx_ring)
 		}
 	}
 }
+
+static bool ixgbe_xmit_zc(struct ixgbe_ring *xdp_ring, unsigned int budget)
+{
+	union ixgbe_adv_tx_desc *tx_desc = NULL;
+	struct ixgbe_tx_buffer *tx_bi;
+	bool work_done = true;
+	u32 len, cmd_type;
+	dma_addr_t dma;
+
+	while (budget-- > 0) {
+		if (unlikely(!ixgbe_desc_unused(xdp_ring))) {
+			work_done = false;
+			break;
+		}
+
+		if (!xsk_umem_consume_tx(xdp_ring->xsk_umem, &dma, &len))
+			break;
+
+		dma_sync_single_for_device(xdp_ring->dev, dma, len,
+					   DMA_BIDIRECTIONAL);
+
+		tx_bi = &xdp_ring->tx_buffer_info[xdp_ring->next_to_use];
+		tx_bi->bytecount = len;
+		tx_bi->xdpf = NULL;
+
+		tx_desc = IXGBE_TX_DESC(xdp_ring, xdp_ring->next_to_use);
+		tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+		/* put descriptor type bits */
+		cmd_type = IXGBE_ADVTXD_DTYP_DATA |
+			   IXGBE_ADVTXD_DCMD_DEXT |
+			   IXGBE_ADVTXD_DCMD_IFCS;
+		cmd_type |= len | IXGBE_TXD_CMD;
+		tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+		tx_desc->read.olinfo_status =
+			cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
+		xdp_ring->next_to_use++;
+		if (xdp_ring->next_to_use == xdp_ring->count)
+			xdp_ring->next_to_use = 0;
+	}
+
+	if (tx_desc) {
+		ixgbe_xdp_ring_update_tail(xdp_ring);
+		xsk_umem_consume_tx_done(xdp_ring->xsk_umem);
+	}
+
+	return !!budget && work_done;
+}
+
+static void ixgbe_clean_xdp_tx_buffer(struct ixgbe_ring *tx_ring,
+				      struct ixgbe_tx_buffer *tx_bi)
+{
+	xdp_return_frame(tx_bi->xdpf);
+	dma_unmap_single(tx_ring->dev,
+			 dma_unmap_addr(tx_bi, dma),
+			 dma_unmap_len(tx_bi, len), DMA_TO_DEVICE);
+	dma_unmap_len_set(tx_bi, len, 0);
+}
+
+bool ixgbe_clean_xdp_tx_irq(struct ixgbe_q_vector *q_vector,
+			    struct ixgbe_ring *tx_ring, int napi_budget)
+{
+	unsigned int total_packets = 0, total_bytes = 0;
+	u32 i = tx_ring->next_to_clean, xsk_frames = 0;
+	unsigned int budget = q_vector->tx.work_limit;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	union ixgbe_adv_tx_desc *tx_desc;
+	struct ixgbe_tx_buffer *tx_bi;
+	bool xmit_done;
+
+	tx_bi = &tx_ring->tx_buffer_info[i];
+	tx_desc = IXGBE_TX_DESC(tx_ring, i);
+	i -= tx_ring->count;
+
+	do {
+		if (!(tx_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
+			break;
+
+		total_bytes += tx_bi->bytecount;
+		total_packets += tx_bi->gso_segs;
+
+		if (tx_bi->xdpf)
+			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		tx_bi++;
+		tx_desc++;
+		i++;
+		if (unlikely(!i)) {
+			i -= tx_ring->count;
+			tx_bi = tx_ring->tx_buffer_info;
+			tx_desc = IXGBE_TX_DESC(tx_ring, 0);
+		}
+
+		/* issue prefetch for next Tx descriptor */
+		prefetch(tx_desc);
+
+		/* update budget accounting */
+		budget--;
+	} while (likely(budget));
+
+	i += tx_ring->count;
+	tx_ring->next_to_clean = i;
+
+	u64_stats_update_begin(&tx_ring->syncp);
+	tx_ring->stats.bytes += total_bytes;
+	tx_ring->stats.packets += total_packets;
+	u64_stats_update_end(&tx_ring->syncp);
+	q_vector->tx.total_bytes += total_bytes;
+	q_vector->tx.total_packets += total_packets;
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+
+	xmit_done = ixgbe_xmit_zc(tx_ring, q_vector->tx.work_limit);
+	return budget > 0 && xmit_done;
+}
+
+int ixgbe_xsk_async_xmit(struct net_device *dev, u32 qid)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct ixgbe_ring *ring;
+
+	if (test_bit(__IXGBE_DOWN, &adapter->state))
+		return -ENETDOWN;
+
+	if (!READ_ONCE(adapter->xdp_prog))
+		return -ENXIO;
+
+	if (qid >= adapter->num_xdp_queues)
+		return -ENXIO;
+
+	if (!adapter->xsk_umems || !adapter->xsk_umems[qid])
+		return -ENXIO;
+
+	ring = adapter->xdp_ring[qid];
+	if (!napi_if_scheduled_mark_missed(&ring->q_vector->napi)) {
+		u64 eics = BIT_ULL(ring->q_vector->v_idx);
+
+		ixgbe_irq_rearm_queues(adapter, eics);
+	}
+
+	return 0;
+}
+
+void ixgbe_xsk_clean_tx_ring(struct ixgbe_ring *tx_ring)
+{
+	u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
+	struct xdp_umem *umem = tx_ring->xsk_umem;
+	struct ixgbe_tx_buffer *tx_bi;
+	u32 xsk_frames = 0;
+
+	while (ntc != ntu) {
+		tx_bi = &tx_ring->tx_buffer_info[ntc];
+
+		if (tx_bi->xdpf)
+			ixgbe_clean_xdp_tx_buffer(tx_ring, tx_bi);
+		else
+			xsk_frames++;
+
+		tx_bi->xdpf = NULL;
+
+		ntc++;
+		if (ntc == tx_ring->count)
+			ntc = 0;
+	}
+
+	if (xsk_frames)
+		xsk_umem_complete_tx(umem, xsk_frames);
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
  2018-10-02  8:00 ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:23   ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:23 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Jeff: Please remove the v1 patches from your dev-queue!
>
> This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
> driver.
>
> The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
> analogous to the i40e ZC support. Again, as in i40e, code paths have
> been copied from the XDP path to the zero-copy path. Going forward we
> will try to generalize more code between the AF_XDP ZC drivers, and
> also reduce the heavy C&P.
>
> We have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for Tx/Rx and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is GCC 7.3.0. The NIC is Intel
> 82599ES/X520-2 10Gbit/s using the ixgbe driver.
>
> Below are the results in Mpps of the 82599ES/X520-2 NIC benchmark runs
> for 64B and 1500B packets, generated by a commercial packet generator
> HW blasting packets at full 10Gbit/s line rate. The results are with
> retpoline and all other spectre and meltdown fixes.
>
> AF_XDP performance 64B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop        14.7
> txpush        14.6
> l2fwd         11.1
>
> AF_XDP performance 1500B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop        0.8
> l2fwd         0.8
>
> XDP performance on our system as a base line.
>
> 64B packets:
> XDP stats       CPU     Mpps       issue-pps
> XDP-RX CPU      16      14.7       0
>
> 1500B packets:
> XDP stats       CPU     Mpps       issue-pps
> XDP-RX CPU      16      0.8        0
>
> The structure of the patch set is as follows:
>
> Patch 1: Introduce Rx/Tx ring enable/disable functionality
> Patch 2: Preparatory patch to ixgbe driver code for RX
> Patch 3: ixgbe zero-copy support for RX
> Patch 4: Preparatory patch to ixgbe driver code for TX
> Patch 5: ixgbe zero-copy support for TX
>
> Changes since v1:
>
> * Removed redundant AF_XDP precondition checks, pointed out by
>   Jakub. Now, the preconditions are only checked at XDP enable time.
> * Fixed a crash in the egress path, due to incorrect usage of
>   ixgbe_ring queue_index member. In v2 a ring_idx back reference is
>   introduced, and used in favor of queue_index. William reported the
>   crash, and helped me smoke out the issue. Kudos!

Thanks! I tested this series and there are no more crashes.
The numbers are pretty good (*without* the Spectre and Meltdown fixes).
model name : Intel(R) Xeon(R) CPU E5-2440 v2 @ 1.90GHz, total 16 cores.

AF_XDP performance 64B packets:
Benchmark   XDP_DRV with zerocopy
rxdrop        20
txpush        18
l2fwd         20

Regards,
William

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/5] ixgbe: added Rx/Tx ring disable/enable functions
  2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:25     ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:25 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Add functions for Rx/Tx ring enable/disable. Instead of resetting the
> whole device, only the affected ring is disabled or enabled.
>
> This plumbing is used in later commits, when zero-copy AF_XDP support
> is introduced.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

Thanks!
Tested-by: William Tu <u9012063@gmail.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 3/5] ixgbe: add AF_XDP zero-copy Rx support
  2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:26     ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:26 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
> allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
> allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
> queue.
>
> All AF_XDP specific functions are added to a new file, ixgbe_xsk.c.
>
> Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
> will allocate a new buffer and copy the zero-copy frame prior passing
> it to the kernel stack.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

Thanks!
Tested-by: William Tu <u9012063@gmail.com>

>  drivers/net/ethernet/intel/ixgbe/Makefile     |   3 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  27 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  17 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  78 ++-
>  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  15 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 628 ++++++++++++++++++
>  6 files changed, 747 insertions(+), 21 deletions(-)
>  create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 5/5] ixgbe: add AF_XDP zero-copy Tx support
  2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:26     ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:26 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch adds zero-copy Tx support for AF_XDP sockets. It implements
> the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
> NAPI context. This means pulling egress packets from the Tx ring,
> placing the frames on the NIC HW descriptor ring and completing sent
> frames back to the application via the completion ring.
>
> The regular XDP Tx ring is used for AF_XDP as well. The rationale for
> this is as follows: XDP_REDIRECT guarantees mutual exclusion between
> different NAPI contexts based on CPU id. In other words, a netdev can
> XDP_REDIRECT to another netdev with a different NAPI context, since
> the operation is bound to a specific core and each core has its own
> hardware ring.
>
> As the AF_XDP Tx action is running in the same NAPI context and using
> the same ring, it will also be protected from XDP_REDIRECT actions
> with the exact same mechanism.
>
> As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
> ixgbe_xsk.c.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

Thanks!
Tested-by: William Tu <u9012063@gmail.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h
  2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:27     ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:27 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch prepares for the upcoming zero-copy Rx functionality, by
> moving/changing linkage of common functions, used both by the regular
> path and zero-copy path.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---

Thanks!
Tested-by: William Tu <u9012063@gmail.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h
  2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:28     ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:28 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch prepares for the upcoming zero-copy Tx functionality by
> moving common functions used both by the regular path and zero-copy
> path.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
Thanks!
Tested-by: William Tu <u9012063@gmail.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
  2018-10-02 18:23   ` [Intel-wired-lan] " William Tu
@ 2018-10-02 18:39     ` Björn Töpel
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-02 18:39 UTC (permalink / raw)
  To: William Tu, Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Karlsson, Magnus,
	Magnus Karlsson, Alexei Starovoitov, Daniel Borkmann,
	Linux Kernel Network Developers, Jesper Dangaard Brouer, Test,
	Jakub Kicinski

On 2018-10-02 20:23, William Tu wrote:
> On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
>>
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> Jeff: Please remove the v1 patches from your dev-queue!
>>
>> This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
>> driver.
>>
>> The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
>> analogous to the i40e ZC support. Again, as in i40e, code paths have
>> been copied from the XDP path to the zero-copy path. Going forward we
>> will try to generalize more code between the AF_XDP ZC drivers, and
>> also reduce the heavy C&P.
>>
>> We have run some benchmarks on a dual socket system with two Broadwell
>> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
>> cores which gives a total of 28, but only two cores are used in these
>> experiments. One for TX/RX and one for the user space application. The
>> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
>> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
>> memory. The compiler used is GCC 7.3.0. The NIC is Intel
>> 82599ES/X520-2 10Gbit/s using the ixgbe driver.
>>
>> Below are the results in Mpps of the 82599ES/X520-2 NIC benchmark runs
>> for 64B and 1500B packets, generated by a commercial packet generator
>> HW blasting packets at full 10Gbit/s line rate. The results are with
>> retpoline and all other spectre and meltdown fixes.
>>
>> AF_XDP performance 64B packets:
>> Benchmark   XDP_DRV with zerocopy
>> rxdrop        14.7
>> txpush        14.6
>> l2fwd         11.1
>>
>> AF_XDP performance 1500B packets:
>> Benchmark   XDP_DRV with zerocopy
>> rxdrop        0.8
>> l2fwd         0.8
>>
>> XDP performance on our system as a baseline.
>>
>> 64B packets:
>> XDP stats       CPU     Mpps       issue-pps
>> XDP-RX CPU      16      14.7       0
>>
>> 1500B packets:
>> XDP stats       CPU     Mpps       issue-pps
>> XDP-RX CPU      16      0.8        0
>>
>> The structure of the patch set is as follows:
>>
>> Patch 1: Introduce Rx/Tx ring enable/disable functionality
>> Patch 2: Preparatory patch to ixgbe driver code for RX
>> Patch 3: ixgbe zero-copy support for RX
>> Patch 4: Preparatory patch to ixgbe driver code for TX
>> Patch 5: ixgbe zero-copy support for TX
>>
>> Changes since v1:
>>
>> * Removed redundant AF_XDP precondition checks, pointed out by
>>    Jakub. Now, the preconditions are only checked at XDP enable time.
>> * Fixed a crash in the egress path, due to incorrect usage of
>>    ixgbe_ring queue_index member. In v2 a ring_idx back reference is
>>    introduced, and used in favor of queue_index. William reported the
>>    crash, and helped me smoke out the issue. Kudos!
> 
> Thanks! I tested this series and there are no more crashes.

Thank you for spending time on this!

> The number is pretty good (*without* spectre and meltdown fixes)
> model name : Intel(R) Xeon(R) CPU E5-2440 v2 @ 1.90GHz, total 16 cores/
> 
> AF_XDP performance 64B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop        20
> txpush        18
> l2fwd         20
>

What is 20 here? Given that 14.8Mpps is maximum for 64B@10Gbit/s for
one queue, is this multiple queues? Is this xdpsock or OvS with AF_XDP?
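[Editor's note: the 14.8 Mpps single-queue ceiling referenced here follows from Ethernet framing overhead. A quick sanity check, assuming the standard 20B of on-wire overhead per frame (7B preamble + 1B start-of-frame delimiter + 12B inter-frame gap):]

```python
# Theoretical max packet rates on a 10 Gbit/s link.
LINE_RATE_BPS = 10e9
OVERHEAD_B = 20  # preamble (7) + SFD (1) + inter-frame gap (12)

def max_pps(frame_bytes):
    """Packets per second at line rate for a given frame size."""
    return LINE_RATE_BPS / ((frame_bytes + OVERHEAD_B) * 8)

print(f"64B:   {max_pps(64) / 1e6:.2f} Mpps")    # ~14.88
print(f"1500B: {max_pps(1500) / 1e6:.2f} Mpps")  # ~0.82
```

This also explains the 0.8 Mpps figures in the 1500B runs: those are simply line rate for full-size frames.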


Cheers,
Björn

> Regards,
> William
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
  2018-10-02 18:39     ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-02 18:43       ` William Tu
  -1 siblings, 0 replies; 34+ messages in thread
From: William Tu @ 2018-10-02 18:43 UTC (permalink / raw)
  To: Björn Töpel
  Cc: Björn Töpel, jeffrey.t.kirsher, intel-wired-lan,
	Karlsson, Magnus, Magnus Karlsson, Alexei Starovoitov,
	Daniel Borkmann, Linux Kernel Network Developers,
	Jesper Dangaard Brouer, Test, Jakub Kicinski

On Tue, Oct 2, 2018 at 11:39 AM Björn Töpel <bjorn.topel@intel.com> wrote:
>
> On 2018-10-02 20:23, William Tu wrote:
> > On Tue, Oct 2, 2018 at 1:01 AM Björn Töpel <bjorn.topel@gmail.com> wrote:
> >>
> >> From: Björn Töpel <bjorn.topel@intel.com>
> >>
> >> Jeff: Please remove the v1 patches from your dev-queue!
> >>
> >> This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
> >> driver.
> >>
> >> The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
> >> analogous to the i40e ZC support. Again, as in i40e, code paths have
> >> been copied from the XDP path to the zero-copy path. Going forward we
> >> will try to generalize more code between the AF_XDP ZC drivers, and
> >> also reduce the heavy C&P.
> >>
> >> We have run some benchmarks on a dual socket system with two Broadwell
> >> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> >> cores which gives a total of 28, but only two cores are used in these
> >> experiments. One for TX/RX and one for the user space application. The
> >> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> >> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> >> memory. The compiler used is GCC 7.3.0. The NIC is Intel
> >> 82599ES/X520-2 10Gbit/s using the ixgbe driver.
> >>
> >> Below are the results in Mpps of the 82599ES/X520-2 NIC benchmark runs
> >> for 64B and 1500B packets, generated by a commercial packet generator
> >> HW blasting packets at full 10Gbit/s line rate. The results are with
> >> retpoline and all other spectre and meltdown fixes.
> >>
> >> AF_XDP performance 64B packets:
> >> Benchmark   XDP_DRV with zerocopy
> >> rxdrop        14.7
> >> txpush        14.6
> >> l2fwd         11.1
> >>
> >> AF_XDP performance 1500B packets:
> >> Benchmark   XDP_DRV with zerocopy
> >> rxdrop        0.8
> >> l2fwd         0.8
> >>
> >> XDP performance on our system as a baseline.
> >>
> >> 64B packets:
> >> XDP stats       CPU     Mpps       issue-pps
> >> XDP-RX CPU      16      14.7       0
> >>
> >> 1500B packets:
> >> XDP stats       CPU     Mpps       issue-pps
> >> XDP-RX CPU      16      0.8        0
> >>
> >> The structure of the patch set is as follows:
> >>
> >> Patch 1: Introduce Rx/Tx ring enable/disable functionality
> >> Patch 2: Preparatory patch to ixgbe driver code for RX
> >> Patch 3: ixgbe zero-copy support for RX
> >> Patch 4: Preparatory patch to ixgbe driver code for TX
> >> Patch 5: ixgbe zero-copy support for TX
> >>
> >> Changes since v1:
> >>
> >> * Removed redundant AF_XDP precondition checks, pointed out by
> >>    Jakub. Now, the preconditions are only checked at XDP enable time.
> >> * Fixed a crash in the egress path, due to incorrect usage of
> >>    ixgbe_ring queue_index member. In v2 a ring_idx back reference is
> >>    introduced, and used in favor of queue_index. William reported the
> >>    crash, and helped me smoke out the issue. Kudos!
> >
> > Thanks! I tested this series and there are no more crashes.
>
> Thank you for spending time on this!
>
> > The number is pretty good (*without* spectre and meltdown fixes)
> > model name : Intel(R) Xeon(R) CPU E5-2440 v2 @ 1.90GHz, total 16 cores/
> >
> > AF_XDP performance 64B packets:
> > Benchmark   XDP_DRV with zerocopy
> > rxdrop        20
> > txpush        18
> > l2fwd         20
Sorry, please ignore these numbers!
It's actually 2 Mpps from xdpsock, but that's because my sender only sends 2 Mpps.
>
> What is 20 here? Given that 14.8Mpps is maximum for 64B@10Gbit/s for
> one queue, is this multiple queues? Is this xdpsock or OvS with AF_XDP?

I'm redoing the experiments with a higher traffic rate and will report later.
William

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
  2018-10-02  8:00 ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-04 21:18   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 34+ messages in thread
From: Jesper Dangaard Brouer @ 2018-10-04 21:18 UTC (permalink / raw)
  To: Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, Björn Töpel,
	magnus.karlsson, magnus.karlsson, ast, daniel, netdev, u9012063,
	tuc, jakub.kicinski, brouer

On Tue,  2 Oct 2018 10:00:29 +0200
Björn Töpel <bjorn.topel@gmail.com> wrote:

> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Jeff: Please remove the v1 patches from your dev-queue!
> 
> This patch set introduces zero-copy AF_XDP support for Intel's ixgbe
> driver.
> 
> The ixgbe zero-copy code is located in its own file ixgbe_xsk.[ch],
> analogous to the i40e ZC support. Again, as in i40e, code paths have
> been copied from the XDP path to the zero-copy path. Going forward we
> will try to generalize more code between the AF_XDP ZC drivers, and
> also reduce the heavy C&P.
> 
> We have run some benchmarks on a dual socket system with two Broadwell
> E5 2660 @ 2.0 GHz with hyperthreading turned off. Each socket has 14
> cores which gives a total of 28, but only two cores are used in these
> experiments. One for TX/RX and one for the user space application. The
> memory is DDR4 @ 2133 MT/s (1067 MHz) and the size of each DIMM is
> 8192MB and with 8 of those DIMMs in the system we have 64 GB of total
> memory. The compiler used is GCC 7.3.0. The NIC is Intel
> 82599ES/X520-2 10Gbit/s using the ixgbe driver.
> 
> Below are the results in Mpps of the 82599ES/X520-2 NIC benchmark runs
> for 64B and 1500B packets, generated by a commercial packet generator
> HW blasting packets at full 10Gbit/s line rate. The results are with
> retpoline and all other spectre and meltdown fixes.
> 
> AF_XDP performance 64B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop        14.7
> txpush        14.6

I see similar performance numbers, but my system can crash with 'txonly'.

See full crash log and my analysis, below.

> l2fwd         11.1

Got l2fwd 13.2 Mpps.


> 
> AF_XDP performance 1500B packets:
> Benchmark   XDP_DRV with zerocopy
> rxdrop        0.8
> l2fwd         0.8
> 
> XDP performance on our system as a baseline.
> 
> 64B packets:
> XDP stats       CPU     Mpps       issue-pps
> XDP-RX CPU      16      14.7       0
> 
> 1500B packets:
> XDP stats       CPU     Mpps       issue-pps
> XDP-RX CPU      16      0.8        0
> 
> The structure of the patch set is as follows:
> 
> Patch 1: Introduce Rx/Tx ring enable/disable functionality
> Patch 2: Preparatory patch to ixgbe driver code for RX
> Patch 3: ixgbe zero-copy support for RX
> Patch 4: Preparatory patch to ixgbe driver code for TX
> Patch 5: ixgbe zero-copy support for TX
> 
> Changes since v1:
> 
> * Removed redundant AF_XDP precondition checks, pointed out by
>   Jakub. Now, the preconditions are only checked at XDP enable time.
> * Fixed a crash in the egress path, due to incorrect usage of
>   ixgbe_ring queue_index member. In v2 a ring_idx back reference is
>   introduced, and used in favor of queue_index. William reported the
>   crash, and helped me smoke out the issue. Kudos!
> * In ixgbe_xsk_async_xmit, validate qid against num_xdp_queues,
>   instead of num_rx_queues.
> 
> Cheers!
> Björn
> 
> Björn Töpel (5):
>   ixgbe: added Rx/Tx ring disable/enable functions
>   ixgbe: move common Rx functions to ixgbe_txrx_common.h
>   ixgbe: add AF_XDP zero-copy Rx support
>   ixgbe: move common Tx functions to ixgbe_txrx_common.h
>   ixgbe: add AF_XDP zero-copy Tx support
> 
>  drivers/net/ethernet/intel/ixgbe/Makefile     |   3 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |  28 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |  17 +-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 291 ++++++-
>  .../ethernet/intel/ixgbe/ixgbe_txrx_common.h  |  50 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c  | 803 ++++++++++++++++++
>  6 files changed, 1146 insertions(+), 46 deletions(-)
>  create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_txrx_common.h
>  create mode 100644 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c



 sock0@ixgbe2:0 rxdrop 	
                pps         pkts        1.00       
rx              14,572,284  36,093,496 
tx              0           0          


 sock0@ixgbe2:0 l2fwd 	
                pps         pkts        1.00       
rx              13,287,830  108,616,192
tx              13,287,830  108,616,284




Notice, the crash only happens some times (on the second invocation):

$ sudo ./xdpsock --interface ixgbe2 --txonly --zero
samples/bpf/xdpsock_user.c:kick_tx:749: Assertion failed: 0: errno: 100/"Network is down"

 sock0@ixgbe2:0 txonly 	
                pps         pkts        0.05       
rx              0           0          
tx              33,763      1,709      


$ sudo ./xdpsock --interface ixgbe2 --txonly --zero

 sock0@ixgbe2:0 txonly 	
                pps         pkts        1.00       
rx              0           0          
tx              14,730,354  14,733,404 


$ sudo ./xdpsock --interface ixgbe2 --txonly --zero
samples/bpf/xdpsock_user.c:kick_tx:749: Assertion failed: 0: errno: 100/"Network is down"

 sock0@ixgbe2:0 txonly 	
                pps         pkts        0.26       
rx              0           0          
tx              2,054,927   524,680    

$ sudo ./xdpsock --interface ixgbe2 --txonly --zero


[  249.953547] ixgbe 0000:01:00.1 ixgbe2: detected SFP+: 4
[  250.204158] ixgbe 0000:01:00.1 ixgbe2: NIC Link is Up 10 Gbps, Flow Control: None
[  257.217496] ixgbe 0000:01:00.1: removed PHC on ixgbe2
[  257.279328] ixgbe 0000:01:00.1: Multiqueue Disabled: Rx Queue count = 1, Tx Queue count = 1 XDP Queue count = 6
[  257.308463] ixgbe 0000:01:00.1: registered PHC device on ixgbe2
[  257.489166] ixgbe 0000:01:00.1 ixgbe2: detected SFP+: 4
[  257.494923] ixgbe 0000:01:00.1 ixgbe2: initiating reset to clear Tx work after link loss
[  257.716190] ixgbe 0000:01:00.1 ixgbe2: Reset adapter
[  257.968552] ixgbe 0000:01:00.1 ixgbe2: detected SFP+: 4
[  258.185273] ixgbe 0000:01:00.1 ixgbe2: NIC Link is Up 10 Gbps, Flow Control: None
[  260.836196] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[  260.844652] PGD 0 P4D 0 
[  260.847527] Oops: 0002 [#1] PREEMPT SMP PTI
[  260.852042] CPU: 1 PID: 18 Comm: ksoftirqd/1 Not tainted 4.19.0-rc5-bpf-next-xdp-ixgbe-ZC+ #66
[  260.861269] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[  260.869381] RIP: 0010:xsk_umem_consume_tx+0xc9/0x180
[  260.874682] Code: 24 75 be 48 8b 86 08 03 00 00 48 8d b0 f8 fc ff ff 48 39 c7 75 96 e8 26 bd 8a ff 5b 31 c0 41 5a 41 5c 41 5d 5d 49 8d 62 f8 c3 <89> 41 40 8b 4a 24 8b 42 1c 29 c8 75 0b 48 8b 42 28 8b 00 89 42 1c
[  260.894317] RSP: 0018:ffffc9000323bd00 EFLAGS: 00010246
[  260.899873] RAX: 0000000000000000 RBX: ffffc9000323bd68 RCX: 0000000000000000
[  260.907339] RDX: ffff8808553e1c00 RSI: ffff880826e43000 RDI: ffff880854940818
[  260.914801] RBP: ffffc9000323bd20 R08: 0000000000000010 R09: 0000000000000000
[  260.922263] R10: ffffc9000323bd40 R11: 0000000000000000 R12: ffffc9000323bd64
[  260.929726] R13: ffff880854940780 R14: 0000000000000000 R15: 0000000000000000
[  260.937189] FS:  0000000000000000(0000) GS:ffff88085c640000(0000) knlGS:0000000000000000
[  260.945871] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  260.951943] CR2: 0000000000000040 CR3: 000000087f20a006 CR4: 00000000003606e0
[  260.959409] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  260.966872] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  260.974333] Call Trace:
[  260.977115]  ? ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[  260.982843]  ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[  260.988426]  ixgbe_poll+0x5a/0x700 [ixgbe]
[  260.992850]  net_rx_action+0x141/0x3f0
[  260.996931]  ? sort_range+0x20/0x20
[  261.000743]  __do_softirq+0xe3/0x2f7
[  261.004656]  ? sort_range+0x20/0x20
[  261.008490]  run_ksoftirqd+0x26/0x30
[  261.012420]  smpboot_thread_fn+0x114/0x1d0
[  261.016848]  kthread+0x111/0x130
[  261.020423]  ? kthread_create_worker_on_cpu+0x50/0x50
[  261.025802]  ret_from_fork+0x1f/0x30
[  261.029707] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables x_tables tun nfnetlink bridge nf_defrag_ipv6 nf_defrag_ipv4 bpfilter sunrpc coretemp intel_cstate intel_uncore intel_rapl_perf pcspkr i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pcc_cpufreq sch_fq_codel ixgbe mdio mlx5_core i40e igb nfp ptp i2c_algo_bit devlink i2c_core pps_core hid_generic [last unloaded: x_tables]
[  261.067878] CR2: 0000000000000040
[  261.071526] ---[ end trace f0011e17c3744ee4 ]---
[  261.077903] RIP: 0010:xsk_umem_consume_tx+0xc9/0x180
[  261.083191] Code: 24 75 be 48 8b 86 08 03 00 00 48 8d b0 f8 fc ff ff 48 39 c7 75 96 e8 26 bd 8a ff 5b 31 c0 41 5a 41 5c 41 5d 5d 49 8d 62 f8 c3 <89> 41 40 8b 4a 24 8b 42 1c 29 c8 75 0b 48 8b 42 28 8b 00 89 42 1c
[  261.102852] RSP: 0018:ffffc9000323bd00 EFLAGS: 00010246
[  261.108423] RAX: 0000000000000000 RBX: ffffc9000323bd68 RCX: 0000000000000000
[  261.115889] RDX: ffff8808553e1c00 RSI: ffff880826e43000 RDI: ffff880854940818
[  261.123382] RBP: ffffc9000323bd20 R08: 0000000000000010 R09: 0000000000000000
[  261.130847] R10: ffffc9000323bd40 R11: 0000000000000000 R12: ffffc9000323bd64
[  261.138325] R13: ffff880854940780 R14: 0000000000000000 R15: 0000000000000000
[  261.145788] FS:  0000000000000000(0000) GS:ffff88085c640000(0000) knlGS:0000000000000000
[  261.154503] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.160594] CR2: 0000000000000040 CR3: 000000087f20a006 CR4: 00000000003606e0
[  261.168070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  261.175547] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  261.183012] Kernel panic - not syncing: Fatal exception in interrupt
[  261.189743] Kernel Offset: disabled
[  261.194954] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
[  261.203123] ------------[ cut here ]------------
[  261.208071] sched: Unexpected reschedule of offline CPU#0!
[  261.213885] WARNING: CPU: 1 PID: 18 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x31/0x40
[  261.223698] Modules linked in: ip6table_filter ip6_tables iptable_filter ip_tables x_tables tun nfnetlink bridge nf_defrag_ipv6 nf_defrag_ipv4 bpfilter sunrpc coretemp intel_cstate intel_uncore intel_rapl_perf pcspkr i2c_i801 wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad pcc_cpufreq sch_fq_codel ixgbe mdio mlx5_core i40e igb nfp ptp i2c_algo_bit devlink i2c_core pps_core hid_generic [last unloaded: x_tables]
[  261.261869] CPU: 1 PID: 18 Comm: ksoftirqd/1 Tainted: G      D           4.19.0-rc5-bpf-next-xdp-ixgbe-ZC+ #66
[  261.272468] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[  261.280549] RIP: 0010:native_smp_send_reschedule+0x31/0x40
[  261.286361] Code: 48 0f a3 05 91 c7 3d 01 73 12 48 8b 05 e8 11 0c 01 be fd 00 00 00 48 8b 40 30 ff e0 89 fe 48 c7 c7 b8 36 09 82 e8 ff 7d 02 00 <0f> 0b c3 66 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48
[  261.306001] RSP: 0018:ffff88085c643cc0 EFLAGS: 00010082
[  261.311553] RAX: 000000000000002e RBX: ffff88085c6213c0 RCX: 0000000000000006
[  261.319023] RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff88085c6555e0
[  261.326483] RBP: ffff88085306a0d4 R08: 0000000000000000 R09: 0000000000000478
[  261.333943] R10: ffff88085c643bf8 R11: ffffffff82acfbad R12: ffff880853069640
[  261.341407] R13: ffff88085c643d10 R14: 0000000000000086 R15: 00000000000213c0
[  261.348869] FS:  0000000000000000(0000) GS:ffff88085c640000(0000) knlGS:0000000000000000
[  261.357555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  261.363624] CR2: 0000000000000040 CR3: 000000087f20a006 CR4: 00000000003606e0
[  261.371090] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  261.378554] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  261.386014] Call Trace:
[  261.388788]  <IRQ>
[  261.391128]  check_preempt_curr+0x6f/0x80
[  261.395466]  ttwu_do_wakeup+0x19/0x150
[  261.399548]  try_to_wake_up+0x19c/0x450
[  261.403715]  ? enqueue_entity+0xad/0x2c0
[  261.407964]  __wake_up_common+0x71/0x170
[  261.412220]  ep_poll_callback+0xb5/0x2a0
[  261.416474]  __wake_up_common+0x71/0x170
[  261.420729]  __wake_up_common_lock+0x6c/0x90
[  261.425335]  ? tick_sched_do_timer+0x60/0x60
[  261.429935]  irq_work_run_list+0x47/0x70
[  261.434190]  update_process_times+0x3b/0x50
[  261.438705]  tick_sched_handle+0x21/0x70
[  261.442959]  ? tick_sched_do_timer+0x50/0x60
[  261.447554]  tick_sched_timer+0x37/0x70
[  261.451719]  __hrtimer_run_queues+0xf8/0x2a0
[  261.456317]  hrtimer_interrupt+0xe5/0x240
[  261.460657]  ? sched_clock+0x5/0x10
[  261.464478]  smp_apic_timer_interrupt+0x5e/0x140
[  261.469420]  apic_timer_interrupt+0xf/0x20
[  261.473847]  </IRQ>
[  261.476271] RIP: 0010:panic+0x1e3/0x232
[  261.480433] Code: eb ac 83 3d 30 07 a0 01 00 74 05 e8 39 36 02 00 48 c7 c6 a0 8b ac 82 48 c7 c7 10 af 09 82 e8 84 6a 05 00 fb 66 0f 1f 44 00 00 <31> db e8 f8 22 0b 00 4c 39 eb 7c 17 41 83 f4 01 44 89 e7 ff 15 d6
[  261.500066] RSP: 0018:ffffc9000323baf8 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
[  261.508234] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000006
[  261.515696] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff88085c6555e0
[  261.523160] RBP: ffffc9000323bb68 R08: 0000000000000000 R09: 0000000000000476
[  261.530620] R10: 0000000000000008 R11: ffffffff82acfbad R12: 0000000000000000
[  261.538084] R13: 0000000000000000 R14: 0000000000000009 R15: 0000000000000001
[  261.545546]  ? panic+0x1dc/0x232
[  261.549101]  oops_end+0xb9/0xd0
[  261.552569]  no_context+0x156/0x3a0
[  261.556392]  ? cpumask_next_and+0x1a/0x20
[  261.560730]  ? find_busiest_group+0x112/0xa80
[  261.565413]  __do_page_fault+0xd5/0x500
[  261.569579]  page_fault+0x1e/0x30
[  261.573220] RIP: 0010:xsk_umem_consume_tx+0xc9/0x180
[  261.578508] Code: 24 75 be 48 8b 86 08 03 00 00 48 8d b0 f8 fc ff ff 48 39 c7 75 96 e8 26 bd 8a ff 5b 31 c0 41 5a 41 5c 41 5d 5d 49 8d 62 f8 c3 <89> 41 40 8b 4a 24 8b 42 1c 29 c8 75 0b 48 8b 42 28 8b 00 89 42 1c
[  261.598148] RSP: 0018:ffffc9000323bd00 EFLAGS: 00010246
[  261.603703] RAX: 0000000000000000 RBX: ffffc9000323bd68 RCX: 0000000000000000
[  261.611169] RDX: ffff8808553e1c00 RSI: ffff880826e43000 RDI: ffff880854940818
[  261.618631] RBP: ffffc9000323bd20 R08: 0000000000000010 R09: 0000000000000000
[  261.626094] R10: ffffc9000323bd40 R11: 0000000000000000 R12: ffffc9000323bd64
[  261.633557] R13: ffff880854940780 R14: 0000000000000000 R15: 0000000000000000
[  261.641021]  ? ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[  261.646755]  ixgbe_clean_xdp_tx_irq+0x19d/0x2e0 [ixgbe]
[  261.652308]  ixgbe_poll+0x5a/0x700 [ixgbe]
[  261.656735]  net_rx_action+0x141/0x3f0
[  261.660814]  ? sort_range+0x20/0x20
[  261.664627]  __do_softirq+0xe3/0x2f7
[  261.668530]  ? sort_range+0x20/0x20
[  261.672351]  run_ksoftirqd+0x26/0x30
[  261.676250]  smpboot_thread_fn+0x114/0x1d0
[  261.680671]  kthread+0x111/0x130
[  261.684223]  ? kthread_create_worker_on_cpu+0x50/0x50
[  261.689603]  ret_from_fork+0x1f/0x30
[  261.701291] ---[ end trace f0011e17c3744ee5 ]---


(gdb) list *(xsk_umem_consume_tx)+0xc9
0xffffffff81883fe9 is in xsk_umem_consume_tx (./include/linux/compiler.h:214).
209	static __always_inline void __write_once_size(volatile void *p, void *res, int size)
210	{
211		switch (size) {
212		case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
213		case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
214		case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
215		case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
216		default:
217			barrier();
218			__builtin_memcpy((void *)p, (const void *)res, size);


I think the bug occurs in the WRITE_ONCE in xskq_peek_desc(), and
it corresponds to q->ring == NULL (as ring has offset 40).

static inline struct xdp_desc *xskq_peek_desc(struct xsk_queue *q,
					      struct xdp_desc *desc)
{
	if (q->cons_tail == q->cons_head) {
		WRITE_ONCE(q->ring->consumer, q->cons_tail);
		q->cons_head = q->cons_tail + xskq_nb_avail(q, RX_BATCH_SIZE);

		/* Order consumer and data */
		smp_rmb();
	}

	return xskq_validate_desc(q, desc);
}

$ pahole -C xsk_queue vmlinux
struct xsk_queue {
	u64                        chunk_mask;           /*     0     8 */
	u64                        size;                 /*     8     8 */
	u32                        ring_mask;            /*    16     4 */
	u32                        nentries;             /*    20     4 */
	u32                        prod_head;            /*    24     4 */
	u32                        prod_tail;            /*    28     4 */
	u32                        cons_head;            /*    32     4 */
	u32                        cons_tail;            /*    36     4 */
	struct xdp_ring *          ring;                 /*    40     8 */
	u64                        invalid_descs;        /*    48     8 */

	/* size: 56, cachelines: 1, members: 10 */
	/* last cacheline: 56 bytes */
};
 

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
  2018-10-04 21:18   ` [Intel-wired-lan] " Jesper Dangaard Brouer
@ 2018-10-05  4:59     ` Björn Töpel
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-05  4:59 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, magnus.karlsson,
	magnus.karlsson, ast, daniel, netdev, u9012063, tuc,
	jakub.kicinski

On 2018-10-04 23:18, Jesper Dangaard Brouer wrote:
> I see similar performance numbers, but my system can crash with 'txonly'.

Thanks for finding this, Jesper!

Can you give me your "lspci -vvv" dump of your NIC, so I know what ixgbe
flavor you've got?

I'll dig into it right away.


Björn

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support
  2018-10-05  4:59     ` [Intel-wired-lan] " Björn Töpel
@ 2018-10-05 11:30       ` Björn Töpel
  -1 siblings, 0 replies; 34+ messages in thread
From: Björn Töpel @ 2018-10-05 11:30 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Björn Töpel
  Cc: jeffrey.t.kirsher, intel-wired-lan, magnus.karlsson,
	magnus.karlsson, ast, daniel, netdev, u9012063, tuc,
	jakub.kicinski

On 2018-10-05 06:59, Björn Töpel wrote:
> On 2018-10-04 23:18, Jesper Dangaard Brouer wrote:
>> I see similar performance numbers, but my system can crash with 'txonly'.
> 
> Thanks for finding this, Jesper!
> 
> Can you give me your "lspci -vvv" dump of your NIC, so I know what ixgbe
> flavor you've got?
> 
> I'll dig into it right away.
>

Jesper, there's (hopefully) a fix for the crash here:

   https://patchwork.ozlabs.org/patch/979442/

Thanks for spending time on the ixgbe ZC patches!


> 
> Björn

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-10-05 18:29 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-02  8:00 [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support Björn Töpel
2018-10-02  8:00 ` [Intel-wired-lan] " Björn Töpel
2018-10-02  8:00 ` [PATCH v2 1/5] ixgbe: added Rx/Tx ring disable/enable functions Björn Töpel
2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
2018-10-02 18:25   ` William Tu
2018-10-02 18:25     ` [Intel-wired-lan] " William Tu
2018-10-02  8:00 ` [PATCH v2 2/5] ixgbe: move common Rx functions to ixgbe_txrx_common.h Björn Töpel
2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
2018-10-02 18:27   ` William Tu
2018-10-02 18:27     ` [Intel-wired-lan] " William Tu
2018-10-02  8:00 ` [PATCH v2 3/5] ixgbe: add AF_XDP zero-copy Rx support Björn Töpel
2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
2018-10-02 18:26   ` William Tu
2018-10-02 18:26     ` [Intel-wired-lan] " William Tu
2018-10-02  8:00 ` [PATCH v2 4/5] ixgbe: move common Tx functions to ixgbe_txrx_common.h Björn Töpel
2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
2018-10-02 18:28   ` William Tu
2018-10-02 18:28     ` [Intel-wired-lan] " William Tu
2018-10-02  8:00 ` [PATCH v2 5/5] ixgbe: add AF_XDP zero-copy Tx support Björn Töpel
2018-10-02  8:00   ` [Intel-wired-lan] " Björn Töpel
2018-10-02 18:26   ` William Tu
2018-10-02 18:26     ` [Intel-wired-lan] " William Tu
2018-10-02 18:23 ` [PATCH v2 0/5] Introducing ixgbe AF_XDP ZC support William Tu
2018-10-02 18:23   ` [Intel-wired-lan] " William Tu
2018-10-02 18:39   ` Björn Töpel
2018-10-02 18:39     ` [Intel-wired-lan] " Björn Töpel
2018-10-02 18:43     ` William Tu
2018-10-02 18:43       ` [Intel-wired-lan] " William Tu
2018-10-04 21:18 ` Jesper Dangaard Brouer
2018-10-04 21:18   ` [Intel-wired-lan] " Jesper Dangaard Brouer
2018-10-05  4:59   ` Björn Töpel
2018-10-05  4:59     ` [Intel-wired-lan] " Björn Töpel
2018-10-05 11:30     ` Björn Töpel
2018-10-05 11:30       ` [Intel-wired-lan] " Björn Töpel
