* [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP
@ 2017-05-19  7:08 Björn Töpel
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions Björn Töpel
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Björn Töpel @ 2017-05-19  7:08 UTC (permalink / raw)
  To: intel-wired-lan

From: Björn Töpel <bjorn.topel@intel.com>

This series adds XDP support for i40e-based NICs.

The first patch wires up ndo_xdp and implements the XDP_PASS and
XDP_DROP actions; all other XDP actions fall back to XDP_DROP. The
second patch adds egress support via the XDP_TX action.

Performance numbers (40GbE port, 64B packets) for xdp1 and xdp2
programs, from samples/bpf/:

 IOMMU                      | xdp1      | xdp2
 ---------------------------+-----------+-----------
 iommu=off                  | 29.7 Mpps | 17.1 Mpps
 iommu=pt intel_iommu=on    | 29.7 Mpps | 11.6 Mpps
 iommu=on intel_iommu=on    | 21.8 Mpps |  3.7 Mpps
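
(For reference, xdp1 parses the IP header, bumps a per-protocol counter
and drops every frame, while xdp2 additionally rewrites the Ethernet
header and retransmits the frame via XDP_TX. A minimal drop program in
the same spirit is sketched below; it is illustrative only, not the
samples/bpf source, and it assumes a libbpf-style build for the SEC()
macro and header paths.)

  /* Minimal XDP program that drops everything it sees; attaching it to
   * an i40e port exercises the Rx fast path added by patch 1.
   */
  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  SEC("xdp")
  int xdp_drop_all(struct xdp_md *ctx)
  {
          /* xdp1 would also parse the packet and bump a per-CPU map
           * counter here; an unconditional drop is enough to exercise
           * the Rx path.
           */
          return XDP_DROP;
  }

  char _license[] SEC("license") = "GPL";

Loaded with something like "ip -force link set dev <iface> xdp obj
xdp_drop.o", this goes through the ndo_xdp/XDP_SETUP_PROG path added in
patch 1.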

Future improvements, not covered by the patches:
  * Egress: Create the iova mappings upfront
    (DMA_BIDIRECTIONAL/dma_sync_*), instead of creating a new iova
    mapping in the transmit fast path. This will improve performance
    for the IOMMU-enabled case (see the sketch after this list).
  * Proper debugfs support.
  * i40evf support.
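
To make the first bullet concrete: the idea is to map the Rx pages
DMA_BIDIRECTIONAL when the buffers are allocated and only sync the
frame region before posting it on the XDP Tx ring, instead of doing a
dma_map_single() per frame in the transmit fast path. A hedged sketch
(function and parameter names are made up, not the i40e code):

  #include <linux/dma-mapping.h>

  /* Assumes page_dma is the DMA_BIDIRECTIONAL mapping created when the
   * Rx buffer was allocated; no per-frame mapping is set up here.
   */
  static dma_addr_t xdp_tx_prepare(struct device *dev, dma_addr_t page_dma,
                                   unsigned int offset, unsigned int len)
  {
          dma_sync_single_range_for_device(dev, page_dma, offset, len,
                                           DMA_BIDIRECTIONAL);
          return page_dma + offset;
  }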

Thanks to Alex, Daniel, John and Scott for all feedback!

v5:
  * Aligned the implementation with ixgbe's XDP support: naming,
    favoring xchg() over RCU semantics
  * Support for XDP headroom (piggybacking on Alex's build_skb work)
  * Added xdp tracepoints for exception states (as suggested by
    Daniel)

v4:
  * Removed unused i40e_page_is_reserved function
  * Prior to running the XDP program, set the struct xdp_buff
    data_hard_start member

v3:
  * Rebased patch set on Jeff's dev-queue branch
  * MSI-X is no longer a prerequisite for XDP
  * RCU locking for the XDP program and XDP_RX support is introduced
    in the same patch
  * Rx bytes is now bumped for XDP
  * Removed pointer-to-pointer clunkiness
  * Added comments to XDP preconditions in ndo_xdp
  * When a non-EOP frame is received, log once, and drop the frame

v2:
  * Fixed kbuild error for PAGE_SIZE >= 8192.
  * Renamed i40e_try_flip_rx_page to i40e_can_reuse_rx_page, which is
    more in line with the other Intel Ethernet drivers (igb/fm10k).
  * Validate xdp_adjust_head support in ndo_xdp/XDP_SETUP_PROG.

Björn Töpel (2):
  i40e: add XDP support for pass and drop actions
  i40e: add support for XDP_TX action

 drivers/net/ethernet/intel/i40e/i40e.h         |   8 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  57 +++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 270 ++++++++++++++++++++++---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 245 ++++++++++++++++++----
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  12 ++
 5 files changed, 530 insertions(+), 62 deletions(-)

-- 
2.11.0



* [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions
  2017-05-19  7:08 [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP Björn Töpel
@ 2017-05-19  7:08 ` Björn Töpel
  2017-05-19 12:29   ` Alexander Duyck
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 2/2] i40e: add support for XDP_TX action Björn Töpel
  2017-05-19 13:55 ` [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP Alexander Duyck
  2 siblings, 1 reply; 8+ messages in thread
From: Björn Töpel @ 2017-05-19  7:08 UTC (permalink / raw)
  To: intel-wired-lan

From: Björn Töpel <bjorn.topel@intel.com>

This commit adds basic XDP support for i40e-derived NICs. All XDP
actions other than XDP_PASS end up in XDP_DROP.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      |   7 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c |  75 ++++++++++++++++
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 130 +++++++++++++++++++++-------
 drivers/net/ethernet/intel/i40e/i40e_txrx.h |   1 +
 4 files changed, 182 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 395ca94faf80..d3195b29d53c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -645,6 +645,8 @@ struct i40e_vsi {
 	u16 max_frame;
 	u16 rx_buf_len;
 
+	struct bpf_prog *xdp_prog;
+
 	/* List of q_vectors allocated to this VSI */
 	struct i40e_q_vector **q_vectors;
 	int num_q_vectors;
@@ -972,4 +974,9 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+
+static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
+{
+	return !!vsi->xdp_prog;
+}
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8d1d3b859af7..27a29904611a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -27,6 +27,7 @@
 #include <linux/etherdevice.h>
 #include <linux/of_net.h>
 #include <linux/pci.h>
+#include <linux/bpf.h>
 
 /* Local includes */
 #include "i40e.h"
@@ -2408,6 +2409,13 @@ static int i40e_change_mtu(struct net_device *netdev, int new_mtu)
 	struct i40e_vsi *vsi = np->vsi;
 	struct i40e_pf *pf = vsi->back;
 
+	if (i40e_enabled_xdp_vsi(vsi)) {
+		int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+
+		if (frame_size > vsi->rx_buf_len)
+			return -EINVAL;
+	}
+
 	netdev_info(netdev, "changing MTU from %d to %d\n",
 		    netdev->mtu, new_mtu);
 	netdev->mtu = new_mtu;
@@ -9310,6 +9318,72 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 	return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
 }
 
+/**
+ * i40e_xdp_setup - add/remove an XDP program
+ * @vsi: VSI to changed
+ * @prog: XDP program
+ **/
+static int i40e_xdp_setup(struct i40e_vsi *vsi,
+			  struct bpf_prog *prog)
+{
+	struct i40e_pf *pf = vsi->back;
+	int frame_size = vsi->netdev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+	int i;
+	bool need_reset;
+	struct bpf_prog *old_prog;
+
+	/* Don't allow frames that span over multiple buffers */
+	if (frame_size > vsi->rx_buf_len)
+		return -EINVAL;
+
+	if (!i40e_enabled_xdp_vsi(vsi) && !prog)
+		return 0;
+
+	/* When turning XDP on->off/off->on we reset and rebuild the rings. */
+	need_reset = (i40e_enabled_xdp_vsi(vsi) != !!prog);
+
+	if (need_reset)
+		i40e_prep_for_reset(pf, true);
+
+	old_prog = xchg(&vsi->xdp_prog, prog);
+
+	if (need_reset)
+		i40e_reset_and_rebuild(pf, true, true);
+
+	for (i = 0; i < vsi->num_queue_pairs; i++)
+		WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
+
+	if (old_prog)
+		bpf_prog_put(old_prog);
+
+	return 0;
+}
+
+/**
+ * i40e_xdp - implements ndo_xdp for i40e
+ * @dev: netdevice
+ * @xdp: XDP command
+ **/
+static int i40e_xdp(struct net_device *dev,
+		    struct netdev_xdp *xdp)
+{
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+
+	if (vsi->type != I40E_VSI_MAIN)
+		return -EINVAL;
+
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return i40e_xdp_setup(vsi, xdp->prog);
+	case XDP_QUERY_PROG:
+		xdp->prog_attached = i40e_enabled_xdp_vsi(vsi);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_open		= i40e_open,
 	.ndo_stop		= i40e_close,
@@ -9342,6 +9416,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_features_check	= i40e_features_check,
 	.ndo_bridge_getlink	= i40e_ndo_bridge_getlink,
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
+	.ndo_xdp		= i40e_xdp,
 };
 
 /**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index af554f3cda19..0c2f0230faf4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -26,6 +26,7 @@
 
 #include <linux/prefetch.h>
 #include <net/busy_poll.h>
+#include <linux/bpf_trace.h>
 #include "i40e.h"
 #include "i40e_trace.h"
 #include "i40e_prototype.h"
@@ -1195,6 +1196,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
 void i40e_free_rx_resources(struct i40e_ring *rx_ring)
 {
 	i40e_clean_rx_ring(rx_ring);
+	rx_ring->xdp_prog = NULL;
 	kfree(rx_ring->rx_bi);
 	rx_ring->rx_bi = NULL;
 
@@ -1241,6 +1243,8 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
 	rx_ring->next_to_clean = 0;
 	rx_ring->next_to_use = 0;
 
+	rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;
+
 	return 0;
 err:
 	kfree(rx_ring->rx_bi);
@@ -1592,6 +1596,7 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
 /**
  * i40e_cleanup_headers - Correct empty headers
  * @rx_ring: rx descriptor ring packet is being transacted on
+ * @rx_desc: pointer to the EOP Rx descriptor
  * @skb: pointer to current skb being fixed
  *
  * Also address the case where we are pulling data in on pages only
@@ -1602,8 +1607,25 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
  *
  * Returns true if an error was encountered and skb was freed.
  **/
-static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb)
+static bool i40e_cleanup_headers(struct i40e_ring *rx_ring,
+				 union i40e_rx_desc *rx_desc,
+				 struct sk_buff *skb)
 {
+	/* XDP packets use error pointer so abort at this point */
+	if (IS_ERR(skb))
+		return true;
+
+	/* ERR_MASK will only have valid bits if EOP set, and
+	 * what we are doing here is actually checking
+	 * I40E_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
+	 * the error field
+	 */
+	if (unlikely(i40e_test_staterr(rx_desc,
+				       BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
+		dev_kfree_skb_any(skb);
+		return true;
+	}
+
 	/* if eth_skb_pad returns an error the skb was freed */
 	if (eth_skb_pad(skb))
 		return true;
@@ -1783,10 +1805,10 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
  * skb correctly.
  */
 static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
-					  struct i40e_rx_buffer *rx_buffer,
-					  unsigned int size)
+					  struct xdp_buff *xdp,
+					  struct i40e_rx_buffer *rx_buffer)
 {
-	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+	unsigned int size = xdp->data_end - xdp->data;
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
@@ -1796,9 +1818,9 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
+	prefetch(xdp->data);
 #if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
+	prefetch(xdp->data + L1_CACHE_BYTES);
 #endif
 
 	/* allocate a skb to store the frags */
@@ -1811,10 +1833,11 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
 	/* Determine available headroom for copy */
 	headlen = size;
 	if (headlen > I40E_RX_HDR_SIZE)
-		headlen = eth_get_headlen(va, I40E_RX_HDR_SIZE);
+		headlen = eth_get_headlen(xdp->data, I40E_RX_HDR_SIZE);
 
 	/* align pull length to size of long to optimize memcpy performance */
-	memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
+	memcpy(__skb_put(skb, headlen), xdp->data,
+	       ALIGN(headlen, sizeof(long)));
 
 	/* update all of the pointers */
 	size -= headlen;
@@ -1847,10 +1870,10 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
  * to set up the skb correctly and avoid any memcpy overhead.
  */
 static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
-				      struct i40e_rx_buffer *rx_buffer,
-				      unsigned int size)
+				      struct xdp_buff *xdp,
+				      struct i40e_rx_buffer *rx_buffer)
 {
-	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
+	unsigned int size = xdp->data_end - xdp->data;
 #if (PAGE_SIZE < 8192)
 	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
 #else
@@ -1860,12 +1883,12 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
-	prefetch(va);
+	prefetch(xdp->data);
 #if L1_CACHE_BYTES < 128
-	prefetch(va + L1_CACHE_BYTES);
+	prefetch(xdp->data + L1_CACHE_BYTES);
 #endif
 	/* build an skb around the page buffer */
-	skb = build_skb(va - I40E_SKB_PAD, truesize);
+	skb = build_skb(xdp->data_hard_start, truesize);
 	if (unlikely(!skb))
 		return NULL;
 
@@ -1944,6 +1967,46 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
 	return true;
 }
 
+#define I40E_XDP_PASS 0
+#define I40E_XDP_CONSUMED 1
+
+/**
+ * i40e_run_xdp - run an XDP program
+ * @rx_ring: Rx ring being processed
+ * @xdp: XDP buffer containing the frame
+ **/
+static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
+				    struct xdp_buff *xdp)
+{
+	int result = I40E_XDP_PASS;
+	struct bpf_prog *xdp_prog;
+	u32 act;
+
+	rcu_read_lock();
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+
+	if (!xdp_prog)
+		goto xdp_out;
+
+	act = bpf_prog_run_xdp(xdp_prog, xdp);
+	switch (act) {
+	case XDP_PASS:
+		break;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_TX:
+	case XDP_ABORTED:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		/* fallthrough -- handle aborts by dropping packet */
+	case XDP_DROP:
+		result = I40E_XDP_CONSUMED;
+		break;
+	}
+xdp_out:
+	rcu_read_unlock();
+	return ERR_PTR(-result);
+}
+
 /**
  * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: rx descriptor ring to transact packets on
@@ -1970,6 +2033,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		u16 vlan_tag;
 		u8 rx_ptype;
 		u64 qword;
+		struct xdp_buff xdp;
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
@@ -2006,12 +2070,27 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		rx_buffer = i40e_get_rx_buffer(rx_ring, size);
 
 		/* retrieve a buffer from the ring */
-		if (skb)
+		if (!skb) {
+			xdp.data = page_address(rx_buffer->page) +
+				   rx_buffer->page_offset;
+			xdp.data_hard_start = xdp.data -
+					      i40e_rx_offset(rx_ring);
+			xdp.data_end = xdp.data + size;
+
+			skb = i40e_run_xdp(rx_ring, &xdp);
+		}
+
+		if (IS_ERR(skb)) {
+			total_rx_bytes += size;
+			total_rx_packets++;
+			rx_buffer->pagecnt_bias++;
+		} else if (skb) {
 			i40e_add_rx_frag(rx_ring, rx_buffer, skb, size);
-		else if (ring_uses_build_skb(rx_ring))
-			skb = i40e_build_skb(rx_ring, rx_buffer, size);
-		else
-			skb = i40e_construct_skb(rx_ring, rx_buffer, size);
+		} else if (ring_uses_build_skb(rx_ring)) {
+			skb = i40e_build_skb(rx_ring, &xdp, rx_buffer);
+		} else {
+			skb = i40e_construct_skb(rx_ring, &xdp, rx_buffer);
+		}
 
 		/* exit if we failed to retrieve a buffer */
 		if (!skb) {
@@ -2026,18 +2105,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		if (i40e_is_non_eop(rx_ring, rx_desc, skb))
 			continue;
 
-		/* ERR_MASK will only have valid bits if EOP set, and
-		 * what we are doing here is actually checking
-		 * I40E_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
-		 * the error field
-		 */
-		if (unlikely(i40e_test_staterr(rx_desc, BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
-			dev_kfree_skb_any(skb);
-			skb = NULL;
-			continue;
-		}
-
-		if (i40e_cleanup_headers(rx_ring, skb)) {
+		if (i40e_cleanup_headers(rx_ring, rx_desc, skb)) {
 			skb = NULL;
 			continue;
 		}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index f5de51124cae..31f0b162996f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -360,6 +360,7 @@ struct i40e_ring {
 	void *desc;			/* Descriptor ring memory */
 	struct device *dev;		/* Used for DMA mapping */
 	struct net_device *netdev;	/* netdev ring maps to */
+	struct bpf_prog *xdp_prog;
 	union {
 		struct i40e_tx_buffer *tx_bi;
 		struct i40e_rx_buffer *rx_bi;
-- 
2.11.0



* [Intel-wired-lan] [PATCH v5 2/2] i40e: add support for XDP_TX action
  2017-05-19  7:08 [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP Björn Töpel
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions Björn Töpel
@ 2017-05-19  7:08 ` Björn Töpel
  2017-05-19 13:40   ` Alexander Duyck
  2017-05-19 13:55 ` [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP Alexander Duyck
  2 siblings, 1 reply; 8+ messages in thread
From: Björn Töpel @ 2017-05-19  7:08 UTC (permalink / raw)
  To: intel-wired-lan

From: Björn Töpel <bjorn.topel@intel.com>

This patch adds proper XDP_TX action support. For each Tx ring, an
additional XDP Tx ring is allocated and set up. This version does the
DMA mapping in the fast path, which will penalize performance on
IOMMU-enabled systems. Further, debugfs support is not wired up for
the XDP Tx rings.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |   1 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  57 +++++++-
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 195 +++++++++++++++++++++----
 drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 119 ++++++++++++++-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  11 ++
 5 files changed, 350 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index d3195b29d53c..4250ab55a9f1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -629,6 +629,7 @@ struct i40e_vsi {
 	/* These are containers of ring pointers, allocated at run-time */
 	struct i40e_ring **rx_rings;
 	struct i40e_ring **tx_rings;
+	struct i40e_ring **xdp_rings; /* XDP Tx rings */
 
 	u32  active_filters;
 	u32  promisc_threshold;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 3d58762efbc0..518788e42887 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1302,7 +1302,7 @@ static void i40e_get_ringparam(struct net_device *netdev,
 static int i40e_set_ringparam(struct net_device *netdev,
 			      struct ethtool_ringparam *ring)
 {
-	struct i40e_ring *tx_rings = NULL, *rx_rings = NULL;
+	struct i40e_ring *tx_rings = NULL, *rx_rings = NULL, *xdp_rings = NULL;
 	struct i40e_netdev_priv *np = netdev_priv(netdev);
 	struct i40e_hw *hw = &np->vsi->back->hw;
 	struct i40e_vsi *vsi = np->vsi;
@@ -1310,6 +1310,7 @@ static int i40e_set_ringparam(struct net_device *netdev,
 	u32 new_rx_count, new_tx_count;
 	int timeout = 50;
 	int i, err = 0;
+	bool xdp = i40e_enabled_xdp_vsi(vsi);
 
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
 		return -EINVAL;
@@ -1345,6 +1346,8 @@ static int i40e_set_ringparam(struct net_device *netdev,
 		for (i = 0; i < vsi->num_queue_pairs; i++) {
 			vsi->tx_rings[i]->count = new_tx_count;
 			vsi->rx_rings[i]->count = new_rx_count;
+			if (xdp)
+				vsi->xdp_rings[i]->count = new_tx_count;
 		}
 		goto done;
 	}
@@ -1440,6 +1443,41 @@ static int i40e_set_ringparam(struct net_device *netdev,
 		}
 	}
 
+	/* alloc updated XDP Tx resources */
+	if (xdp && new_tx_count != vsi->xdp_rings[0]->count) {
+		netdev_info(netdev,
+			    "Changing XDP Tx descriptor count from %d to %d.\n",
+			    vsi->xdp_rings[0]->count, new_tx_count);
+		xdp_rings = kcalloc(vsi->alloc_queue_pairs,
+				    sizeof(struct i40e_ring), GFP_KERNEL);
+		if (!xdp_rings) {
+			err = -ENOMEM;
+			goto free_rx;
+		}
+
+		for (i = 0; i < vsi->num_queue_pairs; i++) {
+			/* clone ring and setup updated count */
+			xdp_rings[i] = *vsi->xdp_rings[i];
+			xdp_rings[i].count = new_tx_count;
+			/* the desc and bi pointers will be reallocated in the
+			 * setup call
+			 */
+			xdp_rings[i].desc = NULL;
+			xdp_rings[i].rx_bi = NULL;
+			err = i40e_setup_tx_descriptors(&xdp_rings[i]);
+			if (err) {
+				while (i) {
+					i--;
+					i40e_free_tx_resources(&xdp_rings[i]);
+				}
+				kfree(xdp_rings);
+				xdp_rings = NULL;
+
+				goto free_rx;
+			}
+		}
+	}
+
 	/* Bring interface down, copy in the new ring info,
 	 * then restore the interface
 	 */
@@ -1474,8 +1512,25 @@ static int i40e_set_ringparam(struct net_device *netdev,
 		rx_rings = NULL;
 	}
 
+	if (xdp_rings) {
+		for (i = 0; i < vsi->num_queue_pairs; i++) {
+			i40e_free_tx_resources(vsi->xdp_rings[i]);
+			*vsi->xdp_rings[i] = xdp_rings[i];
+		}
+		kfree(xdp_rings);
+		xdp_rings = NULL;
+	}
+
 	i40e_up(vsi);
 
+free_rx:
+	/* error cleanup if the XDP Tx allocations failed after getting Rx */
+	if (rx_rings) {
+		for (i = 0; i < vsi->num_queue_pairs; i++)
+			i40e_free_rx_resources(&rx_rings[i]);
+		kfree(rx_rings);
+		rx_rings = NULL;
+	}
 free_tx:
 	/* error cleanup if the Rx allocations failed after getting Tx */
 	if (tx_rings) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 27a29904611a..f1dbcead79ba 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -408,6 +408,27 @@ struct rtnl_link_stats64 *i40e_get_vsi_stats_struct(struct i40e_vsi *vsi)
 }
 
 /**
+ * i40e_get_netdev_stats_struct_tx - populate stats from a Tx ring
+ * @ring: Tx ring to get statistics from
+ * @stats: statistics entry to be updated
+ **/
+static void i40e_get_netdev_stats_struct_tx(struct i40e_ring *ring,
+					    struct rtnl_link_stats64 *stats)
+{
+	u64 bytes, packets;
+	unsigned int start;
+
+	do {
+		start = u64_stats_fetch_begin_irq(&ring->syncp);
+		packets = ring->stats.packets;
+		bytes   = ring->stats.bytes;
+	} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
+
+	stats->tx_packets += packets;
+	stats->tx_bytes   += bytes;
+}
+
+/**
  * i40e_get_netdev_stats_struct - Get statistics for netdev interface
  * @netdev: network interface device structure
  *
@@ -437,15 +458,8 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
 		tx_ring = ACCESS_ONCE(vsi->tx_rings[i]);
 		if (!tx_ring)
 			continue;
+		i40e_get_netdev_stats_struct_tx(tx_ring, stats);
 
-		do {
-			start = u64_stats_fetch_begin_irq(&tx_ring->syncp);
-			packets = tx_ring->stats.packets;
-			bytes   = tx_ring->stats.bytes;
-		} while (u64_stats_fetch_retry_irq(&tx_ring->syncp, start));
-
-		stats->tx_packets += packets;
-		stats->tx_bytes   += bytes;
 		rx_ring = &tx_ring[1];
 
 		do {
@@ -456,6 +470,9 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
 
 		stats->rx_packets += packets;
 		stats->rx_bytes   += bytes;
+
+		if (i40e_enabled_xdp_vsi(vsi))
+			i40e_get_netdev_stats_struct_tx(&rx_ring[1], stats);
 	}
 	rcu_read_unlock();
 
@@ -2802,6 +2819,12 @@ static int i40e_vsi_setup_tx_resources(struct i40e_vsi *vsi)
 	for (i = 0; i < vsi->num_queue_pairs && !err; i++)
 		err = i40e_setup_tx_descriptors(vsi->tx_rings[i]);
 
+	if (!i40e_enabled_xdp_vsi(vsi))
+		return err;
+
+	for (i = 0; i < vsi->num_queue_pairs && !err; i++)
+		err = i40e_setup_tx_descriptors(vsi->xdp_rings[i]);
+
 	return err;
 }
 
@@ -2815,12 +2838,17 @@ static void i40e_vsi_free_tx_resources(struct i40e_vsi *vsi)
 {
 	int i;
 
-	if (!vsi->tx_rings)
-		return;
+	if (vsi->tx_rings) {
+		for (i = 0; i < vsi->num_queue_pairs; i++)
+			if (vsi->tx_rings[i] && vsi->tx_rings[i]->desc)
+				i40e_free_tx_resources(vsi->tx_rings[i]);
+	}
 
-	for (i = 0; i < vsi->num_queue_pairs; i++)
-		if (vsi->tx_rings[i] && vsi->tx_rings[i]->desc)
-			i40e_free_tx_resources(vsi->tx_rings[i]);
+	if (vsi->xdp_rings) {
+		for (i = 0; i < vsi->num_queue_pairs; i++)
+			if (vsi->xdp_rings[i] && vsi->xdp_rings[i]->desc)
+				i40e_free_tx_resources(vsi->xdp_rings[i]);
+	}
 }
 
 /**
@@ -3081,6 +3109,12 @@ static int i40e_vsi_configure_tx(struct i40e_vsi *vsi)
 	for (i = 0; (i < vsi->num_queue_pairs) && !err; i++)
 		err = i40e_configure_tx_ring(vsi->tx_rings[i]);
 
+	if (!i40e_enabled_xdp_vsi(vsi))
+		return err;
+
+	for (i = 0; (i < vsi->num_queue_pairs) && !err; i++)
+		err = i40e_configure_tx_ring(vsi->xdp_rings[i]);
+
 	return err;
 }
 
@@ -3230,6 +3264,7 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
 	u16 vector;
 	int i, q;
 	u32 qp;
+	bool has_xdp = i40e_enabled_xdp_vsi(vsi);
 
 	/* The interrupt indexing is offset by 1 in the PFINT_ITRn
 	 * and PFINT_LNKLSTn registers, e.g.:
@@ -3256,16 +3291,28 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
 		wr32(hw, I40E_PFINT_LNKLSTN(vector - 1), qp);
 		for (q = 0; q < q_vector->num_ringpairs; q++) {
 			u32 val;
+			u32 nextqp = has_xdp ? qp + vsi->alloc_queue_pairs : qp;
 
 			val = I40E_QINT_RQCTL_CAUSE_ENA_MASK |
 			      (I40E_RX_ITR << I40E_QINT_RQCTL_ITR_INDX_SHIFT)  |
 			      (vector      << I40E_QINT_RQCTL_MSIX_INDX_SHIFT) |
-			      (qp          << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT)|
+			      (nextqp      << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT)|
 			      (I40E_QUEUE_TYPE_TX
 				      << I40E_QINT_RQCTL_NEXTQ_TYPE_SHIFT);
 
 			wr32(hw, I40E_QINT_RQCTL(qp), val);
 
+			if (has_xdp) {
+				val = I40E_QINT_TQCTL_CAUSE_ENA_MASK |
+				      (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT)  |
+				      (vector      << I40E_QINT_TQCTL_MSIX_INDX_SHIFT) |
+				      (qp          << I40E_QINT_TQCTL_NEXTQ_INDX_SHIFT)|
+				      (I40E_QUEUE_TYPE_TX
+				       << I40E_QINT_TQCTL_NEXTQ_TYPE_SHIFT);
+
+				wr32(hw, I40E_QINT_TQCTL(nextqp), val);
+			}
+
 			val = I40E_QINT_TQCTL_CAUSE_ENA_MASK |
 			      (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT)  |
 			      (vector      << I40E_QINT_TQCTL_MSIX_INDX_SHIFT) |
@@ -3334,6 +3381,7 @@ static void i40e_configure_msi_and_legacy(struct i40e_vsi *vsi)
 	struct i40e_pf *pf = vsi->back;
 	struct i40e_hw *hw = &pf->hw;
 	u32 val;
+	u32 nextqp = i40e_enabled_xdp_vsi(vsi) ? vsi->alloc_queue_pairs : 0;
 
 	/* set the ITR configuration */
 	q_vector->itr_countdown = ITR_COUNTDOWN_START;
@@ -3350,12 +3398,22 @@ static void i40e_configure_msi_and_legacy(struct i40e_vsi *vsi)
 	wr32(hw, I40E_PFINT_LNKLST0, 0);
 
 	/* Associate the queue pair to the vector and enable the queue int */
-	val = I40E_QINT_RQCTL_CAUSE_ENA_MASK		      |
-	      (I40E_RX_ITR << I40E_QINT_RQCTL_ITR_INDX_SHIFT) |
+	val = I40E_QINT_RQCTL_CAUSE_ENA_MASK		       |
+	      (I40E_RX_ITR << I40E_QINT_RQCTL_ITR_INDX_SHIFT)  |
+	      (nextqp	   << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT)|
 	      (I40E_QUEUE_TYPE_TX << I40E_QINT_TQCTL_NEXTQ_TYPE_SHIFT);
 
 	wr32(hw, I40E_QINT_RQCTL(0), val);
 
+	if (i40e_enabled_xdp_vsi(vsi)) {
+		val = I40E_QINT_TQCTL_CAUSE_ENA_MASK		     |
+		      (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT)|
+		      (I40E_QUEUE_TYPE_TX
+		       << I40E_QINT_TQCTL_NEXTQ_TYPE_SHIFT);
+
+	       wr32(hw, I40E_QINT_TQCTL(nextqp), val);
+	}
+
 	val = I40E_QINT_TQCTL_CAUSE_ENA_MASK		      |
 	      (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT) |
 	      (I40E_QUEUE_END_OF_LIST << I40E_QINT_TQCTL_NEXTQ_INDX_SHIFT);
@@ -3522,6 +3580,9 @@ static void i40e_vsi_disable_irq(struct i40e_vsi *vsi)
 	for (i = 0; i < vsi->num_queue_pairs; i++) {
 		wr32(hw, I40E_QINT_TQCTL(vsi->tx_rings[i]->reg_idx), 0);
 		wr32(hw, I40E_QINT_RQCTL(vsi->rx_rings[i]->reg_idx), 0);
+		if (!i40e_enabled_xdp_vsi(vsi))
+			continue;
+		wr32(hw, I40E_QINT_TQCTL(vsi->xdp_rings[i]->reg_idx), 0);
 	}
 
 	if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
@@ -3818,12 +3879,22 @@ static void i40e_map_vector_to_qp(struct i40e_vsi *vsi, int v_idx, int qp_idx)
 	struct i40e_q_vector *q_vector = vsi->q_vectors[v_idx];
 	struct i40e_ring *tx_ring = vsi->tx_rings[qp_idx];
 	struct i40e_ring *rx_ring = vsi->rx_rings[qp_idx];
+	struct i40e_ring *xdp_ring = i40e_enabled_xdp_vsi(vsi) ?
+				     vsi->xdp_rings[qp_idx] : NULL;
 
 	tx_ring->q_vector = q_vector;
 	tx_ring->next = q_vector->tx.ring;
 	q_vector->tx.ring = tx_ring;
 	q_vector->tx.count++;
 
+	/* Place XDP Tx ring in the same q_vector ring list as regular Tx */
+	if (xdp_ring) {
+		xdp_ring->q_vector = q_vector;
+		xdp_ring->next = q_vector->tx.ring;
+		q_vector->tx.ring = xdp_ring;
+		q_vector->tx.count++;
+	}
+
 	rx_ring->q_vector = q_vector;
 	rx_ring->next = q_vector->rx.ring;
 	q_vector->rx.ring = rx_ring;
@@ -4026,6 +4097,23 @@ static int i40e_vsi_control_tx(struct i40e_vsi *vsi, bool enable)
 		}
 	}
 
+	if (ret || !i40e_enabled_xdp_vsi(vsi))
+		return ret;
+
+	pf_q = vsi->base_queue + vsi->alloc_queue_pairs;
+	for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
+		i40e_control_tx_q(pf, pf_q, enable);
+
+		/* wait for the change to finish */
+		ret = i40e_pf_txq_wait(pf, pf_q, enable);
+		if (ret) {
+			dev_info(&pf->pdev->dev,
+				 "VSI seid %d XDP Tx ring %d %sable timeout\n",
+				 vsi->seid, pf_q, (enable ? "en" : "dis"));
+			break;
+		}
+	}
+
 	return ret;
 }
 
@@ -4535,7 +4623,7 @@ int i40e_vsi_wait_queues_disabled(struct i40e_vsi *vsi)
 				 vsi->seid, pf_q);
 			return ret;
 		}
-		/* Check and wait for the Tx queue */
+		/* Check and wait for the Rx queue */
 		ret = i40e_pf_rxq_wait(pf, pf_q, false);
 		if (ret) {
 			dev_info(&pf->pdev->dev,
@@ -4545,6 +4633,21 @@ int i40e_vsi_wait_queues_disabled(struct i40e_vsi *vsi)
 		}
 	}
 
+	if (!i40e_enabled_xdp_vsi(vsi))
+		return 0;
+
+	pf_q = vsi->base_queue + vsi->alloc_queue_pairs;
+	for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
+		/* Check and wait for the XDP Tx queue */
+		ret = i40e_pf_txq_wait(pf, pf_q, false);
+		if (ret) {
+			dev_info(&pf->pdev->dev,
+				 "VSI seid %d XDP Tx ring %d disable timeout\n",
+				 vsi->seid, pf_q);
+			return ret;
+		}
+	}
+
 	return 0;
 }
 
@@ -5454,6 +5557,8 @@ void i40e_down(struct i40e_vsi *vsi)
 
 	for (i = 0; i < vsi->num_queue_pairs; i++) {
 		i40e_clean_tx_ring(vsi->tx_rings[i]);
+		if (i40e_enabled_xdp_vsi(vsi))
+			i40e_clean_tx_ring(vsi->xdp_rings[i]);
 		i40e_clean_rx_ring(vsi->rx_rings[i]);
 	}
 
@@ -7475,6 +7580,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
 	switch (vsi->type) {
 	case I40E_VSI_MAIN:
 		vsi->alloc_queue_pairs = pf->num_lan_qps;
+
 		vsi->num_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
 				      I40E_REQ_DESCRIPTOR_MULTIPLE);
 		if (pf->flags & I40E_FLAG_MSIX_ENABLED)
@@ -7524,13 +7630,17 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
 {
 	int size;
 	int ret = 0;
+	int nrings;
 
-	/* allocate memory for both Tx and Rx ring pointers */
-	size = sizeof(struct i40e_ring *) * vsi->alloc_queue_pairs * 2;
+	/* allocate memory for both Tx, Rx and XDP Tx ring pointers */
+	nrings = i40e_enabled_xdp_vsi(vsi) ? 3 : 2;
+	size = sizeof(struct i40e_ring *) * vsi->alloc_queue_pairs * nrings;
 	vsi->tx_rings = kzalloc(size, GFP_KERNEL);
 	if (!vsi->tx_rings)
 		return -ENOMEM;
 	vsi->rx_rings = &vsi->tx_rings[vsi->alloc_queue_pairs];
+	if (i40e_enabled_xdp_vsi(vsi))
+		vsi->xdp_rings = &vsi->rx_rings[vsi->alloc_queue_pairs];
 
 	if (alloc_qvectors) {
 		/* allocate memory for q_vector pointers */
@@ -7650,6 +7760,7 @@ static void i40e_vsi_free_arrays(struct i40e_vsi *vsi, bool free_qvectors)
 	kfree(vsi->tx_rings);
 	vsi->tx_rings = NULL;
 	vsi->rx_rings = NULL;
+	vsi->xdp_rings = NULL;
 }
 
 /**
@@ -7733,6 +7844,8 @@ static void i40e_vsi_clear_rings(struct i40e_vsi *vsi)
 			kfree_rcu(vsi->tx_rings[i], rcu);
 			vsi->tx_rings[i] = NULL;
 			vsi->rx_rings[i] = NULL;
+			if (vsi->xdp_rings)
+				vsi->xdp_rings[i] = NULL;
 		}
 	}
 }
@@ -7743,14 +7856,15 @@ static void i40e_vsi_clear_rings(struct i40e_vsi *vsi)
  **/
 static int i40e_alloc_rings(struct i40e_vsi *vsi)
 {
-	struct i40e_ring *tx_ring, *rx_ring;
+	struct i40e_ring *tx_ring, *rx_ring, *xdp_ring;
 	struct i40e_pf *pf = vsi->back;
-	int i;
+	int i, nr;
 
+	nr = i40e_enabled_xdp_vsi(vsi) ? 3 : 2;
 	/* Set basic values in the rings to be used later during open() */
 	for (i = 0; i < vsi->alloc_queue_pairs; i++) {
 		/* allocate space for both Tx and Rx in one shot */
-		tx_ring = kzalloc(sizeof(struct i40e_ring) * 2, GFP_KERNEL);
+		tx_ring = kcalloc(nr, sizeof(*tx_ring), GFP_KERNEL);
 		if (!tx_ring)
 			goto err_out;
 
@@ -7780,6 +7894,26 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
 		rx_ring->dcb_tc = 0;
 		rx_ring->rx_itr_setting = pf->rx_itr_default;
 		vsi->rx_rings[i] = rx_ring;
+
+		if (!i40e_enabled_xdp_vsi(vsi))
+			continue;
+
+		xdp_ring = &rx_ring[1];
+		xdp_ring->queue_index = vsi->alloc_queue_pairs + i;
+		xdp_ring->reg_idx = vsi->base_queue +
+				    vsi->alloc_queue_pairs + i;
+		xdp_ring->ring_active = false;
+		xdp_ring->vsi = vsi;
+		xdp_ring->netdev = NULL;
+		xdp_ring->dev = &pf->pdev->dev;
+		xdp_ring->count = vsi->num_desc;
+		xdp_ring->size = 0;
+		xdp_ring->dcb_tc = 0;
+		if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
+			xdp_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
+		set_ring_xdp(xdp_ring);
+		xdp_ring->tx_itr_setting = pf->tx_itr_default;
+		vsi->xdp_rings[i] = xdp_ring;
 	}
 
 	return 0;
@@ -9988,6 +10122,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 	struct i40e_pf *pf;
 	u8 enabled_tc;
 	int ret;
+	u16 alloc_queue_pairs;
 
 	if (!vsi)
 		return NULL;
@@ -10003,11 +10138,14 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
 	if (ret)
 		goto err_vsi;
 
-	ret = i40e_get_lump(pf, pf->qp_pile, vsi->alloc_queue_pairs, vsi->idx);
+	alloc_queue_pairs = vsi->alloc_queue_pairs *
+			    (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
+
+	ret = i40e_get_lump(pf, pf->qp_pile, alloc_queue_pairs, vsi->idx);
 	if (ret < 0) {
 		dev_info(&pf->pdev->dev,
 			 "failed to get tracking for %d queues for VSI %d err %d\n",
-			 vsi->alloc_queue_pairs, vsi->seid, ret);
+			 alloc_queue_pairs, vsi->seid, ret);
 		goto err_vsi;
 	}
 	vsi->base_queue = ret;
@@ -10065,6 +10203,7 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 	struct i40e_veb *veb = NULL;
 	int ret, i;
 	int v_idx;
+	u16 alloc_queue_pairs;
 
 	/* The requested uplink_seid must be either
 	 *     - the PF's port seid
@@ -10150,12 +10289,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
 	else if (type == I40E_VSI_SRIOV)
 		vsi->vf_id = param1;
 	/* assign it some queues */
-	ret = i40e_get_lump(pf, pf->qp_pile, vsi->alloc_queue_pairs,
-				vsi->idx);
+	alloc_queue_pairs = vsi->alloc_queue_pairs *
+			    (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
+
+	ret = i40e_get_lump(pf, pf->qp_pile, alloc_queue_pairs, vsi->idx);
 	if (ret < 0) {
 		dev_info(&pf->pdev->dev,
 			 "failed to get tracking for %d queues for VSI %d err=%d\n",
-			 vsi->alloc_queue_pairs, vsi->seid, ret);
+			 alloc_queue_pairs, vsi->seid, ret);
 		goto err_vsi;
 	}
 	vsi->base_queue = ret;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 0c2f0230faf4..c025e517f7e5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -630,6 +630,8 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
 	if (tx_buffer->skb) {
 		if (tx_buffer->tx_flags & I40E_TX_FLAGS_FD_SB)
 			kfree(tx_buffer->raw_buf);
+		else if (ring_is_xdp(ring))
+			page_frag_free(tx_buffer->raw_buf);
 		else
 			dev_kfree_skb_any(tx_buffer->skb);
 		if (dma_unmap_len(tx_buffer, len))
@@ -771,8 +773,11 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 		total_bytes += tx_buf->bytecount;
 		total_packets += tx_buf->gso_segs;
 
-		/* free the skb */
-		napi_consume_skb(tx_buf->skb, napi_budget);
+		/* free the skb/XDP data */
+		if (ring_is_xdp(tx_ring))
+			page_frag_free(tx_buf->raw_buf);
+		else
+			napi_consume_skb(tx_buf->skb, napi_budget);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -848,6 +853,9 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
 			tx_ring->arm_wb = true;
 	}
 
+	if (ring_is_xdp(tx_ring))
+		return !!budget;
+
 	/* notify netdev of completed buffers */
 	netdev_tx_completed_queue(txring_txq(tx_ring),
 				  total_packets, total_bytes);
@@ -1969,6 +1977,10 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
 
 #define I40E_XDP_PASS 0
 #define I40E_XDP_CONSUMED 1
+#define I40E_XDP_TX 2
+
+static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
+			      struct i40e_ring *xdp_ring);
 
 /**
  * i40e_run_xdp - run an XDP program
@@ -1981,6 +1993,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 	int result = I40E_XDP_PASS;
 	struct bpf_prog *xdp_prog;
 	u32 act;
+	struct i40e_ring *xdp_ring;
 
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -1989,12 +2002,16 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 		goto xdp_out;
 
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
+
 	switch (act) {
 	case XDP_PASS:
 		break;
+	case XDP_TX:
+		xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
+		result = i40e_xmit_xdp_ring(xdp, xdp_ring);
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-	case XDP_TX:
 	case XDP_ABORTED:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
 		/* fallthrough -- handle aborts by dropping packet */
@@ -2008,6 +2025,27 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
 }
 
 /**
+ * i40e_rx_buffer_flip - adjusted rx_buffer to point to an unused region
+ * @rx_ring: Rx ring
+ * @rx_buffer: Rx buffer to adjust
+ * @size: Size of adjustment
+ **/
+static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
+				struct i40e_rx_buffer *rx_buffer,
+				unsigned int size)
+{
+#if (PAGE_SIZE < 8192)
+	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
+
+	rx_buffer->page_offset ^= truesize;
+#else
+	unsigned int truesize = SKB_DATA_ALIGN(i40e_rx_offset(rx_ring) + size);
+
+	rx_buffer->page_offset += truesize;
+#endif
+}
+
+/**
  * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @rx_ring: rx descriptor ring to transact packets on
  * @budget: Total limit on number of packets to process
@@ -2024,7 +2062,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct sk_buff *skb = rx_ring->skb;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
-	bool failure = false;
+	bool failure = false, xdp_xmit = false;
 
 	while (likely(total_rx_packets < budget)) {
 		struct i40e_rx_buffer *rx_buffer;
@@ -2081,9 +2119,14 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		}
 
 		if (IS_ERR(skb)) {
+			if (PTR_ERR(skb) == -I40E_XDP_TX) {
+				xdp_xmit = true;
+				i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
+			} else {
+				rx_buffer->pagecnt_bias++;
+			}
 			total_rx_bytes += size;
 			total_rx_packets++;
-			rx_buffer->pagecnt_bias++;
 		} else if (skb) {
 			i40e_add_rx_frag(rx_ring, rx_buffer, skb, size);
 		} else if (ring_uses_build_skb(rx_ring)) {
@@ -2131,6 +2174,19 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 		total_rx_packets++;
 	}
 
+	if (xdp_xmit) {
+		struct i40e_ring *xdp_ring;
+
+		xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
+
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch.
+		 */
+		wmb();
+
+		writel(xdp_ring->next_to_use, xdp_ring->tail);
+	}
+
 	rx_ring->skb = skb;
 
 	u64_stats_update_begin(&rx_ring->syncp);
@@ -3188,6 +3244,59 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
 }
 
 /**
+ * i40e_xmit_xdp_ring - transmits an XDP buffer to an XDP Tx ring
+ * @xdp: data to transmit
+ * @xdp_ring: XDP Tx ring
+ **/
+static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
+			      struct i40e_ring *xdp_ring)
+{
+	struct i40e_tx_buffer *tx_bi;
+	struct i40e_tx_desc *tx_desc;
+	u16 i = xdp_ring->next_to_use;
+	dma_addr_t dma;
+	u32 size = xdp->data_end - xdp->data;
+
+	if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
+		xdp_ring->tx_stats.tx_busy++;
+		return I40E_XDP_CONSUMED;
+	}
+
+	dma = dma_map_single(xdp_ring->dev, xdp->data, size, DMA_TO_DEVICE);
+	if (dma_mapping_error(xdp_ring->dev, dma))
+		return I40E_XDP_CONSUMED;
+
+	tx_bi = &xdp_ring->tx_bi[i];
+	tx_bi->bytecount = size;
+	tx_bi->gso_segs = 1;
+	tx_bi->raw_buf = xdp->data;
+
+	/* record length, and DMA address */
+	dma_unmap_len_set(tx_bi, len, size);
+	dma_unmap_addr_set(tx_bi, dma, dma);
+
+	tx_desc = I40E_TX_DESC(xdp_ring, i);
+	tx_desc->buffer_addr = cpu_to_le64(dma);
+	tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC
+						  | I40E_TXD_CMD,
+						  0, size, 0);
+
+	/* Make certain all of the status bits have been updated
+	 * before next_to_watch is written.
+	 */
+	smp_wmb();
+
+	i++;
+	if (i == xdp_ring->count)
+		i = 0;
+
+	tx_bi->next_to_watch = tx_desc;
+	xdp_ring->next_to_use = i;
+
+	return I40E_XDP_TX;
+}
+
+/**
  * i40e_xmit_frame_ring - Sends buffer on Tx ring
  * @skb:     send buffer
  * @tx_ring: ring to send buffer on
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index 31f0b162996f..b288d58313a6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -396,6 +396,7 @@ struct i40e_ring {
 	u16 flags;
 #define I40E_TXR_FLAGS_WB_ON_ITR		BIT(0)
 #define I40E_RXR_FLAGS_BUILD_SKB_ENABLED	BIT(1)
+#define I40E_TXR_FLAGS_XDP			BIT(2)
 
 	/* stats structs */
 	struct i40e_queue_stats	stats;
@@ -438,6 +439,16 @@ static inline void clear_ring_build_skb_enabled(struct i40e_ring *ring)
 	ring->flags &= ~I40E_RXR_FLAGS_BUILD_SKB_ENABLED;
 }
 
+static inline bool ring_is_xdp(struct i40e_ring *ring)
+{
+	return !!(ring->flags & I40E_TXR_FLAGS_XDP);
+}
+
+static inline void set_ring_xdp(struct i40e_ring *ring)
+{
+	ring->flags |= I40E_TXR_FLAGS_XDP;
+}
+
 enum i40e_latency_range {
 	I40E_LOWEST_LATENCY = 0,
 	I40E_LOW_LATENCY = 1,
-- 
2.11.0



* [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions Björn Töpel
@ 2017-05-19 12:29   ` Alexander Duyck
  2017-05-19 12:46     ` Björn Töpel
  0 siblings, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2017-05-19 12:29 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, May 19, 2017 at 12:08 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This commit adds basic XDP support for i40e-derived NICs. All XDP
> actions other than XDP_PASS end up in XDP_DROP.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

I have called out a few minor cosmetic issues that I would like to see
fixed, but otherwise the patch looks good to me and the issues I saw
didn't impact functionality.

Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>

> ---
>  drivers/net/ethernet/intel/i40e/i40e.h      |   7 ++
>  drivers/net/ethernet/intel/i40e/i40e_main.c |  75 ++++++++++++++++
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 130 +++++++++++++++++++++-------
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |   1 +
>  4 files changed, 182 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index 395ca94faf80..d3195b29d53c 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -645,6 +645,8 @@ struct i40e_vsi {
>         u16 max_frame;
>         u16 rx_buf_len;
>
> +       struct bpf_prog *xdp_prog;
> +
>         /* List of q_vectors allocated to this VSI */
>         struct i40e_q_vector **q_vectors;
>         int num_q_vectors;
> @@ -972,4 +974,9 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
>  i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
>  i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
>  void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
> +
> +static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
> +{
> +       return !!vsi->xdp_prog;
> +}
>  #endif /* _I40E_H_ */
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 8d1d3b859af7..27a29904611a 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -27,6 +27,7 @@
>  #include <linux/etherdevice.h>
>  #include <linux/of_net.h>
>  #include <linux/pci.h>
> +#include <linux/bpf.h>
>
>  /* Local includes */
>  #include "i40e.h"
> @@ -2408,6 +2409,13 @@ static int i40e_change_mtu(struct net_device *netdev, int new_mtu)
>         struct i40e_vsi *vsi = np->vsi;
>         struct i40e_pf *pf = vsi->back;
>
> +       if (i40e_enabled_xdp_vsi(vsi)) {
> +               int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
> +
> +               if (frame_size > vsi->rx_buf_len)
> +                       return -EINVAL;
> +       }
> +
>         netdev_info(netdev, "changing MTU from %d to %d\n",
>                     netdev->mtu, new_mtu);
>         netdev->mtu = new_mtu;
> @@ -9310,6 +9318,72 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
>         return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>  }
>
> +/**
> + * i40e_xdp_setup - add/remove an XDP program
> + * @vsi: VSI to changed
> + * @prog: XDP program
> + **/
> +static int i40e_xdp_setup(struct i40e_vsi *vsi,
> +                         struct bpf_prog *prog)
> +{
> +       struct i40e_pf *pf = vsi->back;
> +       int frame_size = vsi->netdev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
> +       int i;
> +       bool need_reset;
> +       struct bpf_prog *old_prog;
> +
> +       /* Don't allow frames that span over multiple buffers */
> +       if (frame_size > vsi->rx_buf_len)
> +               return -EINVAL;
> +
> +       if (!i40e_enabled_xdp_vsi(vsi) && !prog)
> +               return 0;
> +
> +       /* When turning XDP on->off/off->on we reset and rebuild the rings. */
> +       need_reset = (i40e_enabled_xdp_vsi(vsi) != !!prog);
> +
> +       if (need_reset)
> +               i40e_prep_for_reset(pf, true);
> +
> +       old_prog = xchg(&vsi->xdp_prog, prog);
> +
> +       if (need_reset)
> +               i40e_reset_and_rebuild(pf, true, true);
> +
> +       for (i = 0; i < vsi->num_queue_pairs; i++)
> +               WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
> +
> +       if (old_prog)
> +               bpf_prog_put(old_prog);
> +
> +       return 0;
> +}
> +
> +/**
> + * i40e_xdp - implements ndo_xdp for i40e
> + * @dev: netdevice
> + * @xdp: XDP command
> + **/
> +static int i40e_xdp(struct net_device *dev,
> +                   struct netdev_xdp *xdp)
> +{
> +       struct i40e_netdev_priv *np = netdev_priv(dev);
> +       struct i40e_vsi *vsi = np->vsi;
> +
> +       if (vsi->type != I40E_VSI_MAIN)
> +               return -EINVAL;
> +
> +       switch (xdp->command) {
> +       case XDP_SETUP_PROG:
> +               return i40e_xdp_setup(vsi, xdp->prog);
> +       case XDP_QUERY_PROG:
> +               xdp->prog_attached = i40e_enabled_xdp_vsi(vsi);
> +               return 0;
> +       default:
> +               return -EINVAL;
> +       }
> +}
> +
>  static const struct net_device_ops i40e_netdev_ops = {
>         .ndo_open               = i40e_open,
>         .ndo_stop               = i40e_close,
> @@ -9342,6 +9416,7 @@ static const struct net_device_ops i40e_netdev_ops = {
>         .ndo_features_check     = i40e_features_check,
>         .ndo_bridge_getlink     = i40e_ndo_bridge_getlink,
>         .ndo_bridge_setlink     = i40e_ndo_bridge_setlink,
> +       .ndo_xdp                = i40e_xdp,
>  };
>
>  /**
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index af554f3cda19..0c2f0230faf4 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -26,6 +26,7 @@
>
>  #include <linux/prefetch.h>
>  #include <net/busy_poll.h>
> +#include <linux/bpf_trace.h>
>  #include "i40e.h"
>  #include "i40e_trace.h"
>  #include "i40e_prototype.h"
> @@ -1195,6 +1196,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>  void i40e_free_rx_resources(struct i40e_ring *rx_ring)
>  {
>         i40e_clean_rx_ring(rx_ring);
> +       rx_ring->xdp_prog = NULL;
>         kfree(rx_ring->rx_bi);
>         rx_ring->rx_bi = NULL;
>
> @@ -1241,6 +1243,8 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>         rx_ring->next_to_clean = 0;
>         rx_ring->next_to_use = 0;
>
> +       rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;
> +
>         return 0;
>  err:
>         kfree(rx_ring->rx_bi);
> @@ -1592,6 +1596,7 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
>  /**
>   * i40e_cleanup_headers - Correct empty headers
>   * @rx_ring: rx descriptor ring packet is being transacted on
> + * @rx_desc: pointer to the EOP Rx descriptor
>   * @skb: pointer to current skb being fixed
>   *
>   * Also address the case where we are pulling data in on pages only
> @@ -1602,8 +1607,25 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
>   *
>   * Returns true if an error was encountered and skb was freed.
>   **/
> -static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb)
> +static bool i40e_cleanup_headers(struct i40e_ring *rx_ring,
> +                                union i40e_rx_desc *rx_desc,
> +                                struct sk_buff *skb)
>  {
> +       /* XDP packets use error pointer so abort at this point */
> +       if (IS_ERR(skb))
> +               return true;
> +
> +       /* ERR_MASK will only have valid bits if EOP set, and
> +        * what we are doing here is actually checking
> +        * I40E_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
> +        * the error field
> +        */
> +       if (unlikely(i40e_test_staterr(rx_desc,
> +                                      BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
> +               dev_kfree_skb_any(skb);
> +               return true;
> +       }
> +
>         /* if eth_skb_pad returns an error the skb was freed */
>         if (eth_skb_pad(skb))
>                 return true;
> @@ -1783,10 +1805,10 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
>   * skb correctly.
>   */
>  static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
> -                                         struct i40e_rx_buffer *rx_buffer,
> -                                         unsigned int size)
> +                                         struct xdp_buff *xdp,
> +                                         struct i40e_rx_buffer *rx_buffer)

One minor request here. Let's leave rx_buffer in place and instead
just replace size with the xdp pointer. It helps reduce some of the
noise in this patch and will make maintenance easier for the
out-of-tree driver, as we will likely just be replacing xdp with size
on older kernels via our kcompat layer.
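
In other words, roughly this signature (a sketch of the request, not
code from the patch):

  static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
                                            struct i40e_rx_buffer *rx_buffer,
                                            struct xdp_buff *xdp);

i.e. rx_buffer keeps its slot and xdp simply takes the place of size,
so kcompat only has to swap a single argument back on older kernels.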

>  {
> -       void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
> +       unsigned int size = xdp->data_end - xdp->data;
>  #if (PAGE_SIZE < 8192)
>         unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
>  #else
> @@ -1796,9 +1818,9 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>         struct sk_buff *skb;
>
>         /* prefetch first cache line of first page */
> -       prefetch(va);
> +       prefetch(xdp->data);
>  #if L1_CACHE_BYTES < 128
> -       prefetch(va + L1_CACHE_BYTES);
> +       prefetch(xdp->data + L1_CACHE_BYTES);
>  #endif
>
>         /* allocate a skb to store the frags */
> @@ -1811,10 +1833,11 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>         /* Determine available headroom for copy */
>         headlen = size;
>         if (headlen > I40E_RX_HDR_SIZE)
> -               headlen = eth_get_headlen(va, I40E_RX_HDR_SIZE);
> +               headlen = eth_get_headlen(xdp->data, I40E_RX_HDR_SIZE);
>
>         /* align pull length to size of long to optimize memcpy performance */
> -       memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
> +       memcpy(__skb_put(skb, headlen), xdp->data,
> +              ALIGN(headlen, sizeof(long)));
>
>         /* update all of the pointers */
>         size -= headlen;
> @@ -1847,10 +1870,10 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>   * to set up the skb correctly and avoid any memcpy overhead.
>   */
>  static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
> -                                     struct i40e_rx_buffer *rx_buffer,
> -                                     unsigned int size)
> +                                     struct xdp_buff *xdp,
> +                                     struct i40e_rx_buffer *rx_buffer)

Same here. If possible please just replace size with xdp instead of
moving the variables around.

>  {
> -       void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
> +       unsigned int size = xdp->data_end - xdp->data;
>  #if (PAGE_SIZE < 8192)
>         unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
>  #else
> @@ -1860,12 +1883,12 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>         struct sk_buff *skb;
>
>         /* prefetch first cache line of first page */
> -       prefetch(va);
> +       prefetch(xdp->data);
>  #if L1_CACHE_BYTES < 128
> -       prefetch(va + L1_CACHE_BYTES);
> +       prefetch(xdp->data + L1_CACHE_BYTES);
>  #endif
>         /* build an skb around the page buffer */
> -       skb = build_skb(va - I40E_SKB_PAD, truesize);
> +       skb = build_skb(xdp->data_hard_start, truesize);
>         if (unlikely(!skb))
>                 return NULL;
>
> @@ -1944,6 +1967,46 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
>         return true;
>  }
>
> +#define I40E_XDP_PASS 0
> +#define I40E_XDP_CONSUMED 1
> +
> +/**
> + * i40e_run_xdp - run an XDP program
> + * @rx_ring: Rx ring being processed
> + * @xdp: XDP buffer containing the frame
> + **/
> +static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
> +                                   struct xdp_buff *xdp)
> +{
> +       int result = I40E_XDP_PASS;
> +       struct bpf_prog *xdp_prog;
> +       u32 act;
> +
> +       rcu_read_lock();
> +       xdp_prog = READ_ONCE(rx_ring->xdp_prog);
> +
> +       if (!xdp_prog)
> +               goto xdp_out;
> +
> +       act = bpf_prog_run_xdp(xdp_prog, xdp);
> +       switch (act) {
> +       case XDP_PASS:
> +               break;
> +       default:
> +               bpf_warn_invalid_xdp_action(act);
> +       case XDP_TX:
> +       case XDP_ABORTED:
> +               trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> +               /* fallthrough -- handle aborts by dropping packet */
> +       case XDP_DROP:
> +               result = I40E_XDP_CONSUMED;
> +               break;
> +       }
> +xdp_out:
> +       rcu_read_unlock();
> +       return ERR_PTR(-result);
> +}
> +
>  /**
>   * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
>   * @rx_ring: rx descriptor ring to transact packets on
> @@ -1970,6 +2033,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 u16 vlan_tag;
>                 u8 rx_ptype;
>                 u64 qword;
> +               struct xdp_buff xdp;

Dave usually prefers what we call a reverse Christmas tree layout for
variables: the longest variable declaration goes at the top and the
shortest at the bottom. So you could probably place it above vlan_tag
or size, depending on the length of the line.
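
For reference, applied to the declarations in this loop the ordering
would look roughly like this (illustrative only):

  struct i40e_rx_buffer *rx_buffer;
  union i40e_rx_desc *rx_desc;
  struct xdp_buff xdp;
  unsigned int size;
  u16 vlan_tag;
  u8 rx_ptype;
  u64 qword;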

>
>                 /* return some buffers to hardware, one at a time is too slow */
>                 if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
> @@ -2006,12 +2070,27 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 rx_buffer = i40e_get_rx_buffer(rx_ring, size);
>
>                 /* retrieve a buffer from the ring */
> -               if (skb)
> +               if (!skb) {
> +                       xdp.data = page_address(rx_buffer->page) +
> +                                  rx_buffer->page_offset;
> +                       xdp.data_hard_start = xdp.data -
> +                                             i40e_rx_offset(rx_ring);
> +                       xdp.data_end = xdp.data + size;
> +
> +                       skb = i40e_run_xdp(rx_ring, &xdp);
> +               }
> +
> +               if (IS_ERR(skb)) {
> +                       total_rx_bytes += size;
> +                       total_rx_packets++;
> +                       rx_buffer->pagecnt_bias++;
> +               } else if (skb) {
>                         i40e_add_rx_frag(rx_ring, rx_buffer, skb, size);
> -               else if (ring_uses_build_skb(rx_ring))
> -                       skb = i40e_build_skb(rx_ring, rx_buffer, size);
> -               else
> -                       skb = i40e_construct_skb(rx_ring, rx_buffer, size);
> +               } else if (ring_uses_build_skb(rx_ring)) {
> +                       skb = i40e_build_skb(rx_ring, &xdp, rx_buffer);
> +               } else {
> +                       skb = i40e_construct_skb(rx_ring, &xdp, rx_buffer);
> +               }
>
>                 /* exit if we failed to retrieve a buffer */
>                 if (!skb) {
> @@ -2026,18 +2105,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 if (i40e_is_non_eop(rx_ring, rx_desc, skb))
>                         continue;
>
> -               /* ERR_MASK will only have valid bits if EOP set, and
> -                * what we are doing here is actually checking
> -                * I40E_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
> -                * the error field
> -                */
> -               if (unlikely(i40e_test_staterr(rx_desc, BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
> -                       dev_kfree_skb_any(skb);
> -                       skb = NULL;
> -                       continue;
> -               }
> -
> -               if (i40e_cleanup_headers(rx_ring, skb)) {
> +               if (i40e_cleanup_headers(rx_ring, rx_desc, skb)) {
>                         skb = NULL;
>                         continue;
>                 }
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index f5de51124cae..31f0b162996f 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -360,6 +360,7 @@ struct i40e_ring {
>         void *desc;                     /* Descriptor ring memory */
>         struct device *dev;             /* Used for DMA mapping */
>         struct net_device *netdev;      /* netdev ring maps to */
> +       struct bpf_prog *xdp_prog;
>         union {
>                 struct i40e_tx_buffer *tx_bi;
>                 struct i40e_rx_buffer *rx_bi;
> --
> 2.11.0
>


* [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions
  2017-05-19 12:29   ` Alexander Duyck
@ 2017-05-19 12:46     ` Björn Töpel
  0 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2017-05-19 12:46 UTC (permalink / raw)
  To: intel-wired-lan

2017-05-19 14:29 GMT+02:00 Alexander Duyck <alexander.duyck@gmail.com>:
> On Fri, May 19, 2017 at 12:08 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> This commit adds basic XDP support for i40e derived NICs. All XDP
>> actions will end up in XDP_DROP.
>>
>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>
> I have called out a few minor cosmetic issues that I would like to see
> fixed, but otherwise the patch looks good to me and the issues I saw
> didn't impact functionality.

Thanks for the quick turn-around, Alex!

I'll address both size/xdp parameter order and Christmas tree layout
in the next spin -- that and any issues in the XDP_TX patch.


Thanks,
Björn


>
> Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
>
>> ---
>>  drivers/net/ethernet/intel/i40e/i40e.h      |   7 ++
>>  drivers/net/ethernet/intel/i40e/i40e_main.c |  75 ++++++++++++++++
>>  drivers/net/ethernet/intel/i40e/i40e_txrx.c | 130 +++++++++++++++++++++-------
>>  drivers/net/ethernet/intel/i40e/i40e_txrx.h |   1 +
>>  4 files changed, 182 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
>> index 395ca94faf80..d3195b29d53c 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e.h
>> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
>> @@ -645,6 +645,8 @@ struct i40e_vsi {
>>         u16 max_frame;
>>         u16 rx_buf_len;
>>
>> +       struct bpf_prog *xdp_prog;
>> +
>>         /* List of q_vectors allocated to this VSI */
>>         struct i40e_q_vector **q_vectors;
>>         int num_q_vectors;
>> @@ -972,4 +974,9 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
>>  i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
>>  i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
>>  void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
>> +
>> +static inline bool i40e_enabled_xdp_vsi(struct i40e_vsi *vsi)
>> +{
>> +       return !!vsi->xdp_prog;
>> +}
>>  #endif /* _I40E_H_ */
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> index 8d1d3b859af7..27a29904611a 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
>> @@ -27,6 +27,7 @@
>>  #include <linux/etherdevice.h>
>>  #include <linux/of_net.h>
>>  #include <linux/pci.h>
>> +#include <linux/bpf.h>
>>
>>  /* Local includes */
>>  #include "i40e.h"
>> @@ -2408,6 +2409,13 @@ static int i40e_change_mtu(struct net_device *netdev, int new_mtu)
>>         struct i40e_vsi *vsi = np->vsi;
>>         struct i40e_pf *pf = vsi->back;
>>
>> +       if (i40e_enabled_xdp_vsi(vsi)) {
>> +               int frame_size = new_mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>> +
>> +               if (frame_size > vsi->rx_buf_len)
>> +                       return -EINVAL;
>> +       }
>> +
>>         netdev_info(netdev, "changing MTU from %d to %d\n",
>>                     netdev->mtu, new_mtu);
>>         netdev->mtu = new_mtu;
>> @@ -9310,6 +9318,72 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
>>         return features & ~(NETIF_F_CSUM_MASK | NETIF_F_GSO_MASK);
>>  }
>>
>> +/**
>> + * i40e_xdp_setup - add/remove an XDP program
>> + * @vsi: VSI to changed
>> + * @prog: XDP program
>> + **/
>> +static int i40e_xdp_setup(struct i40e_vsi *vsi,
>> +                         struct bpf_prog *prog)
>> +{
>> +       struct i40e_pf *pf = vsi->back;
>> +       int frame_size = vsi->netdev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>> +       int i;
>> +       bool need_reset;
>> +       struct bpf_prog *old_prog;
>> +
>> +       /* Don't allow frames that span over multiple buffers */
>> +       if (frame_size > vsi->rx_buf_len)
>> +               return -EINVAL;
>> +
>> +       if (!i40e_enabled_xdp_vsi(vsi) && !prog)
>> +               return 0;
>> +
>> +       /* When turning XDP on->off/off->on we reset and rebuild the rings. */
>> +       need_reset = (i40e_enabled_xdp_vsi(vsi) != !!prog);
>> +
>> +       if (need_reset)
>> +               i40e_prep_for_reset(pf, true);
>> +
>> +       old_prog = xchg(&vsi->xdp_prog, prog);
>> +
>> +       if (need_reset)
>> +               i40e_reset_and_rebuild(pf, true, true);
>> +
>> +       for (i = 0; i < vsi->num_queue_pairs; i++)
>> +               WRITE_ONCE(vsi->rx_rings[i]->xdp_prog, vsi->xdp_prog);
>> +
>> +       if (old_prog)
>> +               bpf_prog_put(old_prog);
>> +
>> +       return 0;
>> +}
>> +
>> +/**
>> + * i40e_xdp - implements ndo_xdp for i40e
>> + * @dev: netdevice
>> + * @xdp: XDP command
>> + **/
>> +static int i40e_xdp(struct net_device *dev,
>> +                   struct netdev_xdp *xdp)
>> +{
>> +       struct i40e_netdev_priv *np = netdev_priv(dev);
>> +       struct i40e_vsi *vsi = np->vsi;
>> +
>> +       if (vsi->type != I40E_VSI_MAIN)
>> +               return -EINVAL;
>> +
>> +       switch (xdp->command) {
>> +       case XDP_SETUP_PROG:
>> +               return i40e_xdp_setup(vsi, xdp->prog);
>> +       case XDP_QUERY_PROG:
>> +               xdp->prog_attached = i40e_enabled_xdp_vsi(vsi);
>> +               return 0;
>> +       default:
>> +               return -EINVAL;
>> +       }
>> +}
>> +
>>  static const struct net_device_ops i40e_netdev_ops = {
>>         .ndo_open               = i40e_open,
>>         .ndo_stop               = i40e_close,
>> @@ -9342,6 +9416,7 @@ static const struct net_device_ops i40e_netdev_ops = {
>>         .ndo_features_check     = i40e_features_check,
>>         .ndo_bridge_getlink     = i40e_ndo_bridge_getlink,
>>         .ndo_bridge_setlink     = i40e_ndo_bridge_setlink,
>> +       .ndo_xdp                = i40e_xdp,
>>  };
>>
>>  /**
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> index af554f3cda19..0c2f0230faf4 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
>> @@ -26,6 +26,7 @@
>>
>>  #include <linux/prefetch.h>
>>  #include <net/busy_poll.h>
>> +#include <linux/bpf_trace.h>
>>  #include "i40e.h"
>>  #include "i40e_trace.h"
>>  #include "i40e_prototype.h"
>> @@ -1195,6 +1196,7 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
>>  void i40e_free_rx_resources(struct i40e_ring *rx_ring)
>>  {
>>         i40e_clean_rx_ring(rx_ring);
>> +       rx_ring->xdp_prog = NULL;
>>         kfree(rx_ring->rx_bi);
>>         rx_ring->rx_bi = NULL;
>>
>> @@ -1241,6 +1243,8 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring)
>>         rx_ring->next_to_clean = 0;
>>         rx_ring->next_to_use = 0;
>>
>> +       rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;
>> +
>>         return 0;
>>  err:
>>         kfree(rx_ring->rx_bi);
>> @@ -1592,6 +1596,7 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
>>  /**
>>   * i40e_cleanup_headers - Correct empty headers
>>   * @rx_ring: rx descriptor ring packet is being transacted on
>> + * @rx_desc: pointer to the EOP Rx descriptor
>>   * @skb: pointer to current skb being fixed
>>   *
>>   * Also address the case where we are pulling data in on pages only
>> @@ -1602,8 +1607,25 @@ void i40e_process_skb_fields(struct i40e_ring *rx_ring,
>>   *
>>   * Returns true if an error was encountered and skb was freed.
>>   **/
>> -static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb)
>> +static bool i40e_cleanup_headers(struct i40e_ring *rx_ring,
>> +                                union i40e_rx_desc *rx_desc,
>> +                                struct sk_buff *skb)
>>  {
>> +       /* XDP packets use error pointer so abort at this point */
>> +       if (IS_ERR(skb))
>> +               return true;
>> +
>> +       /* ERR_MASK will only have valid bits if EOP set, and
>> +        * what we are doing here is actually checking
>> +        * I40E_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
>> +        * the error field
>> +        */
>> +       if (unlikely(i40e_test_staterr(rx_desc,
>> +                                      BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
>> +               dev_kfree_skb_any(skb);
>> +               return true;
>> +       }
>> +
>>         /* if eth_skb_pad returns an error the skb was freed */
>>         if (eth_skb_pad(skb))
>>                 return true;
>> @@ -1783,10 +1805,10 @@ static struct i40e_rx_buffer *i40e_get_rx_buffer(struct i40e_ring *rx_ring,
>>   * skb correctly.
>>   */
>>  static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>> -                                         struct i40e_rx_buffer *rx_buffer,
>> -                                         unsigned int size)
>> +                                         struct xdp_buff *xdp,
>> +                                         struct i40e_rx_buffer *rx_buffer)
>
> One minor request here. Let's leave rx_buffer in place and instead
> just replace size with the xdp pointer. It helps to reduce some of the
> noise in this and will make maintenance easier for the out-of-tree
> driver as we will likely just be replacing xdp with size on older
> kernels via our kcompat.
>
>>  {
>> -       void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
>> +       unsigned int size = xdp->data_end - xdp->data;
>>  #if (PAGE_SIZE < 8192)
>>         unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
>>  #else
>> @@ -1796,9 +1818,9 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>>         struct sk_buff *skb;
>>
>>         /* prefetch first cache line of first page */
>> -       prefetch(va);
>> +       prefetch(xdp->data);
>>  #if L1_CACHE_BYTES < 128
>> -       prefetch(va + L1_CACHE_BYTES);
>> +       prefetch(xdp->data + L1_CACHE_BYTES);
>>  #endif
>>
>>         /* allocate a skb to store the frags */
>> @@ -1811,10 +1833,11 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>>         /* Determine available headroom for copy */
>>         headlen = size;
>>         if (headlen > I40E_RX_HDR_SIZE)
>> -               headlen = eth_get_headlen(va, I40E_RX_HDR_SIZE);
>> +               headlen = eth_get_headlen(xdp->data, I40E_RX_HDR_SIZE);
>>
>>         /* align pull length to size of long to optimize memcpy performance */
>> -       memcpy(__skb_put(skb, headlen), va, ALIGN(headlen, sizeof(long)));
>> +       memcpy(__skb_put(skb, headlen), xdp->data,
>> +              ALIGN(headlen, sizeof(long)));
>>
>>         /* update all of the pointers */
>>         size -= headlen;
>> @@ -1847,10 +1870,10 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring,
>>   * to set up the skb correctly and avoid any memcpy overhead.
>>   */
>>  static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>> -                                     struct i40e_rx_buffer *rx_buffer,
>> -                                     unsigned int size)
>> +                                     struct xdp_buff *xdp,
>> +                                     struct i40e_rx_buffer *rx_buffer)
>
> Same here. If possible please just replace size with xdp instead of
> moving the variables around.
>
>>  {
>> -       void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
>> +       unsigned int size = xdp->data_end - xdp->data;
>>  #if (PAGE_SIZE < 8192)
>>         unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
>>  #else
>> @@ -1860,12 +1883,12 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring,
>>         struct sk_buff *skb;
>>
>>         /* prefetch first cache line of first page */
>> -       prefetch(va);
>> +       prefetch(xdp->data);
>>  #if L1_CACHE_BYTES < 128
>> -       prefetch(va + L1_CACHE_BYTES);
>> +       prefetch(xdp->data + L1_CACHE_BYTES);
>>  #endif
>>         /* build an skb around the page buffer */
>> -       skb = build_skb(va - I40E_SKB_PAD, truesize);
>> +       skb = build_skb(xdp->data_hard_start, truesize);
>>         if (unlikely(!skb))
>>                 return NULL;
>>
>> @@ -1944,6 +1967,46 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
>>         return true;
>>  }
>>
>> +#define I40E_XDP_PASS 0
>> +#define I40E_XDP_CONSUMED 1
>> +
>> +/**
>> + * i40e_run_xdp - run an XDP program
>> + * @rx_ring: Rx ring being processed
>> + * @xdp: XDP buffer containing the frame
>> + **/
>> +static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>> +                                   struct xdp_buff *xdp)
>> +{
>> +       int result = I40E_XDP_PASS;
>> +       struct bpf_prog *xdp_prog;
>> +       u32 act;
>> +
>> +       rcu_read_lock();
>> +       xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>> +
>> +       if (!xdp_prog)
>> +               goto xdp_out;
>> +
>> +       act = bpf_prog_run_xdp(xdp_prog, xdp);
>> +       switch (act) {
>> +       case XDP_PASS:
>> +               break;
>> +       default:
>> +               bpf_warn_invalid_xdp_action(act);
>> +       case XDP_TX:
>> +       case XDP_ABORTED:
>> +               trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>> +               /* fallthrough -- handle aborts by dropping packet */
>> +       case XDP_DROP:
>> +               result = I40E_XDP_CONSUMED;
>> +               break;
>> +       }
>> +xdp_out:
>> +       rcu_read_unlock();
>> +       return ERR_PTR(-result);
>> +}
>> +
>>  /**
>>   * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
>>   * @rx_ring: rx descriptor ring to transact packets on
>> @@ -1970,6 +2033,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>>                 u16 vlan_tag;
>>                 u8 rx_ptype;
>>                 u64 qword;
>> +               struct xdp_buff xdp;
>
> Dave usually prefers what we call a reverse Christmas tree layout for
> variables: the longest variable declaration goes at the top and the
> shortest at the bottom. So you should probably place this above
> vlan_tag or size, depending on the length of the line.
>
>>
>>                 /* return some buffers to hardware, one at a time is too slow */
>>                 if (cleaned_count >= I40E_RX_BUFFER_WRITE) {
>> @@ -2006,12 +2070,27 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>>                 rx_buffer = i40e_get_rx_buffer(rx_ring, size);
>>
>>                 /* retrieve a buffer from the ring */
>> -               if (skb)
>> +               if (!skb) {
>> +                       xdp.data = page_address(rx_buffer->page) +
>> +                                  rx_buffer->page_offset;
>> +                       xdp.data_hard_start = xdp.data -
>> +                                             i40e_rx_offset(rx_ring);
>> +                       xdp.data_end = xdp.data + size;
>> +
>> +                       skb = i40e_run_xdp(rx_ring, &xdp);
>> +               }
>> +
>> +               if (IS_ERR(skb)) {
>> +                       total_rx_bytes += size;
>> +                       total_rx_packets++;
>> +                       rx_buffer->pagecnt_bias++;
>> +               } else if (skb) {
>>                         i40e_add_rx_frag(rx_ring, rx_buffer, skb, size);
>> -               else if (ring_uses_build_skb(rx_ring))
>> -                       skb = i40e_build_skb(rx_ring, rx_buffer, size);
>> -               else
>> -                       skb = i40e_construct_skb(rx_ring, rx_buffer, size);
>> +               } else if (ring_uses_build_skb(rx_ring)) {
>> +                       skb = i40e_build_skb(rx_ring, &xdp, rx_buffer);
>> +               } else {
>> +                       skb = i40e_construct_skb(rx_ring, &xdp, rx_buffer);
>> +               }
>>
>>                 /* exit if we failed to retrieve a buffer */
>>                 if (!skb) {
>> @@ -2026,18 +2105,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>>                 if (i40e_is_non_eop(rx_ring, rx_desc, skb))
>>                         continue;
>>
>> -               /* ERR_MASK will only have valid bits if EOP set, and
>> -                * what we are doing here is actually checking
>> -                * I40E_RX_DESC_ERROR_RXE_SHIFT, since it is the zeroth bit in
>> -                * the error field
>> -                */
>> -               if (unlikely(i40e_test_staterr(rx_desc, BIT(I40E_RXD_QW1_ERROR_SHIFT)))) {
>> -                       dev_kfree_skb_any(skb);
>> -                       skb = NULL;
>> -                       continue;
>> -               }
>> -
>> -               if (i40e_cleanup_headers(rx_ring, skb)) {
>> +               if (i40e_cleanup_headers(rx_ring, rx_desc, skb)) {
>>                         skb = NULL;
>>                         continue;
>>                 }
>> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>> index f5de51124cae..31f0b162996f 100644
>> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
>> @@ -360,6 +360,7 @@ struct i40e_ring {
>>         void *desc;                     /* Descriptor ring memory */
>>         struct device *dev;             /* Used for DMA mapping */
>>         struct net_device *netdev;      /* netdev ring maps to */
>> +       struct bpf_prog *xdp_prog;
>>         union {
>>                 struct i40e_tx_buffer *tx_bi;
>>                 struct i40e_rx_buffer *rx_bi;
>> --
>> 2.11.0
>>


* [Intel-wired-lan] [PATCH v5 2/2] i40e: add support for XDP_TX action
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 2/2] i40e: add support for XDP_TX action Björn Töpel
@ 2017-05-19 13:40   ` Alexander Duyck
  0 siblings, 0 replies; 8+ messages in thread
From: Alexander Duyck @ 2017-05-19 13:40 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, May 19, 2017 at 12:08 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This patch adds proper XDP_TX action support. For each Tx ring, an
> additional XDP Tx ring is allocated and setup. This version does the
> DMA mapping in the fast-path, which will penalize performance for
> IOMMU enabled systems. Further, debugfs support is not wired up for
> the XDP Tx rings.
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>

So I think we still have some extra complexity here that we can
remove. I have called it out below.

The basic idea is that I would like to see the queues laid out as Tx,
XDP, Rx instead of Tx, Rx, XDP, since that makes it easier to simply
overflow out of the allocated Tx rings when processing the XDP rings.
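
To give an idea of what I mean: with that layout the XDP rings are
just a continuation of the Tx ring array, so something like
i40e_vsi_setup_tx_resources could collapse to roughly this (sketch
only, assuming xdp_rings ends up pointing into the same array right
after the Tx rings and that all of the allocated rings get set up):

	int nr = vsi->alloc_queue_pairs * (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);

	for (i = 0; i < nr && !err; i++)
		err = i40e_setup_tx_descriptors(vsi->tx_rings[i]);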

> ---
>  drivers/net/ethernet/intel/i40e/i40e.h         |   1 +
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  57 +++++++-
>  drivers/net/ethernet/intel/i40e/i40e_main.c    | 195 +++++++++++++++++++++----
>  drivers/net/ethernet/intel/i40e/i40e_txrx.c    | 119 ++++++++++++++-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h    |  11 ++
>  5 files changed, 350 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
> index d3195b29d53c..4250ab55a9f1 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e.h
> @@ -629,6 +629,7 @@ struct i40e_vsi {
>         /* These are containers of ring pointers, allocated at run-time */
>         struct i40e_ring **rx_rings;
>         struct i40e_ring **tx_rings;
> +       struct i40e_ring **xdp_rings; /* XDP Tx rings */
>
>         u32  active_filters;
>         u32  promisc_threshold;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index 3d58762efbc0..518788e42887 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -1302,7 +1302,7 @@ static void i40e_get_ringparam(struct net_device *netdev,
>  static int i40e_set_ringparam(struct net_device *netdev,
>                               struct ethtool_ringparam *ring)
>  {
> -       struct i40e_ring *tx_rings = NULL, *rx_rings = NULL;
> +       struct i40e_ring *tx_rings = NULL, *rx_rings = NULL, *xdp_rings = NULL;
>         struct i40e_netdev_priv *np = netdev_priv(netdev);
>         struct i40e_hw *hw = &np->vsi->back->hw;
>         struct i40e_vsi *vsi = np->vsi;
> @@ -1310,6 +1310,7 @@ static int i40e_set_ringparam(struct net_device *netdev,
>         u32 new_rx_count, new_tx_count;
>         int timeout = 50;
>         int i, err = 0;
> +       bool xdp = i40e_enabled_xdp_vsi(vsi);

So this should be moved up to match up with the reverse Christmas tree
layout that I called out in the previous patch.

>
>         if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
>                 return -EINVAL;
> @@ -1345,6 +1346,8 @@ static int i40e_set_ringparam(struct net_device *netdev,
>                 for (i = 0; i < vsi->num_queue_pairs; i++) {
>                         vsi->tx_rings[i]->count = new_tx_count;
>                         vsi->rx_rings[i]->count = new_rx_count;
> +                       if (xdp)
> +                               vsi->xdp_rings[i]->count = new_tx_count;

I would say you can drop the "if (xdp)" line here. If the XDP rings
aren't active, updating their counts should have no impact, but not
updating them can cause issues since the XDP rings can fall out of
sync with the Tx rings.

>                 }
>                 goto done;
>         }
> @@ -1440,6 +1443,41 @@ static int i40e_set_ringparam(struct net_device *netdev,
>                 }
>         }
>
> +       /* alloc updated XDP Tx resources */
> +       if (xdp && new_tx_count != vsi->xdp_rings[0]->count) {
> +               netdev_info(netdev,
> +                           "Changing XDP Tx descriptor count from %d to %d.\n",
> +                           vsi->xdp_rings[0]->count, new_tx_count);
> +               xdp_rings = kcalloc(vsi->alloc_queue_pairs,
> +                                   sizeof(struct i40e_ring), GFP_KERNEL);
> +               if (!xdp_rings) {
> +                       err = -ENOMEM;
> +                       goto free_rx;
> +               }
> +
> +               for (i = 0; i < vsi->num_queue_pairs; i++) {
> +                       /* clone ring and setup updated count */
> +                       xdp_rings[i] = *vsi->xdp_rings[i];
> +                       xdp_rings[i].count = new_tx_count;
> +                       /* the desc and bi pointers will be reallocated in the
> +                        * setup call
> +                        */
> +                       xdp_rings[i].desc = NULL;
> +                       xdp_rings[i].rx_bi = NULL;
> +                       err = i40e_setup_tx_descriptors(&xdp_rings[i]);
> +                       if (err) {
> +                               while (i) {
> +                                       i--;
> +                                       i40e_free_tx_resources(&xdp_rings[i]);
> +                               }
> +                               kfree(xdp_rings);
> +                               xdp_rings = NULL;
> +
> +                               goto free_rx;
> +                       }
> +               }
> +       }
> +

Instead of adding this as a new block it might make sense to make it
a subsection of the Tx ring setup. You could then add a label to jump
to the Rx setup when XDP is not enabled, and avoid having to perform
the test more than once.

>         /* Bring interface down, copy in the new ring info,
>          * then restore the interface
>          */
> @@ -1474,8 +1512,25 @@ static int i40e_set_ringparam(struct net_device *netdev,
>                 rx_rings = NULL;
>         }
>
> +       if (xdp_rings) {
> +               for (i = 0; i < vsi->num_queue_pairs; i++) {
> +                       i40e_free_tx_resources(vsi->xdp_rings[i]);
> +                       *vsi->xdp_rings[i] = xdp_rings[i];
> +               }
> +               kfree(xdp_rings);
> +               xdp_rings = NULL;
> +       }
> +

I would prefer to see us move this up and handle it between Tx and Rx
instead of after Rx.

>         i40e_up(vsi);
>
> +free_rx:
> +       /* error cleanup if the XDP Tx allocations failed after getting Rx */
> +       if (rx_rings) {
> +               for (i = 0; i < vsi->num_queue_pairs; i++)
> +                       i40e_free_rx_resources(&rx_rings[i]);
> +               kfree(rx_rings);
> +               rx_rings = NULL;
> +       }

With the suggestion I made above I would rather see us freeing XDP
rings if the Rx failed rather than freeing Rx if XDP failed.
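
Concretely, with that reordering the tail of the error path would end
up looking roughly like this (just a sketch; the free_xdp label name
is made up), falling through to the existing free_tx block:

free_xdp:
	/* error cleanup if the Rx allocations failed after getting XDP Tx */
	if (xdp_rings) {
		for (i = 0; i < vsi->num_queue_pairs; i++)
			i40e_free_tx_resources(&xdp_rings[i]);
		kfree(xdp_rings);
		xdp_rings = NULL;
	}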

>  free_tx:
>         /* error cleanup if the Rx allocations failed after getting Tx */
>         if (tx_rings) {
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 27a29904611a..f1dbcead79ba 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -408,6 +408,27 @@ struct rtnl_link_stats64 *i40e_get_vsi_stats_struct(struct i40e_vsi *vsi)
>  }
>
>  /**
> + * i40e_get_netdev_stats_struct_tx - populate stats from a Tx ring
> + * @ring: Tx ring to get statistics from
> + * @stats: statistics entry to be updated
> + **/
> +static void i40e_get_netdev_stats_struct_tx(struct i40e_ring *ring,
> +                                           struct rtnl_link_stats64 *stats)
> +{
> +       u64 bytes, packets;
> +       unsigned int start;
> +
> +       do {
> +               start = u64_stats_fetch_begin_irq(&ring->syncp);
> +               packets = ring->stats.packets;
> +               bytes   = ring->stats.bytes;
> +       } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
> +
> +       stats->tx_packets += packets;
> +       stats->tx_bytes   += bytes;
> +}
> +
> +/**
>   * i40e_get_netdev_stats_struct - Get statistics for netdev interface
>   * @netdev: network interface device structure
>   *
> @@ -437,15 +458,8 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
>                 tx_ring = ACCESS_ONCE(vsi->tx_rings[i]);
>                 if (!tx_ring)
>                         continue;
> +               i40e_get_netdev_stats_struct_tx(tx_ring, stats);
>
> -               do {
> -                       start = u64_stats_fetch_begin_irq(&tx_ring->syncp);
> -                       packets = tx_ring->stats.packets;
> -                       bytes   = tx_ring->stats.bytes;
> -               } while (u64_stats_fetch_retry_irq(&tx_ring->syncp, start));
> -
> -               stats->tx_packets += packets;
> -               stats->tx_bytes   += bytes;
>                 rx_ring = &tx_ring[1];
>
>                 do {
> @@ -456,6 +470,9 @@ static void i40e_get_netdev_stats_struct(struct net_device *netdev,
>
>                 stats->rx_packets += packets;
>                 stats->rx_bytes   += bytes;
> +
> +               if (i40e_enabled_xdp_vsi(vsi))
> +                       i40e_get_netdev_stats_struct_tx(&rx_ring[1], stats);
>         }
>         rcu_read_unlock();
>
> @@ -2802,6 +2819,12 @@ static int i40e_vsi_setup_tx_resources(struct i40e_vsi *vsi)
>         for (i = 0; i < vsi->num_queue_pairs && !err; i++)
>                 err = i40e_setup_tx_descriptors(vsi->tx_rings[i]);
>
> +       if (!i40e_enabled_xdp_vsi(vsi))
> +               return err;
> +
> +       for (i = 0; i < vsi->num_queue_pairs && !err; i++)
> +               err = i40e_setup_tx_descriptors(vsi->xdp_rings[i]);
> +
>         return err;
>  }
>
> @@ -2815,12 +2838,17 @@ static void i40e_vsi_free_tx_resources(struct i40e_vsi *vsi)
>  {
>         int i;
>
> -       if (!vsi->tx_rings)
> -               return;
> +       if (vsi->tx_rings) {
> +               for (i = 0; i < vsi->num_queue_pairs; i++)
> +                       if (vsi->tx_rings[i] && vsi->tx_rings[i]->desc)
> +                               i40e_free_tx_resources(vsi->tx_rings[i]);
> +       }
>
> -       for (i = 0; i < vsi->num_queue_pairs; i++)
> -               if (vsi->tx_rings[i] && vsi->tx_rings[i]->desc)
> -                       i40e_free_tx_resources(vsi->tx_rings[i]);
> +       if (vsi->xdp_rings) {
> +               for (i = 0; i < vsi->num_queue_pairs; i++)
> +                       if (vsi->xdp_rings[i] && vsi->xdp_rings[i]->desc)
> +                               i40e_free_tx_resources(vsi->xdp_rings[i]);
> +       }
>  }
>
>  /**
> @@ -3081,6 +3109,12 @@ static int i40e_vsi_configure_tx(struct i40e_vsi *vsi)
>         for (i = 0; (i < vsi->num_queue_pairs) && !err; i++)
>                 err = i40e_configure_tx_ring(vsi->tx_rings[i]);
>
> +       if (!i40e_enabled_xdp_vsi(vsi))
> +               return err;
> +
> +       for (i = 0; (i < vsi->num_queue_pairs) && !err; i++)
> +               err = i40e_configure_tx_ring(vsi->xdp_rings[i]);
> +
>         return err;
>  }
>
> @@ -3230,6 +3264,7 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
>         u16 vector;
>         int i, q;
>         u32 qp;
> +       bool has_xdp = i40e_enabled_xdp_vsi(vsi);

This line should be moved up for the reverse Xmas tree layout.

>
>         /* The interrupt indexing is offset by 1 in the PFINT_ITRn
>          * and PFINT_LNKLSTn registers, e.g.:
> @@ -3256,16 +3291,28 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
>                 wr32(hw, I40E_PFINT_LNKLSTN(vector - 1), qp);
>                 for (q = 0; q < q_vector->num_ringpairs; q++) {
>                         u32 val;
> +                       u32 nextqp = has_xdp ? qp + vsi->alloc_queue_pairs : qp;

Same here. This line should come before "u32 val;" not after.

>
>                         val = I40E_QINT_RQCTL_CAUSE_ENA_MASK |
>                               (I40E_RX_ITR << I40E_QINT_RQCTL_ITR_INDX_SHIFT)  |
>                               (vector      << I40E_QINT_RQCTL_MSIX_INDX_SHIFT) |
> -                             (qp          << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT)|
> +                             (nextqp      << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT)|
>                               (I40E_QUEUE_TYPE_TX
>                                       << I40E_QINT_RQCTL_NEXTQ_TYPE_SHIFT);
>
>                         wr32(hw, I40E_QINT_RQCTL(qp), val);
>
> +                       if (has_xdp) {
> +                               val = I40E_QINT_TQCTL_CAUSE_ENA_MASK |
> +                                     (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT)  |
> +                                     (vector      << I40E_QINT_TQCTL_MSIX_INDX_SHIFT) |
> +                                     (qp          << I40E_QINT_TQCTL_NEXTQ_INDX_SHIFT)|
> +                                     (I40E_QUEUE_TYPE_TX
> +                                      << I40E_QINT_TQCTL_NEXTQ_TYPE_SHIFT);
> +
> +                               wr32(hw, I40E_QINT_TQCTL(nextqp), val);
> +                       }
> +
>                         val = I40E_QINT_TQCTL_CAUSE_ENA_MASK |
>                               (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT)  |
>                               (vector      << I40E_QINT_TQCTL_MSIX_INDX_SHIFT) |
> @@ -3334,6 +3381,7 @@ static void i40e_configure_msi_and_legacy(struct i40e_vsi *vsi)
>         struct i40e_pf *pf = vsi->back;
>         struct i40e_hw *hw = &pf->hw;
>         u32 val;
> +       u32 nextqp = i40e_enabled_xdp_vsi(vsi) ? vsi->alloc_queue_pairs : 0;

Same here. The line should be moved up to the point where it has a
longer variable declaration in front of it. If there are no longer
variable declarations it should go at the top.

>         /* set the ITR configuration */
>         q_vector->itr_countdown = ITR_COUNTDOWN_START;
> @@ -3350,12 +3398,22 @@ static void i40e_configure_msi_and_legacy(struct i40e_vsi *vsi)
>         wr32(hw, I40E_PFINT_LNKLST0, 0);
>
>         /* Associate the queue pair to the vector and enable the queue int */
> -       val = I40E_QINT_RQCTL_CAUSE_ENA_MASK                  |
> -             (I40E_RX_ITR << I40E_QINT_RQCTL_ITR_INDX_SHIFT) |
> +       val = I40E_QINT_RQCTL_CAUSE_ENA_MASK                   |
> +             (I40E_RX_ITR << I40E_QINT_RQCTL_ITR_INDX_SHIFT)  |
> +             (nextqp      << I40E_QINT_RQCTL_NEXTQ_INDX_SHIFT)|
>               (I40E_QUEUE_TYPE_TX << I40E_QINT_TQCTL_NEXTQ_TYPE_SHIFT);
>
>         wr32(hw, I40E_QINT_RQCTL(0), val);
>
> +       if (i40e_enabled_xdp_vsi(vsi)) {
> +               val = I40E_QINT_TQCTL_CAUSE_ENA_MASK                 |
> +                     (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT)|
> +                     (I40E_QUEUE_TYPE_TX
> +                      << I40E_QINT_TQCTL_NEXTQ_TYPE_SHIFT);
> +
> +              wr32(hw, I40E_QINT_TQCTL(nextqp), val);
> +       }
> +
>         val = I40E_QINT_TQCTL_CAUSE_ENA_MASK                  |
>               (I40E_TX_ITR << I40E_QINT_TQCTL_ITR_INDX_SHIFT) |
>               (I40E_QUEUE_END_OF_LIST << I40E_QINT_TQCTL_NEXTQ_INDX_SHIFT);
> @@ -3522,6 +3580,9 @@ static void i40e_vsi_disable_irq(struct i40e_vsi *vsi)
>         for (i = 0; i < vsi->num_queue_pairs; i++) {
>                 wr32(hw, I40E_QINT_TQCTL(vsi->tx_rings[i]->reg_idx), 0);
>                 wr32(hw, I40E_QINT_RQCTL(vsi->rx_rings[i]->reg_idx), 0);
> +               if (!i40e_enabled_xdp_vsi(vsi))
> +                       continue;
> +               wr32(hw, I40E_QINT_TQCTL(vsi->xdp_rings[i]->reg_idx), 0);
>         }
>
>         if (pf->flags & I40E_FLAG_MSIX_ENABLED) {
> @@ -3818,12 +3879,22 @@ static void i40e_map_vector_to_qp(struct i40e_vsi *vsi, int v_idx, int qp_idx)
>         struct i40e_q_vector *q_vector = vsi->q_vectors[v_idx];
>         struct i40e_ring *tx_ring = vsi->tx_rings[qp_idx];
>         struct i40e_ring *rx_ring = vsi->rx_rings[qp_idx];
> +       struct i40e_ring *xdp_ring = i40e_enabled_xdp_vsi(vsi) ?
> +                                    vsi->xdp_rings[qp_idx] : NULL;

What you might try doing is pulling this line out and dropping it
into the if statement below. You could replace the xdp_ring check with
an i40e_enabled_xdp_vsi(vsi) check. That makes this a bit more
readable, as the XDP setup below becomes a mini version of the
i40e_map_vector_to_qp function. It also avoids the reverse xmas tree
issue, since you can initialize the xdp_ring pointer inside the if
statement.
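
In other words, something roughly like this (sketch), with the
xdp_ring local moved inside the if:

	/* Place XDP Tx ring in the same q_vector ring list as regular Tx */
	if (i40e_enabled_xdp_vsi(vsi)) {
		struct i40e_ring *xdp_ring = vsi->xdp_rings[qp_idx];

		xdp_ring->q_vector = q_vector;
		xdp_ring->next = q_vector->tx.ring;
		q_vector->tx.ring = xdp_ring;
		q_vector->tx.count++;
	}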

>
>         tx_ring->q_vector = q_vector;
>         tx_ring->next = q_vector->tx.ring;
>         q_vector->tx.ring = tx_ring;
>         q_vector->tx.count++;
>
> +       /* Place XDP Tx ring in the same q_vector ring list as regular Tx */
> +       if (xdp_ring) {
> +               xdp_ring->q_vector = q_vector;
> +               xdp_ring->next = q_vector->tx.ring;
> +               q_vector->tx.ring = xdp_ring;
> +               q_vector->tx.count++;
> +       }
> +
>         rx_ring->q_vector = q_vector;
>         rx_ring->next = q_vector->rx.ring;
>         q_vector->rx.ring = rx_ring;
> @@ -4026,6 +4097,23 @@ static int i40e_vsi_control_tx(struct i40e_vsi *vsi, bool enable)
>                 }
>         }
>
> +       if (ret || !i40e_enabled_xdp_vsi(vsi))
> +               return ret;
> +
> +       pf_q = vsi->base_queue + vsi->alloc_queue_pairs;
> +       for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
> +               i40e_control_tx_q(pf, pf_q, enable);
> +
> +               /* wait for the change to finish */
> +               ret = i40e_pf_txq_wait(pf, pf_q, enable);
> +               if (ret) {
> +                       dev_info(&pf->pdev->dev,
> +                                "VSI seid %d XDP Tx ring %d %sable timeout\n",
> +                                vsi->seid, pf_q, (enable ? "en" : "dis"));
> +                       break;
> +               }
> +       }
> +

It might work better to place this inside the original loop in this
function. You could just add another variable inside that loop set to
pf_q + vsi->alloc_queue_pairs.
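
Something along these lines (sketch; xdp_q is just a made-up local
for pf_q + vsi->alloc_queue_pairs):

	for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
		int xdp_q = pf_q + vsi->alloc_queue_pairs;

		/* ... existing Tx queue handling ... */

		if (!i40e_enabled_xdp_vsi(vsi))
			continue;

		/* handle the paired XDP Tx queue in the same pass */
		i40e_control_tx_q(pf, xdp_q, enable);
		ret = i40e_pf_txq_wait(pf, xdp_q, enable);
		if (ret) {
			dev_info(&pf->pdev->dev,
				 "VSI seid %d XDP Tx ring %d %sable timeout\n",
				 vsi->seid, xdp_q, (enable ? "en" : "dis"));
			break;
		}
	}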

>         return ret;
>  }
>
> @@ -4535,7 +4623,7 @@ int i40e_vsi_wait_queues_disabled(struct i40e_vsi *vsi)
>                                  vsi->seid, pf_q);
>                         return ret;
>                 }
> -               /* Check and wait for the Tx queue */
> +               /* Check and wait for the Rx queue */
>                 ret = i40e_pf_rxq_wait(pf, pf_q, false);
>                 if (ret) {
>                         dev_info(&pf->pdev->dev,
> @@ -4545,6 +4633,21 @@ int i40e_vsi_wait_queues_disabled(struct i40e_vsi *vsi)
>                 }
>         }
>
> +       if (!i40e_enabled_xdp_vsi(vsi))
> +               return 0;
> +
> +       pf_q = vsi->base_queue + vsi->alloc_queue_pairs;
> +       for (i = 0; i < vsi->num_queue_pairs; i++, pf_q++) {
> +               /* Check and wait for the XDP Tx queue */
> +               ret = i40e_pf_txq_wait(pf, pf_q, false);
> +               if (ret) {
> +                       dev_info(&pf->pdev->dev,
> +                                "VSI seid %d XDP Tx ring %d disable timeout\n",
> +                                vsi->seid, pf_q);
> +                       return ret;
> +               }
> +       }
> +

This would make more sense inside the original loop instead of being
added as a separate loop at the end.

>         return 0;
>  }
>
> @@ -5454,6 +5557,8 @@ void i40e_down(struct i40e_vsi *vsi)
>
>         for (i = 0; i < vsi->num_queue_pairs; i++) {
>                 i40e_clean_tx_ring(vsi->tx_rings[i]);
> +               if (i40e_enabled_xdp_vsi(vsi))
> +                       i40e_clean_tx_ring(vsi->xdp_rings[i]);
>                 i40e_clean_rx_ring(vsi->rx_rings[i]);
>         }
>
> @@ -7475,6 +7580,7 @@ static int i40e_set_num_rings_in_vsi(struct i40e_vsi *vsi)
>         switch (vsi->type) {
>         case I40E_VSI_MAIN:
>                 vsi->alloc_queue_pairs = pf->num_lan_qps;
> +

Looks like you somehow added an extra empty line here. This can be dropped.

>                 vsi->num_desc = ALIGN(I40E_DEFAULT_NUM_DESCRIPTORS,
>                                       I40E_REQ_DESCRIPTOR_MULTIPLE);
>                 if (pf->flags & I40E_FLAG_MSIX_ENABLED)
> @@ -7524,13 +7630,17 @@ static int i40e_vsi_alloc_arrays(struct i40e_vsi *vsi, bool alloc_qvectors)
>  {
>         int size;
>         int ret = 0;
> +       int nrings;
>
> -       /* allocate memory for both Tx and Rx ring pointers */
> -       size = sizeof(struct i40e_ring *) * vsi->alloc_queue_pairs * 2;
> +       /* allocate memory for both Tx, Rx and XDP Tx ring pointers */
> +       nrings = i40e_enabled_xdp_vsi(vsi) ? 3 : 2;
> +       size = sizeof(struct i40e_ring *) * vsi->alloc_queue_pairs * nrings;
>         vsi->tx_rings = kzalloc(size, GFP_KERNEL);
>         if (!vsi->tx_rings)
>                 return -ENOMEM;
>         vsi->rx_rings = &vsi->tx_rings[vsi->alloc_queue_pairs];
> +       if (i40e_enabled_xdp_vsi(vsi))
> +               vsi->xdp_rings = &vsi->rx_rings[vsi->alloc_queue_pairs];

I have a general thought here. Why not look at placing the xdp_rings
in the slot immediately after the Tx rings and before the Rx rings?
Doing that may help to simplify some of the code, as you could then
just loop through alloc_queue_pairs * 2 entries to handle all of the
Tx and XDP rings from the Tx ring array pointer.
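
So the pointer setup in i40e_vsi_alloc_arrays would become something
like (sketch):

	/* allocate memory for Tx, XDP Tx and Rx ring pointers in one shot */
	nrings = i40e_enabled_xdp_vsi(vsi) ? 3 : 2;
	size = sizeof(struct i40e_ring *) * vsi->alloc_queue_pairs * nrings;
	vsi->tx_rings = kzalloc(size, GFP_KERNEL);
	if (!vsi->tx_rings)
		return -ENOMEM;
	/* XDP Tx rings, if any, directly follow the Tx rings... */
	if (i40e_enabled_xdp_vsi(vsi))
		vsi->xdp_rings = &vsi->tx_rings[vsi->alloc_queue_pairs];
	/* ...and the Rx rings always come last */
	vsi->rx_rings = &vsi->tx_rings[(nrings - 1) * vsi->alloc_queue_pairs];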

>         if (alloc_qvectors) {
>                 /* allocate memory for q_vector pointers */
> @@ -7650,6 +7760,7 @@ static void i40e_vsi_free_arrays(struct i40e_vsi *vsi, bool free_qvectors)
>         kfree(vsi->tx_rings);
>         vsi->tx_rings = NULL;
>         vsi->rx_rings = NULL;
> +       vsi->xdp_rings = NULL;
>  }
>
>  /**
> @@ -7733,6 +7844,8 @@ static void i40e_vsi_clear_rings(struct i40e_vsi *vsi)
>                         kfree_rcu(vsi->tx_rings[i], rcu);
>                         vsi->tx_rings[i] = NULL;
>                         vsi->rx_rings[i] = NULL;
> +                       if (vsi->xdp_rings)
> +                               vsi->xdp_rings[i] = NULL;
>                 }
>         }
>  }
> @@ -7743,14 +7856,15 @@ static void i40e_vsi_clear_rings(struct i40e_vsi *vsi)
>   **/
>  static int i40e_alloc_rings(struct i40e_vsi *vsi)
>  {
> -       struct i40e_ring *tx_ring, *rx_ring;
> +       struct i40e_ring *tx_ring, *rx_ring, *xdp_ring;

Instead of having multiple ring pointers we could probably simplify
this by just using one generic ring pointer since it doesn't matter if
it is really a Tx, Rx, or XDP ring. It simplifies the code below a
bit.

>         struct i40e_pf *pf = vsi->back;
> -       int i;
> +       int i, nr;
>
> +       nr = i40e_enabled_xdp_vsi(vsi) ? 3 : 2;
>         /* Set basic values in the rings to be used later during open() */
>         for (i = 0; i < vsi->alloc_queue_pairs; i++) {
>                 /* allocate space for both Tx and Rx in one shot */
> -               tx_ring = kzalloc(sizeof(struct i40e_ring) * 2, GFP_KERNEL);
> +               tx_ring = kcalloc(nr, sizeof(*tx_ring), GFP_KERNEL);

I would prefer it if we stayed with "sizeof(struct i40e_ring)" instead
of using "sizeof(*tx_ring)". I think it is more readable that way.

Also instead of using nr it might be more readable to use something
like "q_per_vector" or qpv just so it is clear that we are allocating
either 2 or 3 queues per vector.

>                 if (!tx_ring)
>                         goto err_out;
>
> @@ -7780,6 +7894,26 @@ static int i40e_alloc_rings(struct i40e_vsi *vsi)
>                 rx_ring->dcb_tc = 0;
>                 rx_ring->rx_itr_setting = pf->rx_itr_default;
>                 vsi->rx_rings[i] = rx_ring;
> +
> +               if (!i40e_enabled_xdp_vsi(vsi))
> +                       continue;
> +
> +               xdp_ring = &rx_ring[1];

So for example instead of doing this you could just call ring++ and
update the correct pointer. It also makes it easier to move this up
to be processed before the Rx ring.

> +               xdp_ring->queue_index = vsi->alloc_queue_pairs + i;
> +               xdp_ring->reg_idx = vsi->base_queue +
> +                                   vsi->alloc_queue_pairs + i;

This could be redefined as "xdp_ring->reg_idx = vsi->base_queue +
xdp_ring->queue_index".
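
Putting those last few comments together, the per-pair setup could end
up looking roughly like this, with a single generic struct i40e_ring
*ring pointer (sketch; qpv is the 2-or-3 value from above and the
elided lines are the existing field initialization):

	ring = kcalloc(qpv, sizeof(struct i40e_ring), GFP_KERNEL);
	if (!ring)
		goto err_out;

	/* ... regular Tx ring setup ... */
	vsi->tx_rings[i] = ring++;

	if (i40e_enabled_xdp_vsi(vsi)) {
		ring->queue_index = vsi->alloc_queue_pairs + i;
		ring->reg_idx = vsi->base_queue + ring->queue_index;
		/* ... rest of the XDP Tx ring setup ... */
		set_ring_xdp(ring);
		vsi->xdp_rings[i] = ring++;
	}

	/* ... Rx ring setup ... */
	vsi->rx_rings[i] = ring;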

> +               xdp_ring->ring_active = false;
> +               xdp_ring->vsi = vsi;
> +               xdp_ring->netdev = NULL;
> +               xdp_ring->dev = &pf->pdev->dev;
> +               xdp_ring->count = vsi->num_desc;
> +               xdp_ring->size = 0;
> +               xdp_ring->dcb_tc = 0;
> +               if (vsi->back->flags & I40E_FLAG_WB_ON_ITR_CAPABLE)
> +                       xdp_ring->flags = I40E_TXR_FLAGS_WB_ON_ITR;
> +               set_ring_xdp(xdp_ring);
> +               xdp_ring->tx_itr_setting = pf->tx_itr_default;
> +               vsi->xdp_rings[i] = xdp_ring;

So I would really like to see this whole block moved up to before the
Rx configuration. That way we have all the Tx configured before we get
to the Rx.

>         }
>
>         return 0;
> @@ -9988,6 +10122,7 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>         struct i40e_pf *pf;
>         u8 enabled_tc;
>         int ret;
> +       u16 alloc_queue_pairs;

Needs to be moved up for reverse Christmas tree layout.

>
>         if (!vsi)
>                 return NULL;
> @@ -10003,11 +10138,14 @@ static struct i40e_vsi *i40e_vsi_reinit_setup(struct i40e_vsi *vsi)
>         if (ret)
>                 goto err_vsi;
>
> -       ret = i40e_get_lump(pf, pf->qp_pile, vsi->alloc_queue_pairs, vsi->idx);
> +       alloc_queue_pairs = vsi->alloc_queue_pairs *
> +                           (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
> +
> +       ret = i40e_get_lump(pf, pf->qp_pile, alloc_queue_pairs, vsi->idx);
>         if (ret < 0) {
>                 dev_info(&pf->pdev->dev,
>                          "failed to get tracking for %d queues for VSI %d err %d\n",
> -                        vsi->alloc_queue_pairs, vsi->seid, ret);
> +                        alloc_queue_pairs, vsi->seid, ret);
>                 goto err_vsi;
>         }
>         vsi->base_queue = ret;
> @@ -10065,6 +10203,7 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
>         struct i40e_veb *veb = NULL;
>         int ret, i;
>         int v_idx;
> +       u16 alloc_queue_pairs;
>
>         /* The requested uplink_seid must be either
>          *     - the PF's port seid
> @@ -10150,12 +10289,14 @@ struct i40e_vsi *i40e_vsi_setup(struct i40e_pf *pf, u8 type,
>         else if (type == I40E_VSI_SRIOV)
>                 vsi->vf_id = param1;
>         /* assign it some queues */
> -       ret = i40e_get_lump(pf, pf->qp_pile, vsi->alloc_queue_pairs,
> -                               vsi->idx);
> +       alloc_queue_pairs = vsi->alloc_queue_pairs *
> +                           (i40e_enabled_xdp_vsi(vsi) ? 2 : 1);
> +
> +       ret = i40e_get_lump(pf, pf->qp_pile, alloc_queue_pairs, vsi->idx);
>         if (ret < 0) {
>                 dev_info(&pf->pdev->dev,
>                          "failed to get tracking for %d queues for VSI %d err=%d\n",
> -                        vsi->alloc_queue_pairs, vsi->seid, ret);
> +                        alloc_queue_pairs, vsi->seid, ret);
>                 goto err_vsi;
>         }
>         vsi->base_queue = ret;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index 0c2f0230faf4..c025e517f7e5 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -630,6 +630,8 @@ static void i40e_unmap_and_free_tx_resource(struct i40e_ring *ring,
>         if (tx_buffer->skb) {
>                 if (tx_buffer->tx_flags & I40E_TX_FLAGS_FD_SB)
>                         kfree(tx_buffer->raw_buf);
> +               else if (ring_is_xdp(ring))
> +                       page_frag_free(tx_buffer->raw_buf);
>                 else
>                         dev_kfree_skb_any(tx_buffer->skb);
>                 if (dma_unmap_len(tx_buffer, len))
> @@ -771,8 +773,11 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>                 total_bytes += tx_buf->bytecount;
>                 total_packets += tx_buf->gso_segs;
>
> -               /* free the skb */
> -               napi_consume_skb(tx_buf->skb, napi_budget);
> +               /* free the skb/XDP data */
> +               if (ring_is_xdp(tx_ring))
> +                       page_frag_free(tx_buf->raw_buf);
> +               else
> +                       napi_consume_skb(tx_buf->skb, napi_budget);
>
>                 /* unmap skb header data */
>                 dma_unmap_single(tx_ring->dev,
> @@ -848,6 +853,9 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
>                         tx_ring->arm_wb = true;
>         }
>
> +       if (ring_is_xdp(tx_ring))
> +               return !!budget;
> +
>         /* notify netdev of completed buffers */
>         netdev_tx_completed_queue(txring_txq(tx_ring),
>                                   total_packets, total_bytes);
> @@ -1969,6 +1977,10 @@ static bool i40e_is_non_eop(struct i40e_ring *rx_ring,
>
>  #define I40E_XDP_PASS 0
>  #define I40E_XDP_CONSUMED 1
> +#define I40E_XDP_TX 2
> +
> +static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
> +                             struct i40e_ring *xdp_ring);
>
>  /**
>   * i40e_run_xdp - run an XDP program
> @@ -1981,6 +1993,7 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>         int result = I40E_XDP_PASS;
>         struct bpf_prog *xdp_prog;
>         u32 act;
> +       struct i40e_ring *xdp_ring;

This should be moved up for the reverse Christmas tree layout.

>
>         rcu_read_lock();
>         xdp_prog = READ_ONCE(rx_ring->xdp_prog);
> @@ -1989,12 +2002,16 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>                 goto xdp_out;
>
>         act = bpf_prog_run_xdp(xdp_prog, xdp);
> +

This whitespace change should either have been made in patch 1 or be
dropped here.

>         switch (act) {
>         case XDP_PASS:
>                 break;
> +       case XDP_TX:
> +               xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
> +               result = i40e_xmit_xdp_ring(xdp, xdp_ring);
> +               break;
>         default:
>                 bpf_warn_invalid_xdp_action(act);
> -       case XDP_TX:
>         case XDP_ABORTED:
>                 trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>                 /* fallthrough -- handle aborts by dropping packet */
> @@ -2008,6 +2025,27 @@ static struct sk_buff *i40e_run_xdp(struct i40e_ring *rx_ring,
>  }
>
>  /**
> + * i40e_rx_buffer_flip - adjusted rx_buffer to point to an unused region
> + * @rx_ring: Rx ring
> + * @rx_buffer: Rx buffer to adjust
> + * @size: Size of adjustment
> + **/
> +static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
> +                               struct i40e_rx_buffer *rx_buffer,
> +                               unsigned int size)
> +{
> +#if (PAGE_SIZE < 8192)
> +       unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
> +
> +       rx_buffer->page_offset ^= truesize;
> +#else
> +       unsigned int truesize = SKB_DATA_ALIGN(i40e_rx_offset(rx_ring) + size);
> +
> +       rx_buffer->page_offset += truesize;
> +#endif
> +}
> +
> +/**
>   * i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
>   * @rx_ring: rx descriptor ring to transact packets on
>   * @budget: Total limit on number of packets to process
> @@ -2024,7 +2062,7 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>         struct sk_buff *skb = rx_ring->skb;
>         u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
> -       bool failure = false;
> +       bool failure = false, xdp_xmit = false;
>
>         while (likely(total_rx_packets < budget)) {
>                 struct i40e_rx_buffer *rx_buffer;
> @@ -2081,9 +2119,14 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 }
>
>                 if (IS_ERR(skb)) {
> +                       if (PTR_ERR(skb) == -I40E_XDP_TX) {
> +                               xdp_xmit = true;
> +                               i40e_rx_buffer_flip(rx_ring, rx_buffer, size);
> +                       } else {
> +                               rx_buffer->pagecnt_bias++;
> +                       }
>                         total_rx_bytes += size;
>                         total_rx_packets++;
> -                       rx_buffer->pagecnt_bias++;
>                 } else if (skb) {
>                         i40e_add_rx_frag(rx_ring, rx_buffer, skb, size);
>                 } else if (ring_uses_build_skb(rx_ring)) {
> @@ -2131,6 +2174,19 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
>                 total_rx_packets++;
>         }
>
> +       if (xdp_xmit) {
> +               struct i40e_ring *xdp_ring;
> +
> +               xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->queue_index];
> +
> +               /* Force memory writes to complete before letting h/w
> +                * know there are new descriptors to fetch.
> +                */
> +               wmb();
> +
> +               writel(xdp_ring->next_to_use, xdp_ring->tail);
> +       }
> +
>         rx_ring->skb = skb;
>
>         u64_stats_update_begin(&rx_ring->syncp);
> @@ -3188,6 +3244,59 @@ static inline int i40e_tx_map(struct i40e_ring *tx_ring, struct sk_buff *skb,
>  }
>
>  /**
> + * i40e_xmit_xdp_ring - transmits an XDP buffer to an XDP Tx ring
> + * @xdp: data to transmit
> + * @xdp_ring: XDP Tx ring
> + **/
> +static int i40e_xmit_xdp_ring(struct xdp_buff *xdp,
> +                             struct i40e_ring *xdp_ring)
> +{
> +       struct i40e_tx_buffer *tx_bi;
> +       struct i40e_tx_desc *tx_desc;
> +       u16 i = xdp_ring->next_to_use;
> +       dma_addr_t dma;
> +       u32 size = xdp->data_end - xdp->data;
> +
> +       if (!unlikely(I40E_DESC_UNUSED(xdp_ring))) {
> +               xdp_ring->tx_stats.tx_busy++;
> +               return I40E_XDP_CONSUMED;
> +       }
> +
> +       dma = dma_map_single(xdp_ring->dev, xdp->data, size, DMA_TO_DEVICE);
> +       if (dma_mapping_error(xdp_ring->dev, dma))
> +               return I40E_XDP_CONSUMED;
> +
> +       tx_bi = &xdp_ring->tx_bi[i];
> +       tx_bi->bytecount = size;
> +       tx_bi->gso_segs = 1;
> +       tx_bi->raw_buf = xdp->data;
> +
> +       /* record length, and DMA address */
> +       dma_unmap_len_set(tx_bi, len, size);
> +       dma_unmap_addr_set(tx_bi, dma, dma);
> +
> +       tx_desc = I40E_TX_DESC(xdp_ring, i);
> +       tx_desc->buffer_addr = cpu_to_le64(dma);
> +       tx_desc->cmd_type_offset_bsz = build_ctob(I40E_TX_DESC_CMD_ICRC
> +                                                 | I40E_TXD_CMD,
> +                                                 0, size, 0);
> +
> +       /* Make certain all of the status bits have been updated
> +        * before next_to_watch is written.
> +        */
> +       smp_wmb();
> +
> +       i++;
> +       if (i == xdp_ring->count)
> +               i = 0;
> +
> +       tx_bi->next_to_watch = tx_desc;
> +       xdp_ring->next_to_use = i;
> +
> +       return I40E_XDP_TX;
> +}
> +
> +/**
>   * i40e_xmit_frame_ring - Sends buffer on Tx ring
>   * @skb:     send buffer
>   * @tx_ring: ring to send buffer on
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> index 31f0b162996f..b288d58313a6 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
> @@ -396,6 +396,7 @@ struct i40e_ring {
>         u16 flags;
>  #define I40E_TXR_FLAGS_WB_ON_ITR               BIT(0)
>  #define I40E_RXR_FLAGS_BUILD_SKB_ENABLED       BIT(1)
> +#define I40E_TXR_FLAGS_XDP                     BIT(2)
>
>         /* stats structs */
>         struct i40e_queue_stats stats;
> @@ -438,6 +439,16 @@ static inline void clear_ring_build_skb_enabled(struct i40e_ring *ring)
>         ring->flags &= ~I40E_RXR_FLAGS_BUILD_SKB_ENABLED;
>  }
>
> +static inline bool ring_is_xdp(struct i40e_ring *ring)
> +{
> +       return !!(ring->flags & I40E_TXR_FLAGS_XDP);
> +}
> +
> +static inline void set_ring_xdp(struct i40e_ring *ring)
> +{
> +       ring->flags |= I40E_TXR_FLAGS_XDP;
> +}
> +
>  enum i40e_latency_range {
>         I40E_LOWEST_LATENCY = 0,
>         I40E_LOW_LATENCY = 1,
> --
> 2.11.0
>


* [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP
  2017-05-19  7:08 [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP Björn Töpel
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 1/2] i40e: add XDP support for pass and drop actions Björn Töpel
  2017-05-19  7:08 ` [Intel-wired-lan] [PATCH v5 2/2] i40e: add support for XDP_TX action Björn Töpel
@ 2017-05-19 13:55 ` Alexander Duyck
  2017-05-19 14:45   ` Björn Töpel
  2 siblings, 1 reply; 8+ messages in thread
From: Alexander Duyck @ 2017-05-19 13:55 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, May 19, 2017 at 12:08 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> This series adds XDP support for i40e-based NICs.
>
> The first patch wires up ndo_xdp and implements XDP_DROP semantics for
> all actions. The second patch adds egress support via the XDP_TX
> action.
>
> Performance numbers (40GbE port, 64B packets) for xdp1 and xdp2
> programs, from samples/bpf/:
>
>  IOMMU                      | xdp1      | xdp2
>  ---------------------------+-----------+-----------
>  iommu=off                  | 29.7 Mpps | 17.1 Mpps
>  iommu=pt intel_iommu=on    | 29.7 Mpps | 11.6 Mpps
>  iommu=on intel_iommu=on    | 21.8 Mpps |  3.7 Mpps

These numbers look pretty good. I wouldn't expect us to have much in
the way of performance with iommu enabled, and the iommu=off numbers
are about 20Gb/s for xdp1, and better than 10Gb/s for xdp2 so this is
a good starting point. I'm assuming this is a single queue throughput
test?
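
(For anyone checking the math: a 64B frame occupies 84B of wire time once
the 7B preamble, 1B SFD and 12B interframe gap are counted, so 29.7 Mpps
works out to 29.7e6 * 84 * 8 ~= 20 Gb/s on the wire, and 17.1 Mpps to
~11.5 Gb/s.)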

> Future improvements, not covered by the patches:
>   * Egress: Create the iova mappings upfront
>     (DMA_BIDIRECTIONAL/dma_sync_*), instead of creating a new iova
>     mapping in the transmit fast-path. This will improve performance
>     for the IOMMU-enabled case.

The problem with using DMA_BIDIRECTIONAL is that there are scenarios
where it makes DMA more expensive in general as we have to then push
the data every time we do a sync for CPU. If you take a look at the
swiotlb code it will give you an idea of what I am talking about.

Also when we start supporting redirection the DMA_BIDIRECTIONAL won't
be useful unless you want to map a buffer for multiple devices
simultaneously.
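
To make the trade-off concrete, the two egress strategies look roughly
like this (an illustrative sketch only, not taken from the patches or
from swiotlb; 'dev', 'page', 'frame_off' and 'size' are assumed names):

	dma_addr_t dma;

	/* Per-frame mapping, as the posted XDP_TX path does it: a fresh
	 * DMA_TO_DEVICE mapping is created for every transmitted frame,
	 * which is where the iova/IOMMU cost lands in the fast path.
	 */
	dma = dma_map_single(dev, xdp->data, size, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return I40E_XDP_CONSUMED;

	/* Upfront mapping: map the Rx page bidirectionally once, when the
	 * page is allocated ...
	 */
	dma = dma_map_page(dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);

	/* ... and then only sync the frame region in the fast path.  With
	 * swiotlb/bounce buffering this is the expensive part: a
	 * sync-for-CPU on a DMA_BIDIRECTIONAL buffer has to copy the data
	 * back every time, because the device may have written it.
	 */
	dma_sync_single_range_for_cpu(dev, dma, frame_off, size,
				      DMA_BIDIRECTIONAL);
	/* ... run the XDP program, then hand the region back before Tx ... */
	dma_sync_single_range_for_device(dev, dma, frame_off, size,
					 DMA_BIDIRECTIONAL);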

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP
  2017-05-19 13:55 ` [Intel-wired-lan] [PATCH v5 0/2] i40e: support for XDP Alexander Duyck
@ 2017-05-19 14:45   ` Björn Töpel
  0 siblings, 0 replies; 8+ messages in thread
From: Björn Töpel @ 2017-05-19 14:45 UTC (permalink / raw)
  To: intel-wired-lan

2017-05-19 15:55 GMT+02:00 Alexander Duyck <alexander.duyck@gmail.com>:
> On Fri, May 19, 2017 at 12:08 AM, Björn Töpel <bjorn.topel@gmail.com> wrote:
>> From: Björn Töpel <bjorn.topel@intel.com>
>>
>> This series adds XDP support for i40e-based NICs.
>>
>> The first patch wires up ndo_xdp and implements XDP_DROP semantics for
>> all actions. The second patch adds egress support via the XDP_TX
>> action.
>>
>> Performance numbers (40GbE port, 64B packets) for xdp1 and xdp2
>> programs, from samples/bpf/:
>>
>>  IOMMU                      | xdp1      | xdp2
>>  ---------------------------+-----------+-----------
>>  iommu=off                  | 29.7 Mpps | 17.1 Mpps
>>  iommu=pt intel_iommu=on    | 29.7 Mpps | 11.6 Mpps
>>  iommu=on intel_iommu=on    | 21.8 Mpps |  3.7 Mpps
>
> These numbers look pretty good. I wouldn't expect us to have much in
> the way of performance with iommu enabled, and the iommu=off numbers
> are about 20Gb/s for xdp1, and better than 10Gb/s for xdp2 so this is
> a good starting point. I'm assuming this is a single queue throughput
> test?

Correct, single queue/one core!

^ permalink raw reply	[flat|nested] 8+ messages in thread
