* [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe
@ 2017-02-25 17:32 John Fastabend
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions John Fastabend
From: John Fastabend @ 2017-02-25 17:32 UTC (permalink / raw)
  To: intel-wired-lan

This series adds support for XDP on ixgbe. We still need to understand
the adjust head size requirement. If we can compromise at 196B (is this
correct, Alex?) then we can continue to use the normal driver RX path and
avoid having an XDP codebase + a normal codebase. This is a big win for
everyone who has to read this code day to day and work on it. I
suggest that if more headroom is needed then it should also be needed in
the normal stack case, and we should provide a generic mechanism to
build up more headroom. Plus we already have ndo_set_rx_headroom();
can we just use this?

If this series can get accepted then we have a series behind it to
enable batching on TX to push TX Mpps up to line rates. The gist of
the implementation is to run the XDP program in a loop, collecting the
action results in an array, and then pushing them into the TX routine.
For a first gen this will likely abort if we get an XDP_PASS result,
but this is just a matter of code wrangling and laziness. It can be
resolved.
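
Roughly, the batching idea looks like the sketch below. This is purely
illustrative and not part of this series; the XDP_BATCH_SZ bound and
the rx_desc_ready() helper are made up for the example, and xdp[]
stands for an array of staged xdp_buff structs:

    u32 acts[XDP_BATCH_SZ];	/* one action result per frame */
    unsigned int i, n;

    /* stage 1: run the XDP program over a window of RX descriptors */
    for (n = 0; n < XDP_BATCH_SZ && rx_desc_ready(rx_ring); n++)
            acts[n] = bpf_prog_run_xdp(xdp_prog, &xdp[n]);

    /* stage 2: push the collected actions into the TX routine */
    for (i = 0; i < n; i++) {
            if (acts[i] == XDP_TX)
                    ixgbe_xmit_xdp_ring(&xdp[i], adapter, xdp_ring);
            /* first gen: abort the batch on XDP_PASS */
    }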

Looking ahead, some improvements are needed: TX routines should take
an array of packets if we believe long trains of packets will be on
the RX ring inside a "processing window" (how many descriptors we
handle per IRQ clean). Note that with many queues on devices we can
ensure this happens with flow director or even RSS in some cases.
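
For example, something along these lines (a hypothetical signature,
nothing like it is implemented yet):

    /* push a whole train of frames into the TX ring in one call */
    int ixgbe_xmit_xdp_ring_bulk(struct xdp_buff **xdp, unsigned int n,
                                 struct ixgbe_adapter *adapter,
                                 struct ixgbe_ring *ring);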

@Alex, please review. Look at patch 2/3 in particular and let me know
what you think about the trade-offs I made there w.r.t. num_xdp_queues.

---

John Fastabend (3):
      ixgbe: add XDP support for pass and drop actions
      ixgbe: add support for XDP_TX action
      ixgbe: xdp support for adjust head


 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   14 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   29 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   85 ++++
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  439 ++++++++++++++++++++--
 4 files changed, 511 insertions(+), 56 deletions(-)

--
Signature


* [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions
  2017-02-25 17:32 [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
@ 2017-02-25 17:32 ` John Fastabend
  2017-02-25 18:18   ` kbuild test robot
  2017-02-27 16:22   ` Alexander Duyck
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action John Fastabend
From: John Fastabend @ 2017-02-25 17:32 UTC (permalink / raw)
  To: intel-wired-lan

Add basic XDP pass and drop support for ixgbe. This uses READ_ONCE/xchg
semantics on XDP programs instead of RCU primitives, as suggested by
Daniel Borkmann and Alex Duyck.
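
Concretely, the pattern as used below boils down to:

    /* control path: publish or swap the program pointer atomically */
    old_prog = xchg(&adapter->xdp_prog, prog);

    /* hot path: one lockless load per packet; NULL means no program */
    xdp_prog = READ_ONCE(rx_ring->xdp_prog);
    if (!xdp_prog)
            return 0;
    act = bpf_prog_run_xdp(xdp_prog, &xdp);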

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  108 +++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index b812913..2d12c24 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -273,6 +273,7 @@ struct ixgbe_ring {
 	struct ixgbe_ring *next;	/* pointer to next ring in q_vector */
 	struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
 	struct net_device *netdev;	/* netdev ring belongs to */
+	struct bpf_prog *xdp_prog;
 	struct device *dev;		/* device for DMA mapping */
 	struct ixgbe_fwd_adapter *l2_accel_priv;
 	void *desc;			/* descriptor ring memory */
@@ -510,6 +511,7 @@ struct ixgbe_adapter {
 	unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
 	/* OS defined structs */
 	struct net_device *netdev;
+	struct bpf_prog *xdp_prog;
 	struct pci_dev *pdev;
 
 	unsigned long state;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index e3da397..ec2c38f 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -49,6 +49,9 @@
 #include <linux/if_macvlan.h>
 #include <linux/if_bridge.h>
 #include <linux/prefetch.h>
+#include <linux/bpf.h>
+#include <linux/bpf_trace.h>
+#include <linux/atomic.h>
 #include <scsi/fc/fc_fcoe.h>
 #include <net/udp_tunnel.h>
 #include <net/pkt_cls.h>
@@ -2051,7 +2054,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
 		/* hand second half of page back to the ring */
 		ixgbe_reuse_rx_page(rx_ring, rx_buffer);
 	} else {
-		if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
+		if (skb && IXGBE_CB(skb)->dma == rx_buffer->dma) {
 			/* the page has been released from the ring */
 			IXGBE_CB(skb)->page_released = true;
 		} else {
@@ -2157,6 +2160,42 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
+static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
+			 struct ixgbe_rx_buffer *rx_buffer,
+			 unsigned int size)
+{
+	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
+	void *addr;
+	u32 act;
+
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+
+	if (!xdp_prog)
+		return 0;
+
+	addr = page_address(rx_buffer->page) + rx_buffer->page_offset;
+	xdp.data_hard_start = addr;
+	xdp.data = addr;
+	xdp.data_end = addr + size;
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	switch (act) {
+	case XDP_PASS:
+		return 0;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_TX:
+	case XDP_ABORTED:
+		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
+		/* fallthrough -- handle aborts by dropping packet */
+	case XDP_DROP:
+		rx_buffer->pagecnt_bias++; /* give page back */
+		break;
+	}
+	return size;
+}
+
 /**
  * ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
  * @q_vector: structure containing interrupt and ring information
@@ -2187,6 +2226,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		struct ixgbe_rx_buffer *rx_buffer;
 		struct sk_buff *skb;
 		unsigned int size;
+		int consumed;
 
 		/* return some buffers to hardware, one at a time is too slow */
 		if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {
@@ -2207,6 +2247,19 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
 
+		rcu_read_lock();
+		consumed = ixgbe_run_xdp(rx_ring, rx_buffer, size);
+		rcu_read_unlock();
+
+		if (consumed) {
+			ixgbe_put_rx_buffer(rx_ring, rx_buffer, skb);
+			cleaned_count++;
+			ixgbe_is_non_eop(rx_ring, rx_desc, skb);
+			total_rx_packets++;
+			total_rx_bytes += size;
+			continue;
+		}
+
 		/* retrieve a buffer from the ring */
 		if (skb)
 			ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
@@ -6106,6 +6159,12 @@ int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
 	return -ENOMEM;
 }
 
+static void ixgbe_setup_xdp_resource(struct ixgbe_adapter *adapter,
+				     struct ixgbe_ring *ring)
+{
+	xchg(&ring->xdp_prog, adapter->xdp_prog);
+}
+
 /**
  * ixgbe_setup_all_rx_resources - allocate all queues Rx resources
  * @adapter: board private structure
@@ -6122,8 +6181,10 @@ static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter)
 
 	for (i = 0; i < adapter->num_rx_queues; i++) {
 		err = ixgbe_setup_rx_resources(adapter->rx_ring[i]);
-		if (!err)
+		if (!err) {
+			ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
 			continue;
+		}
 
 		e_err(probe, "Allocation for Rx Queue %u failed\n", i);
 		goto err_setup_rx;
@@ -9455,6 +9516,48 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 	return features;
 }
 
+static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
+{
+	int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+	struct bpf_prog *old_adapter_prog;
+
+	/* verify ixgbe ring attributes are sufficient for XDP */
+	for (i = 0; i < adapter->num_rx_queues; i++) {
+		struct ixgbe_ring *ring = adapter->rx_ring[i];
+
+		if (ring_is_rsc_enabled(ring))
+			return -EINVAL;
+
+		if (frame_size > ixgbe_rx_bufsz(ring))
+			return -EINVAL;
+	}
+
+	old_adapter_prog = xchg(&adapter->xdp_prog, prog);
+	for (i = 0; i < adapter->num_rx_queues; i++)
+		ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
+
+	if (old_adapter_prog)
+		bpf_prog_put(old_adapter_prog);
+
+	return 0;
+}
+
+static int ixgbe_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+	struct ixgbe_adapter *adapter = netdev_priv(dev);
+
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return ixgbe_xdp_setup(dev, xdp->prog);
+	case XDP_QUERY_PROG:
+		xdp->prog_attached = !!(adapter->rx_ring[0]->xdp_prog);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops ixgbe_netdev_ops = {
 	.ndo_open		= ixgbe_open,
 	.ndo_stop		= ixgbe_close,
@@ -9500,6 +9603,7 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
 	.ndo_udp_tunnel_add	= ixgbe_add_udp_tunnel_port,
 	.ndo_udp_tunnel_del	= ixgbe_del_udp_tunnel_port,
 	.ndo_features_check	= ixgbe_features_check,
+	.ndo_xdp		= ixgbe_xdp,
 };
 
 /**



* [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action
  2017-02-25 17:32 [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions John Fastabend
@ 2017-02-25 17:32 ` John Fastabend
  2017-02-25 22:01   ` Alexander Duyck
  2017-02-25 17:33 ` [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head John Fastabend
From: John Fastabend @ 2017-02-25 17:32 UTC (permalink / raw)
  To: intel-wired-lan

Add support for XDP_TX action.

A couple of design choices were made here. First, I use a new ring
pointer structure xdp_ring[] in the adapter struct instead of
pushing the newly allocated XDP TX rings into the tx_ring[]
structure. This means we have to duplicate loops around rings
in places where we want to initialize both TX rings and XDP rings.
But by making it explicit, it is obvious when we are using XDP
rings and when we are using TX rings. Further, we don't have
to do ring arithmetic, which is error prone. As a proof point,
my first patches used only a single ring structure and introduced
bugs in the FCoE and macvlan code paths.
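
To illustrate the trade-off, the first form below is a hypothetical
sketch of the rejected single-array approach; only the second form
appears in this patch:

    /* single ring array: XDP rings hide behind an index offset */
    ring = adapter->tx_ring[adapter->num_tx_queues + i];

    /* dedicated array: intent is explicit at the call site */
    ring = adapter->xdp_ring[i];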

Second, I am aware this is not the most optimized version of
this code possible. I want to get baseline support in using
the most readable format possible, and then once this series
is included I will optimize the TX path in a follow-up series
of patches.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   12 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   29 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   85 +++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  314 +++++++++++++++++++---
 4 files changed, 386 insertions(+), 54 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
index 2d12c24..571a072 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
@@ -190,7 +190,10 @@ struct vf_macvlans {
 struct ixgbe_tx_buffer {
 	union ixgbe_adv_tx_desc *next_to_watch;
 	unsigned long time_stamp;
-	struct sk_buff *skb;
+	union {
+		struct sk_buff *skb;
+		void *data; /* XDP uses addr ptr on irq_clean */
+	};
 	unsigned int bytecount;
 	unsigned short gso_segs;
 	__be16 protocol;
@@ -308,6 +311,7 @@ struct ixgbe_ring {
 	};
 
 	u8 dcb_tc;
+	bool xdp_ring;
 	struct ixgbe_queue_stats stats;
 	struct u64_stats_sync syncp;
 	union {
@@ -335,6 +339,7 @@ enum ixgbe_ring_f_enum {
 #define IXGBE_MAX_FCOE_INDICES		8
 #define MAX_RX_QUEUES			(IXGBE_MAX_FDIR_INDICES + 1)
 #define MAX_TX_QUEUES			(IXGBE_MAX_FDIR_INDICES + 1)
+#define MAX_XDP_QUEUES			(IXGBE_MAX_FDIR_INDICES + 1)
 #define IXGBE_MAX_L2A_QUEUES		4
 #define IXGBE_BAD_L2A_QUEUE		3
 #define IXGBE_MAX_MACVLANS		31
@@ -578,6 +583,10 @@ struct ixgbe_adapter {
 	__be16 vxlan_port;
 	__be16 geneve_port;
 
+	/* XDP */
+	int num_xdp_queues;
+	struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES];
+
 	/* TX */
 	struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
 
@@ -624,6 +633,7 @@ struct ixgbe_adapter {
 
 	u64 tx_busy;
 	unsigned int tx_ring_count;
+	unsigned int xdp_ring_count;
 	unsigned int rx_ring_count;
 
 	u32 link_speed;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
index d3e02ac..51efd0a 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
@@ -1031,7 +1031,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
 {
 	struct ixgbe_adapter *adapter = netdev_priv(netdev);
 	struct ixgbe_ring *temp_ring;
-	int i, err = 0;
+	int i, j, err = 0;
 	u32 new_rx_count, new_tx_count;
 
 	if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
@@ -1057,15 +1057,19 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
 	if (!netif_running(adapter->netdev)) {
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			adapter->tx_ring[i]->count = new_tx_count;
+		for (i = 0; i < adapter->num_xdp_queues; i++)
+			adapter->xdp_ring[i]->count = new_tx_count;
 		for (i = 0; i < adapter->num_rx_queues; i++)
 			adapter->rx_ring[i]->count = new_rx_count;
 		adapter->tx_ring_count = new_tx_count;
+		adapter->xdp_ring_count = new_tx_count;
 		adapter->rx_ring_count = new_rx_count;
 		goto clear_reset;
 	}
 
 	/* allocate temporary buffer to store rings in */
-	i = max_t(int, adapter->num_tx_queues, adapter->num_rx_queues);
+	i = max_t(int, adapter->num_tx_queues + adapter->num_xdp_queues,
+		  adapter->num_rx_queues);
 	temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));
 
 	if (!temp_ring) {
@@ -1097,12 +1101,33 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
 			}
 		}
 
+		for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
+			memcpy(&temp_ring[i], adapter->xdp_ring[j],
+			       sizeof(struct ixgbe_ring));
+
+			temp_ring[i].count = new_tx_count;
+			err = ixgbe_setup_tx_resources(&temp_ring[i]);
+			if (err) {
+				while (i) {
+					i--;
+					ixgbe_free_tx_resources(&temp_ring[i]);
+				}
+				goto err_setup;
+			}
+		}
+
 		for (i = 0; i < adapter->num_tx_queues; i++) {
 			ixgbe_free_tx_resources(adapter->tx_ring[i]);
 
 			memcpy(adapter->tx_ring[i], &temp_ring[i],
 			       sizeof(struct ixgbe_ring));
 		}
+		for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
+			ixgbe_free_tx_resources(adapter->xdp_ring[j]);
+
+			memcpy(adapter->xdp_ring[j], &temp_ring[i],
+			       sizeof(struct ixgbe_ring));
+		}
 
 		adapter->tx_ring_count = new_tx_count;
 	}
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
index 1b8be7d..d9a4916 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
@@ -73,6 +73,11 @@ static bool ixgbe_cache_ring_dcb_sriov(struct ixgbe_adapter *adapter)
 			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
 		adapter->tx_ring[i]->reg_idx = reg_idx;
 	}
+	for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
+		if ((reg_idx & ~vmdq->mask) >= tcs)
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
+		adapter->xdp_ring[i]->reg_idx = reg_idx;
+	}
 
 #ifdef IXGBE_FCOE
 	/* nothing to do if FCoE is disabled */
@@ -248,6 +253,12 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
 		adapter->tx_ring[i]->reg_idx = reg_idx;
 	}
 
+	for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
+		if ((reg_idx & rss->mask) >= rss->indices)
+			reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
+		adapter->xdp_ring[i]->reg_idx = reg_idx;
+	}
+
 #ifdef IXGBE_FCOE
 	/* FCoE uses a linear block of queues so just assigning 1:1 */
 	for (; i < adapter->num_tx_queues; i++, reg_idx++)
@@ -267,12 +278,14 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
  **/
 static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
 {
-	int i;
+	int i, reg_idx;
 
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		adapter->rx_ring[i]->reg_idx = i;
-	for (i = 0; i < adapter->num_tx_queues; i++)
-		adapter->tx_ring[i]->reg_idx = i;
+	for (i = 0, reg_idx = 0; i < adapter->num_tx_queues; i++, reg_idx++)
+		adapter->tx_ring[i]->reg_idx = reg_idx;
+	for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++)
+		adapter->xdp_ring[i]->reg_idx = reg_idx;
 
 	return true;
 }
@@ -308,6 +321,14 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
 	ixgbe_cache_ring_rss(adapter);
 }
 
+static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
+{
+	if (nr_cpu_ids > MAX_XDP_QUEUES)
+		return 0;
+
+	return adapter->xdp_prog ? nr_cpu_ids : 0;
+}
+
 #define IXGBE_RSS_64Q_MASK	0x3F
 #define IXGBE_RSS_16Q_MASK	0xF
 #define IXGBE_RSS_8Q_MASK	0x7
@@ -382,6 +403,7 @@ static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
 	adapter->num_rx_queues_per_pool = tcs;
 
 	adapter->num_tx_queues = vmdq_i * tcs;
+	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
 	adapter->num_rx_queues = vmdq_i * tcs;
 
 #ifdef IXGBE_FCOE
@@ -479,6 +501,7 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
 		netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
 
 	adapter->num_tx_queues = rss_i * tcs;
+	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
 	adapter->num_rx_queues = rss_i * tcs;
 
 	return true;
@@ -549,6 +572,7 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
 
 	adapter->num_rx_queues = vmdq_i * rss_i;
 	adapter->num_tx_queues = vmdq_i * rss_i;
+	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
 
 	/* disable ATR as it is not supported when VMDq is enabled */
 	adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
@@ -669,6 +693,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
 #endif /* IXGBE_FCOE */
 	adapter->num_rx_queues = rss_i;
 	adapter->num_tx_queues = rss_i;
+	adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
 
 	return true;
 }
@@ -689,6 +714,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
 	/* Start with base case */
 	adapter->num_rx_queues = 1;
 	adapter->num_tx_queues = 1;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_pools = adapter->num_rx_queues;
 	adapter->num_rx_queues_per_pool = 1;
 
@@ -720,7 +746,8 @@ static int ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter)
 	int i, vectors, vector_threshold;
 
 	/* We start by asking for one vector per queue pair */
-	vectors = max(adapter->num_rx_queues, adapter->num_tx_queues);
+	vectors = max(adapter->num_rx_queues,
+		      adapter->num_tx_queues + adapter->num_xdp_queues);
 
 	/* It is easy to be greedy for MSI-X vectors. However, it really
 	 * doesn't do much good if we have a lot more vectors than CPUs. We'll
@@ -800,6 +827,8 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
  * @v_idx: index of vector in adapter struct
  * @txr_count: total number of Tx rings to allocate
  * @txr_idx: index of first Tx ring to allocate
+ * @xdp_count: total number of XDP rings to allocate
+ * @xdp_idx: index of first XDP ring to allocate
  * @rxr_count: total number of Rx rings to allocate
  * @rxr_idx: index of first Rx ring to allocate
  *
@@ -808,6 +837,7 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
 static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 				int v_count, int v_idx,
 				int txr_count, int txr_idx,
+				int xdp_count, int xdp_idx,
 				int rxr_count, int rxr_idx)
 {
 	struct ixgbe_q_vector *q_vector;
@@ -909,6 +939,33 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
 		ring++;
 	}
 
+	while (xdp_count) {
+		/* assign generic ring traits */
+		ring->dev = &adapter->pdev->dev;
+		ring->netdev = adapter->netdev;
+
+		/* configure backlink on ring */
+		ring->q_vector = q_vector;
+
+		/* update q_vector Tx values */
+		ixgbe_add_ring(ring, &q_vector->tx);
+
+		/* apply Tx specific ring traits */
+		ring->count = adapter->xdp_ring_count;
+		ring->queue_index = xdp_idx;
+		ring->xdp_ring = true;
+
+		/* assign ring to adapter */
+		adapter->xdp_ring[xdp_idx] = ring;
+
+		/* update count and index */
+		xdp_count--;
+		xdp_idx += v_count;
+
+		/* push pointer to next ring */
+		ring++;
+	}
+
 	while (rxr_count) {
 		/* assign generic ring traits */
 		ring->dev = &adapter->pdev->dev;
@@ -1002,17 +1059,18 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 	int q_vectors = adapter->num_q_vectors;
 	int rxr_remaining = adapter->num_rx_queues;
 	int txr_remaining = adapter->num_tx_queues;
-	int rxr_idx = 0, txr_idx = 0, v_idx = 0;
+	int xdp_remaining = adapter->num_xdp_queues;
+	int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
 	int err;
 
 	/* only one q_vector if MSI-X is disabled. */
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
 		q_vectors = 1;
 
-	if (q_vectors >= (rxr_remaining + txr_remaining)) {
+	if (q_vectors >= (rxr_remaining + txr_remaining + xdp_remaining)) {
 		for (; rxr_remaining; v_idx++) {
 			err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
-						   0, 0, 1, rxr_idx);
+						   0, 0, 0, 0, 1, rxr_idx);
 
 			if (err)
 				goto err_out;
@@ -1026,8 +1084,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 	for (; v_idx < q_vectors; v_idx++) {
 		int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_idx);
 		int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_idx);
+		int xqpv = DIV_ROUND_UP(xdp_remaining, q_vectors - v_idx);
+
 		err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
 					   tqpv, txr_idx,
+					   xqpv, xdp_idx,
 					   rqpv, rxr_idx);
 
 		if (err)
@@ -1036,14 +1097,17 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
 		/* update counts and index */
 		rxr_remaining -= rqpv;
 		txr_remaining -= tqpv;
+		xdp_remaining -= xqpv;
 		rxr_idx++;
 		txr_idx++;
+		xdp_idx++;
 	}
 
 	return 0;
 
 err_out:
 	adapter->num_tx_queues = 0;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = 0;
 	adapter->num_q_vectors = 0;
 
@@ -1066,6 +1130,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
 	int v_idx = adapter->num_q_vectors;
 
 	adapter->num_tx_queues = 0;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = 0;
 	adapter->num_q_vectors = 0;
 
@@ -1172,9 +1237,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 
 	ixgbe_cache_ring_register(adapter);
 
-	e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u\n",
+	e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u XDP Queue count = %u\n",
 		   (adapter->num_rx_queues > 1) ? "Enabled" : "Disabled",
-		   adapter->num_rx_queues, adapter->num_tx_queues);
+		   adapter->num_rx_queues,
+		   adapter->num_tx_queues, adapter->num_xdp_queues);
 
 	set_bit(__IXGBE_DOWN, &adapter->state);
 
@@ -1195,6 +1261,7 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
 void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
 {
 	adapter->num_tx_queues = 0;
+	adapter->num_xdp_queues = 0;
 	adapter->num_rx_queues = 0;
 
 	ixgbe_free_q_vectors(adapter);
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ec2c38f..227caf8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -594,6 +594,19 @@ static void ixgbe_regdump(struct ixgbe_hw *hw, struct ixgbe_reg_info *reginfo)
 
 }
 
+static void ixgbe_print_buffer(struct ixgbe_ring *ring, int n)
+{
+	struct ixgbe_tx_buffer *tx_buffer;
+
+	tx_buffer = &ring->tx_buffer_info[ring->next_to_clean];
+	pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
+		   n, ring->next_to_use, ring->next_to_clean,
+		   (u64)dma_unmap_addr(tx_buffer, dma),
+		   dma_unmap_len(tx_buffer, len),
+		   tx_buffer->next_to_watch,
+		   (u64)tx_buffer->time_stamp);
+}
+
 /*
  * ixgbe_dump - Print registers, tx-rings and rx-rings
  */
@@ -603,7 +616,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 	struct ixgbe_hw *hw = &adapter->hw;
 	struct ixgbe_reg_info *reginfo;
 	int n = 0;
-	struct ixgbe_ring *tx_ring;
+	struct ixgbe_ring *ring;
 	struct ixgbe_tx_buffer *tx_buffer;
 	union ixgbe_adv_tx_desc *tx_desc;
 	struct my_u0 { u64 a; u64 b; } *u0;
@@ -644,14 +657,13 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 		"Queue [NTU] [NTC] [bi(ntc)->dma  ]",
 		"leng", "ntw", "timestamp");
 	for (n = 0; n < adapter->num_tx_queues; n++) {
-		tx_ring = adapter->tx_ring[n];
-		tx_buffer = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
-		pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
-			   n, tx_ring->next_to_use, tx_ring->next_to_clean,
-			   (u64)dma_unmap_addr(tx_buffer, dma),
-			   dma_unmap_len(tx_buffer, len),
-			   tx_buffer->next_to_watch,
-			   (u64)tx_buffer->time_stamp);
+		ring = adapter->tx_ring[n];
+		ixgbe_print_buffer(ring, n);
+	}
+
+	for (n = 0; n < adapter->num_xdp_queues; n++) {
+		ring = adapter->xdp_ring[n];
+		ixgbe_print_buffer(ring, n);
 	}
 
 	/* Print TX Rings */
@@ -696,28 +708,28 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
 	 */
 
 	for (n = 0; n < adapter->num_tx_queues; n++) {
-		tx_ring = adapter->tx_ring[n];
+		ring = adapter->tx_ring[n];
 		pr_info("------------------------------------\n");
-		pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
+		pr_info("TX QUEUE INDEX = %d\n", ring->queue_index);
 		pr_info("------------------------------------\n");
 		pr_info("%s%s    %s              %s        %s          %s\n",
 			"T [desc]     [address 63:0  ] ",
 			"[PlPOIdStDDt Ln] [bi->dma       ] ",
 			"leng", "ntw", "timestamp", "bi->skb");
 
-		for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
-			tx_desc = IXGBE_TX_DESC(tx_ring, i);
-			tx_buffer = &tx_ring->tx_buffer_info[i];
+		for (i = 0; ring->desc && (i < ring->count); i++) {
+			tx_desc = IXGBE_TX_DESC(ring, i);
+			tx_buffer = &ring->tx_buffer_info[i];
 			u0 = (struct my_u0 *)tx_desc;
 			if (dma_unmap_len(tx_buffer, len) > 0) {
 				const char *ring_desc;
 
-				if (i == tx_ring->next_to_use &&
-				    i == tx_ring->next_to_clean)
+				if (i == ring->next_to_use &&
+				    i == ring->next_to_clean)
 					ring_desc = " NTC/U";
-				else if (i == tx_ring->next_to_use)
+				else if (i == ring->next_to_use)
 					ring_desc = " NTU";
-				else if (i == tx_ring->next_to_clean)
+				else if (i == ring->next_to_clean)
 					ring_desc = " NTC";
 				else
 					ring_desc = "";
@@ -987,6 +999,10 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		clear_bit(__IXGBE_HANG_CHECK_ARMED,
 			  &adapter->tx_ring[i]->state);
+
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		clear_bit(__IXGBE_HANG_CHECK_ARMED,
+			  &adapter->xdp_ring[i]->state);
 }
 
 static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
@@ -1031,6 +1047,14 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
 		if (xoff[tc])
 			clear_bit(__IXGBE_HANG_CHECK_ARMED, &tx_ring->state);
 	}
+
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
+
+		tc = xdp_ring->dcb_tc;
+		if (xoff[tc])
+			clear_bit(__IXGBE_HANG_CHECK_ARMED, &xdp_ring->state);
+	}
 }
 
 static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
@@ -1137,6 +1161,11 @@ static int ixgbe_tx_maxrate(struct net_device *netdev,
 	return 0;
 }
 
+static bool ixgbe_txring_is_xdp(struct ixgbe_ring *tx_ring)
+{
+	return tx_ring->xdp_ring;
+}
+
 /**
  * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
  * @q_vector: structure containing interrupt and ring information
@@ -1182,7 +1211,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		total_packets += tx_buffer->gso_segs;
 
 		/* free the skb */
-		napi_consume_skb(tx_buffer->skb, napi_budget);
+		if (ixgbe_txring_is_xdp(tx_ring))
+			page_frag_free(tx_buffer->data);
+		else
+			napi_consume_skb(tx_buffer->skb, napi_budget);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -1243,7 +1275,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 	if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
 		/* schedule immediate reset if we believe we hung */
 		struct ixgbe_hw *hw = &adapter->hw;
-		e_err(drv, "Detected Tx Unit Hang\n"
+		e_err(drv, "Detected Tx Unit Hang %s\n"
 			"  Tx Queue             <%d>\n"
 			"  TDH, TDT             <%x>, <%x>\n"
 			"  next_to_use          <%x>\n"
@@ -1251,13 +1283,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 			"tx_buffer_info[next_to_clean]\n"
 			"  time_stamp           <%lx>\n"
 			"  jiffies              <%lx>\n",
+			ixgbe_txring_is_xdp(tx_ring) ? "(XDP)" : "",
 			tx_ring->queue_index,
 			IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
 			IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
 			tx_ring->next_to_use, i,
 			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
 
-		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
+		if (!ixgbe_txring_is_xdp(tx_ring))
+			netif_stop_subqueue(tx_ring->netdev,
+					    tx_ring->queue_index);
 
 		e_info(probe,
 		       "tx hang %d detected on queue %d, resetting adapter\n",
@@ -1270,6 +1305,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
 		return true;
 	}
 
+	if (ixgbe_txring_is_xdp(tx_ring))
+		return !!budget;
+
 	netdev_tx_completed_queue(txring_txq(tx_ring),
 				  total_packets, total_bytes);
 
@@ -2160,10 +2198,16 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 	return skb;
 }
 
-static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
+static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
+			       struct ixgbe_adapter *adapter,
+			       struct ixgbe_ring *tx_ring);
+
+static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
+			 struct ixgbe_ring  *rx_ring,
 			 struct ixgbe_rx_buffer *rx_buffer,
 			 unsigned int size)
 {
+	struct ixgbe_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
 	struct xdp_buff xdp;
 	void *addr;
@@ -2183,9 +2227,18 @@ static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
 	switch (act) {
 	case XDP_PASS:
 		return 0;
+	case XDP_TX:
+		xdp_ring = adapter->xdp_ring[smp_processor_id()];
+
+		/* XDP_TX is not flow controlled */
+		if (!xdp_ring) {
+			WARN_ON(1);
+			break;
+		}
+		ixgbe_xmit_xdp_ring(&xdp, adapter, xdp_ring);
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-	case XDP_TX:
 	case XDP_ABORTED:
 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
 		/* fallthrough -- handle aborts by dropping packet */
@@ -2214,8 +2267,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			       const int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
-#ifdef IXGBE_FCOE
 	struct ixgbe_adapter *adapter = q_vector->adapter;
+#ifdef IXGBE_FCOE
 	int ddp_bytes;
 	unsigned int mss = 0;
 #endif /* IXGBE_FCOE */
@@ -2248,7 +2301,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
 
 		rcu_read_lock();
-		consumed = ixgbe_run_xdp(rx_ring, rx_buffer, size);
+		consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
 		rcu_read_unlock();
 
 		if (consumed) {
@@ -2914,6 +2967,15 @@ static irqreturn_t ixgbe_msix_other(int irq, void *data)
 						       &ring->state))
 					reinit_count++;
 			}
+
+			for (i = 0; i < adapter->num_xdp_queues; i++) {
+				struct ixgbe_ring *ring = adapter->xdp_ring[i];
+
+				if (test_and_clear_bit(__IXGBE_TX_FDIR_INIT_DONE,
+						       &ring->state))
+					reinit_count++;
+			}
+
 			if (reinit_count) {
 				/* no more flow director interrupts until after init */
 				IXGBE_WRITE_REG(hw, IXGBE_EIMC, IXGBE_EIMC_FLOW_DIR);
@@ -3429,6 +3491,8 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
 	/* Setup the HW Tx Head and Tail descriptor pointers */
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		ixgbe_configure_tx_ring(adapter, adapter->tx_ring[i]);
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		ixgbe_configure_tx_ring(adapter, adapter->xdp_ring[i]);
 }
 
 static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
@@ -5598,7 +5662,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
 	}
 
 	/* reset BQL for queue */
-	netdev_tx_reset_queue(txring_txq(tx_ring));
+	if (!ixgbe_txring_is_xdp(tx_ring))
+		netdev_tx_reset_queue(txring_txq(tx_ring));
 
 	/* reset next_to_use and next_to_clean */
 	tx_ring->next_to_use = 0;
@@ -5627,6 +5692,8 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
 
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		ixgbe_clean_tx_ring(adapter->tx_ring[i]);
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		ixgbe_clean_tx_ring(adapter->xdp_ring[i]);
 }
 
 static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
@@ -5721,6 +5788,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
 		u8 reg_idx = adapter->tx_ring[i]->reg_idx;
 		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
 	}
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		u8 reg_idx = adapter->xdp_ring[i]->reg_idx;
+
+		IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
+	}
 
 	/* Disable the Tx DMA engine on 82599 and later MAC */
 	switch (hw->mac.type) {
@@ -6008,6 +6080,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
 
 	/* set default ring sizes */
 	adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
+	adapter->xdp_ring_count = IXGBE_DEFAULT_TXD;
 	adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
 
 	/* set default work limits */
@@ -6089,7 +6162,7 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
  **/
 static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
 {
-	int i, err = 0;
+	int i, j = 0, err = 0;
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
@@ -6099,12 +6172,23 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
 		e_err(probe, "Allocation for Tx Queue %u failed\n", i);
 		goto err_setup_tx;
 	}
+	for (j = 0; j < adapter->num_xdp_queues; j++) {
+		err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
+		if (!err)
+			continue;
+
+		e_err(probe, "Allocation for Tx Queue %u failed\n", j);
+		goto err_setup_tx;
+	}
+
 
 	return 0;
 err_setup_tx:
 	/* rewind the index freeing the rings as we go */
 	while (i--)
 		ixgbe_free_tx_resources(adapter->tx_ring[i]);
+	while (j--)
+		ixgbe_free_tx_resources(adapter->xdp_ring[j]);
 	return err;
 }
 
@@ -6238,6 +6322,9 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter)
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		if (adapter->tx_ring[i]->desc)
 			ixgbe_free_tx_resources(adapter->tx_ring[i]);
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		if (adapter->xdp_ring[i]->desc)
+			ixgbe_free_tx_resources(adapter->xdp_ring[i]);
 }
 
 /**
@@ -6656,6 +6743,14 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
 		bytes += tx_ring->stats.bytes;
 		packets += tx_ring->stats.packets;
 	}
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
+
+		restart_queue += xdp_ring->tx_stats.restart_queue;
+		tx_busy += xdp_ring->tx_stats.tx_busy;
+		bytes += xdp_ring->stats.bytes;
+		packets += xdp_ring->stats.packets;
+	}
 	adapter->restart_queue = restart_queue;
 	adapter->tx_busy = tx_busy;
 	netdev->stats.tx_bytes = bytes;
@@ -6849,6 +6944,9 @@ static void ixgbe_fdir_reinit_subtask(struct ixgbe_adapter *adapter)
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			set_bit(__IXGBE_TX_FDIR_INIT_DONE,
 				&(adapter->tx_ring[i]->state));
+		for (i = 0; i < adapter->num_xdp_queues; i++)
+			set_bit(__IXGBE_TX_FDIR_INIT_DONE,
+				&adapter->xdp_ring[i]->state);
 		/* re-enable flow director interrupts */
 		IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_FLOW_DIR);
 	} else {
@@ -6882,6 +6980,8 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
 	if (netif_carrier_ok(adapter->netdev)) {
 		for (i = 0; i < adapter->num_tx_queues; i++)
 			set_check_for_tx_hang(adapter->tx_ring[i]);
+		for (i = 0; i < adapter->num_xdp_queues; i++)
+			set_check_for_tx_hang(adapter->xdp_ring[i]);
 	}
 
 	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
@@ -7112,6 +7212,13 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
 			return true;
 	}
 
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *ring = adapter->xdp_ring[i];
+
+		if (ring->next_to_use != ring->next_to_clean)
+			return true;
+	}
+
 	return false;
 }
 
@@ -8072,6 +8179,99 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
 #endif
 }
 
+static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
+			       struct ixgbe_adapter *adapter,
+			       struct ixgbe_ring *ring)
+{
+	struct ixgbe_tx_buffer *tx_buffer;
+	union ixgbe_adv_tx_desc *tx_desc;
+	u32 len, tx_flags = 0;
+	dma_addr_t dma;
+	u16 count, i;
+	u32 cmd_type;
+
+	len = xdp->data_end - xdp->data;
+	count = TXD_USE_COUNT(len);
+
+	/* record the location of the first descriptor for this packet */
+	tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
+	tx_buffer->skb = NULL;
+	tx_buffer->bytecount = len;
+	tx_buffer->gso_segs = 1;
+	tx_buffer->protocol = 0;
+
+#ifdef CONFIG_PCI_IOV
+	/* Use the l2switch_enable flag - would be false if the DMA
+	 * Tx switch had been disabled.
+	 */
+	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
+		tx_flags |= IXGBE_TX_FLAGS_CC;
+#endif
+
+	tx_buffer->tx_flags = tx_flags;
+	i = ring->next_to_use;
+	tx_desc = IXGBE_TX_DESC(ring, i);
+
+	dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
+	if (dma_mapping_error(ring->dev, dma))
+		goto dma_error;
+
+	dma_unmap_len_set(tx_buffer, len, len);
+	dma_unmap_addr_set(tx_buffer, dma, dma);
+	tx_buffer->data = xdp->data;
+	tx_desc->read.buffer_addr = cpu_to_le64(dma);
+
+	/* put descriptor type bits */
+	cmd_type = IXGBE_ADVTXD_DTYP_DATA |
+		   IXGBE_ADVTXD_DCMD_DEXT |
+		   IXGBE_ADVTXD_DCMD_IFCS |
+		   IXGBE_ADVTXD_DCMD_EOP;
+	cmd_type |= len | IXGBE_TXD_CMD;
+	tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
+	tx_desc->read.olinfo_status =
+		cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
+
+	/* With normal buffer flow we would also set the time_stamp. But in
+	 * XDP case we can safely skip it because at the moment it is not
+	 * used anywhere.
+	 */
+	tx_buffer->time_stamp = jiffies;
+
+	/* Force memory writes to complete before letting h/w know there
+	 * are new descriptors to fetch.  (Only applicable for weak-ordered
+	 * memory model archs, such as IA-64).
+	 *
+	 * We also need this memory barrier to make certain all of the
+	 * status bits have been updated before next_to_watch is written.
+	 */
+	wmb();
+
+	/* set next_to_watch value indicating a packet is present */
+	i++;
+	if (i == ring->count)
+		i = 0;
+
+	tx_buffer->next_to_watch = tx_desc;
+	ring->next_to_use = i;
+
+	writel(i, ring->tail);
+
+	/* we need this if more than one processor can write to our tail
+	 * at a time, it synchronizes IO on IA64/Altix systems
+	 */
+	mmiowb();
+	return NETDEV_TX_OK;
+dma_error:
+	if (dma_unmap_len(tx_buffer, len))
+		dma_unmap_single(ring->dev,
+				 dma_unmap_addr(tx_buffer, dma),
+				 dma_unmap_len(tx_buffer, len),
+				 DMA_TO_DEVICE);
+	dma_unmap_len_set(tx_buffer, len, 0);
+	ring->next_to_use = i;
+	return -ENOMEM;
+}
+
 netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
 			  struct ixgbe_adapter *adapter,
 			  struct ixgbe_ring *tx_ring)
@@ -8363,6 +8563,23 @@ static void ixgbe_netpoll(struct net_device *netdev)
 
 #endif
 
+static void ixgbe_get_ring_stats64(struct rtnl_link_stats64 *stats,
+				   struct ixgbe_ring *ring)
+{
+	u64 bytes, packets;
+	unsigned int start;
+
+	if (ring) {
+		do {
+			start = u64_stats_fetch_begin_irq(&ring->syncp);
+			packets = ring->stats.packets;
+			bytes   = ring->stats.bytes;
+		} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
+		stats->tx_packets += packets;
+		stats->tx_bytes   += bytes;
+	}
+}
+
 static void ixgbe_get_stats64(struct net_device *netdev,
 			      struct rtnl_link_stats64 *stats)
 {
@@ -8388,18 +8605,13 @@ static void ixgbe_get_stats64(struct net_device *netdev,
 
 	for (i = 0; i < adapter->num_tx_queues; i++) {
 		struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
-		u64 bytes, packets;
-		unsigned int start;
 
-		if (ring) {
-			do {
-				start = u64_stats_fetch_begin_irq(&ring->syncp);
-				packets = ring->stats.packets;
-				bytes   = ring->stats.bytes;
-			} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
-			stats->tx_packets += packets;
-			stats->tx_bytes   += bytes;
-		}
+		ixgbe_get_ring_stats64(stats, ring);
+	}
+	for (i = 0; i < adapter->num_xdp_queues; i++) {
+		struct ixgbe_ring *ring = ACCESS_ONCE(adapter->xdp_ring[i]);
+
+		ixgbe_get_ring_stats64(stats, ring);
 	}
 	rcu_read_unlock();
 
@@ -8533,6 +8745,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
 	}
 
 	ixgbe_validate_rtr(adapter, tc);
+	adapter->num_xdp_queues = adapter->xdp_prog ? nr_cpu_ids : 0;
 
 #endif /* CONFIG_IXGBE_DCB */
 	ixgbe_init_interrupt_scheme(adapter);
@@ -9520,7 +9733,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
 {
 	int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
 	struct ixgbe_adapter *adapter = netdev_priv(dev);
-	struct bpf_prog *old_adapter_prog;
+	struct bpf_prog *old_prog;
 
 	/* verify ixgbe ring attributes are sufficient for XDP */
 	for (i = 0; i < adapter->num_rx_queues; i++) {
@@ -9533,12 +9746,26 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
 			return -EINVAL;
 	}
 
-	old_adapter_prog = xchg(&adapter->xdp_prog, prog);
+	if (nr_cpu_ids > MAX_XDP_QUEUES)
+		return -ENOMEM;
+
+	old_prog = xchg(&adapter->xdp_prog, prog);
+
+	/* If transitioning XDP modes reconfigure rings */
+	if (!!adapter->xdp_prog != !!old_prog) {
+		int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+
+		if (err) {
+			rcu_assign_pointer(adapter->xdp_prog, old_prog);
+			return -EINVAL;
+		}
+	}
+
 	for (i = 0; i < adapter->num_rx_queues; i++)
 		ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
 
-	if (old_adapter_prog)
-		bpf_prog_put(old_adapter_prog);
+	if (old_prog)
+		bpf_prog_put(old_prog);
 
 	return 0;
 }
@@ -10044,6 +10271,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	for (i = 0; i < adapter->num_tx_queues; i++)
 		u64_stats_init(&adapter->tx_ring[i]->syncp);
 
+	for (i = 0; i < adapter->num_xdp_queues; i++)
+		u64_stats_init(&adapter->xdp_ring[i]->syncp);
+
 	/* WOL not supported for all devices */
 	adapter->wol = 0;
 	hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);



* [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head
  2017-02-25 17:32 [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions John Fastabend
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action John Fastabend
@ 2017-02-25 17:33 ` John Fastabend
  2017-02-27 16:32   ` Alexander Duyck
  2017-02-25 17:38 ` [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
  2017-02-25 18:58 ` Alexander Duyck
From: John Fastabend @ 2017-02-25 17:33 UTC (permalink / raw)
  To: intel-wired-lan

Add adjust_head support for XDP. However, at the moment we are only
adding IXGBE_SKB_PAD bytes of headroom, to align with the existing
driver paths.

The infrastructure is such that a follow-on patch can extend the
headroom up to 196B without changing the RX path.
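
For reference, the xdp_buff handed to the program is laid out roughly
like this (headroom is IXGBE_SKB_PAD for now):

    data_hard_start        data                   data_end
          |<-- headroom -->|<-------- size ------>|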

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   45 +++++++++++++++++--------
 1 file changed, 31 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 227caf8..040e469 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2031,6 +2031,7 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer)
 static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
 			      struct ixgbe_rx_buffer *rx_buffer,
 			      struct sk_buff *skb,
+			      unsigned int headroom,
 			      unsigned int size)
 {
 #if (PAGE_SIZE < 8192)
@@ -2041,7 +2042,8 @@ static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
 				SKB_DATA_ALIGN(size);
 #endif
 	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
-			rx_buffer->page_offset, size, truesize);
+			rx_buffer->page_offset - (IXGBE_SKB_PAD - headroom),
+			size, truesize);
 #if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
@@ -2114,6 +2116,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
 static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 					   struct ixgbe_rx_buffer *rx_buffer,
 					   union ixgbe_adv_rx_desc *rx_desc,
+					   unsigned int headroom,
 					   unsigned int size)
 {
 	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
@@ -2122,6 +2125,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 #else
 	unsigned int truesize = SKB_DATA_ALIGN(size);
 #endif
+	unsigned int off_page;
 	struct sk_buff *skb;
 
 	/* prefetch first cache line of first page */
@@ -2135,12 +2139,14 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 	if (unlikely(!skb))
 		return NULL;
 
+	off_page = IXGBE_SKB_PAD - headroom;
+
 	if (size > IXGBE_RX_HDR_SIZE) {
 		if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
 			IXGBE_CB(skb)->dma = rx_buffer->dma;
 
 		skb_add_rx_frag(skb, 0, rx_buffer->page,
-				rx_buffer->page_offset,
+				rx_buffer->page_offset - off_page,
 				size, truesize);
 #if (PAGE_SIZE < 8192)
 		rx_buffer->page_offset ^= truesize;
@@ -2148,7 +2154,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 		rx_buffer->page_offset += truesize;
 #endif
 	} else {
-		memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
+		memcpy(__skb_put(skb, size), va - off_page,
+		       ALIGN(size, sizeof(long)));
 		rx_buffer->pagecnt_bias++;
 	}
 
@@ -2158,6 +2165,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
 static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 				       struct ixgbe_rx_buffer *rx_buffer,
 				       union ixgbe_adv_rx_desc *rx_desc,
+				       unsigned int headroom,
 				       unsigned int size)
 {
 	void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
@@ -2181,7 +2189,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
 		return NULL;
 
 	/* update pointers within the skb to store the data */
-	skb_reserve(skb, IXGBE_SKB_PAD);
+	skb_reserve(skb, headroom);
 	__skb_put(skb, size);
 
 	/* record DMA address if this is the start of a chain of buffers */
@@ -2202,10 +2210,14 @@ static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
 			       struct ixgbe_adapter *adapter,
 			       struct ixgbe_ring *tx_ring);
 
+#define IXGBE_XDP_PASS 0
+#define IXGBE_XDP_CONSUMED 1
+
 static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 			 struct ixgbe_ring  *rx_ring,
 			 struct ixgbe_rx_buffer *rx_buffer,
-			 unsigned int size)
+			 unsigned int *headroom,
+			 unsigned int *size)
 {
 	struct ixgbe_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
@@ -2216,17 +2228,19 @@ static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 
 	if (!xdp_prog)
-		return 0;
+		return IXGBE_XDP_PASS;
 
 	addr = page_address(rx_buffer->page) + rx_buffer->page_offset;
-	xdp.data_hard_start = addr;
+	xdp.data_hard_start = addr - *headroom;
 	xdp.data = addr;
-	xdp.data_end = addr + size;
+	xdp.data_end = addr + *size;
 
 	act = bpf_prog_run_xdp(xdp_prog, &xdp);
 	switch (act) {
 	case XDP_PASS:
-		return 0;
+		*headroom = xdp.data - xdp.data_hard_start;
+		*size = xdp.data_end - xdp.data;
+		return IXGBE_XDP_PASS;
 	case XDP_TX:
 		xdp_ring = adapter->xdp_ring[smp_processor_id()];
 
@@ -2246,7 +2260,7 @@ static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 		rx_buffer->pagecnt_bias++; /* give page back */
 		break;
 	}
-	return size;
+	return IXGBE_XDP_CONSUMED;
 }
 
 /**
@@ -2275,6 +2289,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
 
 	while (likely(total_rx_packets < budget)) {
+		unsigned int headroom = ixgbe_rx_offset(rx_ring);
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
 		struct sk_buff *skb;
@@ -2301,7 +2316,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 		rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
 
 		rcu_read_lock();
-		consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
+		consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer,
+					 &headroom, &size);
 		rcu_read_unlock();
 
 		if (consumed) {
@@ -2315,13 +2331,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 		/* retrieve a buffer from the ring */
 		if (skb)
-			ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
+			ixgbe_add_rx_frag(rx_ring, rx_buffer, skb,
+					  headroom, size);
 		else if (ring_uses_build_skb(rx_ring))
 			skb = ixgbe_build_skb(rx_ring, rx_buffer,
-					      rx_desc, size);
+					      rx_desc, headroom, size);
 		else
 			skb = ixgbe_construct_skb(rx_ring, rx_buffer,
-						  rx_desc, size);
+						  rx_desc, headroom, size);
 
 		/* exit if we failed to retrieve a buffer */
 		if (!skb) {



* [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe
  2017-02-25 17:32 [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
  2017-02-25 17:33 ` [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head John Fastabend
@ 2017-02-25 17:38 ` John Fastabend
  2017-02-27  1:35   ` Jeff Kirsher
  2017-02-25 18:58 ` Alexander Duyck
From: John Fastabend @ 2017-02-25 17:38 UTC (permalink / raw)
  To: intel-wired-lan

On 17-02-25 09:32 AM, John Fastabend wrote:
> This series adds support for XDP on ixgbe. We still need to understand
> the adjust head size requirement. If we can compromise at 196B (is this
> correct, Alex?) then we can continue to use the normal driver RX path and
> avoid having an XDP codebase + a normal codebase. This is a big win for
> everyone who has to read this code day to day and work on it. I
> suggest that if more headroom is needed then it should also be needed in
> the normal stack case, and we should provide a generic mechanism to
> build up more headroom. Plus we already have ndo_set_rx_headroom();
> can we just use this?
> 
> If this series can get accepted then we have a series behind it to
> enable batching on TX to push TX Mpps up to line rates. The gist of
> the implementation is to run the XDP program in a loop, collecting the
> action results in an array, and then pushing them into the TX routine.
> For a first gen this will likely abort if we get an XDP_PASS result,
> but this is just a matter of code wrangling and laziness. It can be
> resolved.
> 
> Looking ahead, some improvements are needed: TX routines should take
> an array of packets if we believe long trains of packets will be on
> the RX ring inside a "processing window" (how many descriptors we
> handle per IRQ clean). Note that with many queues on devices we can
> ensure this happens with flow director or even RSS in some cases.
> 
> @Alex, please review. Look at patch 2/3 in particular and let me know
> what you think about the trade-offs I made there w.r.t. num_xdp_queues.
> 
> ---

Hi Jeff,

There might need to be a couple of versions to address feedback, but I
assume you can put this on your dev-queue for whenever net-next opens?

Thanks,
John



* [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions John Fastabend
@ 2017-02-25 18:18   ` kbuild test robot
  2017-02-27 16:22   ` Alexander Duyck
From: kbuild test robot @ 2017-02-25 18:18 UTC (permalink / raw)
  To: intel-wired-lan

Hi John,

[auto build test WARNING on jkirsher-next-queue/dev-queue]
[also build test WARNING on next-20170224]
[cannot apply to v4.10]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/John-Fastabend/XDP-for-ixgbe/20170226-013816
base:   https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue
config: xtensa-allmodconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 4.9.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=xtensa 

All warnings (new ones prefixed by >>):

   In file included from arch/xtensa/include/asm/atomic.h:21:0,
                    from include/linux/atomic.h:4,
                    from include/linux/debug_locks.h:5,
                    from include/linux/lockdep.h:25,
                    from include/linux/spinlock_types.h:18,
                    from include/linux/spinlock.h:81,
                    from include/linux/seqlock.h:35,
                    from include/linux/time.h:5,
                    from include/linux/stat.h:18,
                    from include/linux/module.h:10,
                    from drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:30:
   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: In function 'ixgbe_setup_xdp_resource':
   arch/xtensa/include/asm/cmpxchg.h:139:3: warning: value computed is not used [-Wunused-value]
     ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr))))
      ^
>> drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:6165:2: note: in expansion of macro 'xchg'
     xchg(&ring->xdp_prog, adapter->xdp_prog);
     ^

vim +/xchg +6165 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c

  6149			goto err;
  6150	
  6151		rx_ring->next_to_clean = 0;
  6152		rx_ring->next_to_use = 0;
  6153	
  6154		return 0;
  6155	err:
  6156		vfree(rx_ring->rx_buffer_info);
  6157		rx_ring->rx_buffer_info = NULL;
  6158		dev_err(dev, "Unable to allocate memory for the Rx descriptor ring\n");
  6159		return -ENOMEM;
  6160	}
  6161	
  6162	static void ixgbe_setup_xdp_resource(struct ixgbe_adapter *adapter,
  6163					     struct ixgbe_ring *ring)
  6164	{
> 6165		xchg(&ring->xdp_prog, adapter->xdp_prog);
  6166	}
  6167	
  6168	/**
  6169	 * ixgbe_setup_all_rx_resources - allocate all queues Rx resources
  6170	 * @adapter: board private structure
  6171	 *
  6172	 * If this function returns with an error, then it's possible one or
  6173	 * more of the rings is populated (while the rest are not).  It is the

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: .config.gz
Type: application/gzip
Size: 48209 bytes
Desc: not available
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20170226/3d5a6f37/attachment-0001.bin>


* [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe
  2017-02-25 17:32 [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
  2017-02-25 17:38 ` [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
@ 2017-02-25 18:58 ` Alexander Duyck
From: Alexander Duyck @ 2017-02-25 18:58 UTC (permalink / raw)
  To: intel-wired-lan

On Sat, Feb 25, 2017 at 9:32 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> This series adds support for XDP on ixgbe. We still need to understand
> the adjust head size requirement. If we can compromise at 196B (is this
> correct, Alex?) then we can continue to use the normal driver RX path and

In theory on x86 there is 192B of headroom we can reserve since the
shared info is only 320B.  That leaves us with 1536B for packet storage.

That said, the current code is only reserving NET_SKB_PAD +
NET_IP_ALIGN, as I hadn't taken into account the need for more headroom
than that when I did the ixgbe patches.  I'll try to submit a
follow-up patch for igb and ixgbe to update the padding.
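To make the arithmetic concrete, a sketch of the 2K half-page layout
being described (numbers assumed for x86-64; the names are
illustrative, not from the driver):

	#define EXAMPLE_XDP_HEADROOM	192	/* proposed reserve */
	#define EXAMPLE_DATA_LEN	1536	/* packet storage */
	#define EXAMPLE_SHINFO		320	/* SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

	/* 192 + 1536 + 320 == 2048: headroom, data and shared info
	 * together still fit in half of a 4K page, so the driver keeps
	 * its two-buffers-per-page recycling scheme.
	 */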

> avoid having XDP codebase + normal codebase. This is a big win for
> everyone who has to read this code day to day and work on it. I
> suggest if more headroom is needed then it should also be needed in
> the normal stack case and we should provide a generic mechanism to
> build up more headroom. Plus we already have ndo_set_rx_headroom()
> can we just use this?
>
> If this series can get accepted then we have a series behind it to
> enable batching on TX to push TX Mpps up to line rates. The gist of
> the implementation is to run XDP program in a loop, collecting the
> action results in an array. And then pushing them into the TX routine.
> For a first gen this will likely abort if we get a XDP_PASS routine
> but this is just a matter of code wrangling and lazyness. It can be
> resolved.
>
> Future looking some improvements are needed, TX routines should take
> an array of packets if we believe long trains of packets will be on
> the RX ring inside a "processing window" (how many descriptors we
> handle per irq clean). Note with many queues on devices we can ensure
> this happens with flow director or even RSS in some cases.
>
> @Alex, please review. Look at patch 2/3 in paticular and let me know
> what you think about the trade-offs I made there w.r.t. num_xdp_queues

I'll try to get the set reviewed this weekend.

> ---
>
> John Fastabend (3):
>       ixgbe: add XDP support for pass and drop actions
>       ixgbe: add support for XDP_TX action
>       ixgbe: xdp support for adjust head
>
>
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   14 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   29 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   85 ++++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  439 ++++++++++++++++++++--
>  4 files changed, 511 insertions(+), 56 deletions(-)
>
> --
> Signature

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action John Fastabend
@ 2017-02-25 22:01   ` Alexander Duyck
  2017-03-01 23:15     ` John Fastabend
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Duyck @ 2017-02-25 22:01 UTC (permalink / raw)
  To: intel-wired-lan

On Sat, Feb 25, 2017 at 9:32 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Add support for XDP_TX action.
>
> A couple design choices were made here. First I use a new ring
> pointer structure xdp_ring[] in the adapter struct instead of
> pushing the newly allocated xdp TX rings into the tx_ring[]
> structure. This means we have to duplicate loops around rings
> in places we want to initialize both TX rings and XDP rings.
> But by making it explicit it is obvious when we are using XDP
> rings and when we are using TX rings. Further we don't have
> to do ring arithmetic which is error prone. As a proof point
> for doing this my first patches used only a single ring structure
> and introduced bugs in FCoE code and macvlan code paths.
>
> Second I am aware this is not the most optimized version of
> this code possible. I want to get baseline support in using
> the most readable format possible and then once this series
> is included I will optimize the TX path in another series
> of patches.

Comments inline below.  There are still a few items to iron out but
this looks pretty good.  I'll try to get to the other two patches
tomorrow.

> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   12 +
>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   29 ++
>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   85 +++++-
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  314 +++++++++++++++++++---
>  4 files changed, 386 insertions(+), 54 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index 2d12c24..571a072 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -190,7 +190,10 @@ struct vf_macvlans {
>  struct ixgbe_tx_buffer {
>         union ixgbe_adv_tx_desc *next_to_watch;
>         unsigned long time_stamp;
> -       struct sk_buff *skb;
> +       union {
> +               struct sk_buff *skb;
> +               void *data; /* XDP uses addr ptr on irq_clean */
> +       };
>         unsigned int bytecount;
>         unsigned short gso_segs;
>         __be16 protocol;

This actually has me wondering if we couldn't short-cut things one
step further and just store a flag in the tx_flags portion of the
tx_buffer to record whether this is an skb or a data pointer.  I'm
wondering if we couldn't optimize the standard transmit path to just
take a reference on the skb->head, and drop the sk_buff itself for
small packets since in most cases there isn't anything we really need
to hold onto anyway.  I might have to look into that.

> @@ -308,6 +311,7 @@ struct ixgbe_ring {
>         };
>
>         u8 dcb_tc;
> +       bool xdp_ring;
>         struct ixgbe_queue_stats stats;
>         struct u64_stats_sync syncp;
>         union {

Instead of adding a bool I would rather have this added to the ring
state as a single bit.  That way we don't have to modify the ring
structure here at all.
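
A minimal sketch of the state-bit alternative, assuming a new
__IXGBE_TX_XDP_RING entry is added to the existing ring state enum
(the name is hypothetical):

	static inline bool ixgbe_ring_is_xdp(struct ixgbe_ring *ring)
	{
		return test_bit(__IXGBE_TX_XDP_RING, &ring->state);
	}

	static inline void ixgbe_set_ring_xdp(struct ixgbe_ring *ring)
	{
		set_bit(__IXGBE_TX_XDP_RING, &ring->state);
	}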

> @@ -335,6 +339,7 @@ enum ixgbe_ring_f_enum {
>  #define IXGBE_MAX_FCOE_INDICES         8
>  #define MAX_RX_QUEUES                  (IXGBE_MAX_FDIR_INDICES + 1)
>  #define MAX_TX_QUEUES                  (IXGBE_MAX_FDIR_INDICES + 1)
> +#define MAX_XDP_QUEUES                 (IXGBE_MAX_FDIR_INDICES + 1)
>  #define IXGBE_MAX_L2A_QUEUES           4
>  #define IXGBE_BAD_L2A_QUEUE            3
>  #define IXGBE_MAX_MACVLANS             31
> @@ -578,6 +583,10 @@ struct ixgbe_adapter {
>         __be16 vxlan_port;
>         __be16 geneve_port;
>
> +       /* XDP */
> +       int num_xdp_queues;
> +       struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES];
> +
>         /* TX */
>         struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
>
> @@ -624,6 +633,7 @@ struct ixgbe_adapter {
>
>         u64 tx_busy;
>         unsigned int tx_ring_count;
> +       unsigned int xdp_ring_count;
>         unsigned int rx_ring_count;
>
>         u32 link_speed;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> index d3e02ac..51efd0a 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
> @@ -1031,7 +1031,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>  {
>         struct ixgbe_adapter *adapter = netdev_priv(netdev);
>         struct ixgbe_ring *temp_ring;
> -       int i, err = 0;
> +       int i, j, err = 0;
>         u32 new_rx_count, new_tx_count;
>
>         if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
> @@ -1057,15 +1057,19 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>         if (!netif_running(adapter->netdev)) {
>                 for (i = 0; i < adapter->num_tx_queues; i++)
>                         adapter->tx_ring[i]->count = new_tx_count;
> +               for (i = 0; i < adapter->num_xdp_queues; i++)
> +                       adapter->xdp_ring[i]->count = new_tx_count;
>                 for (i = 0; i < adapter->num_rx_queues; i++)
>                         adapter->rx_ring[i]->count = new_rx_count;
>                 adapter->tx_ring_count = new_tx_count;
> +               adapter->xdp_ring_count = new_tx_count;
>                 adapter->rx_ring_count = new_rx_count;
>                 goto clear_reset;
>         }
>
>         /* allocate temporary buffer to store rings in */
> -       i = max_t(int, adapter->num_tx_queues, adapter->num_rx_queues);
> +       i = max_t(int, adapter->num_tx_queues + adapter->num_xdp_queues,
> +                 adapter->num_rx_queues);

You only need the maximum of the three values.  You don't need to
add num_tx_queues to num_xdp_queues.
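
A sketch of the sizing being suggested, using the kernel's max3()
helper (the copy-in/copy-out loops below would then have to reuse
temp_ring per ring class instead of packing Tx and XDP rings side by
side):

	i = max3(adapter->num_tx_queues, adapter->num_xdp_queues,
		 adapter->num_rx_queues);
	temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));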

>         temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));
>
>         if (!temp_ring) {
> @@ -1097,12 +1101,33 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>                         }
>                 }
>
> +               for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
> +                       memcpy(&temp_ring[i], adapter->xdp_ring[j],
> +                              sizeof(struct ixgbe_ring));
> +
> +                       temp_ring[i].count = new_tx_count;
> +                       err = ixgbe_setup_tx_resources(&temp_ring[i]);
> +                       if (err) {
> +                               while (i) {
> +                                       i--;
> +                                       ixgbe_free_tx_resources(&temp_ring[i]);
> +                               }
> +                               goto err_setup;
> +                       }
> +               }
> +
>                 for (i = 0; i < adapter->num_tx_queues; i++) {
>                         ixgbe_free_tx_resources(adapter->tx_ring[i]);
>
>                         memcpy(adapter->tx_ring[i], &temp_ring[i],
>                                sizeof(struct ixgbe_ring));
>                 }
> +               for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
> +                       ixgbe_free_tx_resources(adapter->xdp_ring[j]);
> +
> +                       memcpy(adapter->xdp_ring[j], &temp_ring[i],
> +                              sizeof(struct ixgbe_ring));
> +               }
>
>                 adapter->tx_ring_count = new_tx_count;
>         }
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> index 1b8be7d..d9a4916 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
> @@ -73,6 +73,11 @@ static bool ixgbe_cache_ring_dcb_sriov(struct ixgbe_adapter *adapter)
>                         reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>                 adapter->tx_ring[i]->reg_idx = reg_idx;
>         }
> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
> +               if ((reg_idx & ~vmdq->mask) >= tcs)
> +                       reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
> +       }
>
>  #ifdef IXGBE_FCOE
>         /* nothing to do if FCoE is disabled */
> @@ -248,6 +253,12 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
>                 adapter->tx_ring[i]->reg_idx = reg_idx;
>         }
>
> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
> +               if ((reg_idx & rss->mask) >= rss->indices)
> +                       reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
> +       }
> +
>  #ifdef IXGBE_FCOE
>         /* FCoE uses a linear block of queues so just assigning 1:1 */
>         for (; i < adapter->num_tx_queues; i++, reg_idx++)

One thing we may want to look at doing for the SR-IOV cases would be
to steal XDP queues from an adjacent VMDq pool.  It would add a
dependency on limiting ourselves by one additional VF, but it would
make the logic much simpler: each VMDq pool is the same size, so the
Tx/Rx limit in one pool would be the same as the next, and you could
do twice the number of Tx queues as Rx queues.

> @@ -267,12 +278,14 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
>   **/
>  static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
>  {
> -       int i;
> +       int i, reg_idx;
>
>         for (i = 0; i < adapter->num_rx_queues; i++)
>                 adapter->rx_ring[i]->reg_idx = i;
> -       for (i = 0; i < adapter->num_tx_queues; i++)
> -               adapter->tx_ring[i]->reg_idx = i;
> +       for (i = 0, reg_idx = 0; i < adapter->num_tx_queues; i++, reg_idx++)
> +               adapter->tx_ring[i]->reg_idx = reg_idx;
> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++)
> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>
>         return true;
>  }
> @@ -308,6 +321,14 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
>         ixgbe_cache_ring_rss(adapter);
>  }
>
> +static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
> +{
> +       if (nr_cpu_ids > MAX_XDP_QUEUES)
> +               return 0;
> +
> +       return adapter->xdp_prog ? nr_cpu_ids : 0;
> +}
> +
>  #define IXGBE_RSS_64Q_MASK     0x3F
>  #define IXGBE_RSS_16Q_MASK     0xF
>  #define IXGBE_RSS_8Q_MASK      0x7
> @@ -382,6 +403,7 @@ static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
>         adapter->num_rx_queues_per_pool = tcs;
>
>         adapter->num_tx_queues = vmdq_i * tcs;
> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>         adapter->num_rx_queues = vmdq_i * tcs;
>
>  #ifdef IXGBE_FCOE
> @@ -479,6 +501,7 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
>                 netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
>
>         adapter->num_tx_queues = rss_i * tcs;
> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>         adapter->num_rx_queues = rss_i * tcs;
>
>         return true;
> @@ -549,6 +572,7 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
>
>         adapter->num_rx_queues = vmdq_i * rss_i;
>         adapter->num_tx_queues = vmdq_i * rss_i;
> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>
>         /* disable ATR as it is not supported when VMDq is enabled */
>         adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;

I'm pretty sure we are currently looking at an overly simplified
setup.  I suspect we need to take queue limits into account for DCB
and SR-IOV.  For example, what TC are we supposed to be taking Tx
queues from in the case of DCB?  Really, in order for us to support
XDP in the DCB case we need rss_i to be equal to or less than half the
maximum possible value.

And as I mentioned before, we need to make certain in the case of
SR-IOV, with DCB or without, that we have one pool that we set aside
so that we can steal queues from it to support XDP.

> @@ -669,6 +693,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>  #endif /* IXGBE_FCOE */
>         adapter->num_rx_queues = rss_i;
>         adapter->num_tx_queues = rss_i;
> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>
>         return true;
>  }

So this is the one allocation that is safe for now since the rss_i has
an upper limit of 64.

> @@ -689,6 +714,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
>         /* Start with base case */
>         adapter->num_rx_queues = 1;
>         adapter->num_tx_queues = 1;
> +       adapter->num_xdp_queues = 0;
>         adapter->num_rx_pools = adapter->num_rx_queues;
>         adapter->num_rx_queues_per_pool = 1;
>
> @@ -720,7 +746,8 @@ static int ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter)
>         int i, vectors, vector_threshold;
>
>         /* We start by asking for one vector per queue pair */
> -       vectors = max(adapter->num_rx_queues, adapter->num_tx_queues);
> +       vectors = max(adapter->num_rx_queues,
> +                     adapter->num_tx_queues + adapter->num_xdp_queues);
>
>         /* It is easy to be greedy for MSI-X vectors. However, it really
>          * doesn't do much good if we have a lot more vectors than CPUs. We'll

This might be overkill on the vectors.  You can stack regular Rx
queues, Tx queues, and XDP queues.  I would say take the maximum of
all three instead of adding Tx and XDP queues.
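
The same max3() idiom sketched earlier would work here too, e.g.:

	/* one vector can service an Rx ring, a Tx ring and an XDP ring */
	vectors = max3(adapter->num_rx_queues, adapter->num_tx_queues,
		       adapter->num_xdp_queues);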

> @@ -800,6 +827,8 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
>   * @v_idx: index of vector in adapter struct
>   * @txr_count: total number of Tx rings to allocate
>   * @txr_idx: index of first Tx ring to allocate
> + * @xdp_count: total number of XDP rings to allocate
> + * @xdp_idx: index of first XDP ring to allocate
>   * @rxr_count: total number of Rx rings to allocate
>   * @rxr_idx: index of first Rx ring to allocate
>   *
> @@ -808,6 +837,7 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
>  static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>                                 int v_count, int v_idx,
>                                 int txr_count, int txr_idx,
> +                               int xdp_count, int xdp_idx,
>                                 int rxr_count, int rxr_idx)
>  {
>         struct ixgbe_q_vector *q_vector;
> @@ -909,6 +939,33 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>                 ring++;
>         }
>
> +       while (xdp_count) {
> +               /* assign generic ring traits */
> +               ring->dev = &adapter->pdev->dev;
> +               ring->netdev = adapter->netdev;

Just a thought on all this.  I don't know if we should be setting
ring->netdev.  We don't really have a netdevice associated with the
XDP rings; we have an Rx ring.

If we were to leave ring->netdev as NULL it might help us to find any
and all places where we might have overlooked things and have XDP
trying to tweak netdev queues.  In addition it would help to short
circuit the flag checks in the Tx clean-up path as we could just check
for tx_ring->netdev instead of having to check for an XDP boolean
value.
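
A sketch of the short-circuit being described, assuming ring->netdev
is left NULL for the XDP rings:

	/* XDP rings have no netdev, hence no subqueue to stop */
	if (tx_ring->netdev)
		netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);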

> +               /* configure backlink on ring */
> +               ring->q_vector = q_vector;
> +
> +               /* update q_vector Tx values */
> +               ixgbe_add_ring(ring, &q_vector->tx);
> +
> +               /* apply Tx specific ring traits */
> +               ring->count = adapter->xdp_ring_count;
> +               ring->queue_index = xdp_idx;
> +               ring->xdp_ring = true;
> +
> +               /* assign ring to adapter */
> +               adapter->xdp_ring[xdp_idx] = ring;
> +
> +               /* update count and index */
> +               xdp_count--;
> +               xdp_idx += v_count;
> +
> +               /* push pointer to next ring */
> +               ring++;
> +       }
> +
>         while (rxr_count) {
>                 /* assign generic ring traits */
>                 ring->dev = &adapter->pdev->dev;
> @@ -1002,17 +1059,18 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>         int q_vectors = adapter->num_q_vectors;
>         int rxr_remaining = adapter->num_rx_queues;
>         int txr_remaining = adapter->num_tx_queues;
> -       int rxr_idx = 0, txr_idx = 0, v_idx = 0;
> +       int xdp_remaining = adapter->num_xdp_queues;
> +       int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
>         int err;
>
>         /* only one q_vector if MSI-X is disabled. */
>         if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
>                 q_vectors = 1;
>
> -       if (q_vectors >= (rxr_remaining + txr_remaining)) {
> +       if (q_vectors >= (rxr_remaining + txr_remaining + xdp_remaining)) {
>                 for (; rxr_remaining; v_idx++) {
>                         err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
> -                                                  0, 0, 1, rxr_idx);
> +                                                  0, 0, 0, 0, 1, rxr_idx);
>
>                         if (err)
>                                 goto err_out;
> @@ -1026,8 +1084,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>         for (; v_idx < q_vectors; v_idx++) {
>                 int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_idx);
>                 int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_idx);
> +               int xqpv = DIV_ROUND_UP(xdp_remaining, q_vectors - v_idx);
> +
>                 err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
>                                            tqpv, txr_idx,
> +                                          xqpv, xdp_idx,
>                                            rqpv, rxr_idx);
>
>                 if (err)
> @@ -1036,14 +1097,17 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>                 /* update counts and index */
>                 rxr_remaining -= rqpv;
>                 txr_remaining -= tqpv;
> +               xdp_remaining -= xqpv;
>                 rxr_idx++;
>                 txr_idx++;
> +               xdp_idx++;
>         }
>
>         return 0;
>
>  err_out:
>         adapter->num_tx_queues = 0;
> +       adapter->num_xdp_queues = 0;
>         adapter->num_rx_queues = 0;
>         adapter->num_q_vectors = 0;
>
> @@ -1066,6 +1130,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
>         int v_idx = adapter->num_q_vectors;
>
>         adapter->num_tx_queues = 0;
> +       adapter->num_xdp_queues = 0;
>         adapter->num_rx_queues = 0;
>         adapter->num_q_vectors = 0;
>
> @@ -1172,9 +1237,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
>
>         ixgbe_cache_ring_register(adapter);
>
> -       e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u\n",
> +       e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u XDP Queue count = %u\n",
>                    (adapter->num_rx_queues > 1) ? "Enabled" : "Disabled",
> -                  adapter->num_rx_queues, adapter->num_tx_queues);
> +                  adapter->num_rx_queues,
> +                  adapter->num_tx_queues, adapter->num_xdp_queues);
>
>         set_bit(__IXGBE_DOWN, &adapter->state);
>

I think I would have preferred to see us leave tx_queues, and
rx_queues on the same line and add num_xdp_queues on a line by itself.

> @@ -1195,6 +1261,7 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
>  void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
>  {
>         adapter->num_tx_queues = 0;
> +       adapter->num_xdp_queues = 0;
>         adapter->num_rx_queues = 0;
>
>         ixgbe_free_q_vectors(adapter);
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ec2c38f..227caf8 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -594,6 +594,19 @@ static void ixgbe_regdump(struct ixgbe_hw *hw, struct ixgbe_reg_info *reginfo)
>
>  }
>
> +static void ixgbe_print_buffer(struct ixgbe_ring *ring, int n)
> +{
> +       struct ixgbe_tx_buffer *tx_buffer;
> +
> +       tx_buffer = &ring->tx_buffer_info[ring->next_to_clean];
> +       pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
> +                  n, ring->next_to_use, ring->next_to_clean,
> +                  (u64)dma_unmap_addr(tx_buffer, dma),
> +                  dma_unmap_len(tx_buffer, len),
> +                  tx_buffer->next_to_watch,
> +                  (u64)tx_buffer->time_stamp);
> +}
> +
>  /*
>   * ixgbe_dump - Print registers, tx-rings and rx-rings
>   */
> @@ -603,7 +616,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>         struct ixgbe_hw *hw = &adapter->hw;
>         struct ixgbe_reg_info *reginfo;
>         int n = 0;
> -       struct ixgbe_ring *tx_ring;
> +       struct ixgbe_ring *ring;
>         struct ixgbe_tx_buffer *tx_buffer;
>         union ixgbe_adv_tx_desc *tx_desc;
>         struct my_u0 { u64 a; u64 b; } *u0;
> @@ -644,14 +657,13 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>                 "Queue [NTU] [NTC] [bi(ntc)->dma  ]",
>                 "leng", "ntw", "timestamp");
>         for (n = 0; n < adapter->num_tx_queues; n++) {
> -               tx_ring = adapter->tx_ring[n];
> -               tx_buffer = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
> -               pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
> -                          n, tx_ring->next_to_use, tx_ring->next_to_clean,
> -                          (u64)dma_unmap_addr(tx_buffer, dma),
> -                          dma_unmap_len(tx_buffer, len),
> -                          tx_buffer->next_to_watch,
> -                          (u64)tx_buffer->time_stamp);
> +               ring = adapter->tx_ring[n];
> +               ixgbe_print_buffer(ring, n);
> +       }
> +
> +       for (n = 0; n < adapter->num_xdp_queues; n++) {
> +               ring = adapter->xdp_ring[n];
> +               ixgbe_print_buffer(ring, n);
>         }
>
>         /* Print TX Rings */
> @@ -696,28 +708,28 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>          */
>
>         for (n = 0; n < adapter->num_tx_queues; n++) {
> -               tx_ring = adapter->tx_ring[n];
> +               ring = adapter->tx_ring[n];
>                 pr_info("------------------------------------\n");
> -               pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
> +               pr_info("TX QUEUE INDEX = %d\n", ring->queue_index);
>                 pr_info("------------------------------------\n");
>                 pr_info("%s%s    %s              %s        %s          %s\n",
>                         "T [desc]     [address 63:0  ] ",
>                         "[PlPOIdStDDt Ln] [bi->dma       ] ",
>                         "leng", "ntw", "timestamp", "bi->skb");
>
> -               for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
> -                       tx_desc = IXGBE_TX_DESC(tx_ring, i);
> -                       tx_buffer = &tx_ring->tx_buffer_info[i];
> +               for (i = 0; ring->desc && (i < ring->count); i++) {
> +                       tx_desc = IXGBE_TX_DESC(ring, i);
> +                       tx_buffer = &ring->tx_buffer_info[i];
>                         u0 = (struct my_u0 *)tx_desc;
>                         if (dma_unmap_len(tx_buffer, len) > 0) {
>                                 const char *ring_desc;
>
> -                               if (i == tx_ring->next_to_use &&
> -                                   i == tx_ring->next_to_clean)
> +                               if (i == ring->next_to_use &&
> +                                   i == ring->next_to_clean)
>                                         ring_desc = " NTC/U";
> -                               else if (i == tx_ring->next_to_use)
> +                               else if (i == ring->next_to_use)
>                                         ring_desc = " NTU";
> -                               else if (i == tx_ring->next_to_clean)
> +                               else if (i == ring->next_to_clean)
>                                         ring_desc = " NTC";
>                                 else
>                                         ring_desc = "";
> @@ -987,6 +999,10 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
>         for (i = 0; i < adapter->num_tx_queues; i++)
>                 clear_bit(__IXGBE_HANG_CHECK_ARMED,
>                           &adapter->tx_ring[i]->state);
> +
> +       for (i = 0; i < adapter->num_xdp_queues; i++)
> +               clear_bit(__IXGBE_HANG_CHECK_ARMED,
> +                         &adapter->xdp_ring[i]->state);
>  }
>
>  static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
> @@ -1031,6 +1047,14 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
>                 if (xoff[tc])
>                         clear_bit(__IXGBE_HANG_CHECK_ARMED, &tx_ring->state);
>         }
> +
> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
> +               struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
> +
> +               tc = xdp_ring->dcb_tc;
> +               if (xoff[tc])
> +                       clear_bit(__IXGBE_HANG_CHECK_ARMED, &xdp_ring->state);
> +       }
>  }
>
>  static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
> @@ -1137,6 +1161,11 @@ static int ixgbe_tx_maxrate(struct net_device *netdev,
>         return 0;
>  }
>
> +static bool ixgbe_txring_is_xdp(struct ixgbe_ring *tx_ring)
> +{
> +       return tx_ring->xdp_ring;
> +}
> +
>  /**
>   * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
>   * @q_vector: structure containing interrupt and ring information
> @@ -1182,7 +1211,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>                 total_packets += tx_buffer->gso_segs;
>
>                 /* free the skb */
> -               napi_consume_skb(tx_buffer->skb, napi_budget);
> +               if (ixgbe_txring_is_xdp(tx_ring))
> +                       page_frag_free(tx_buffer->data);
> +               else
> +                       napi_consume_skb(tx_buffer->skb, napi_budget);
>
>                 /* unmap skb header data */
>                 dma_unmap_single(tx_ring->dev,
> @@ -1243,7 +1275,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>         if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
>                 /* schedule immediate reset if we believe we hung */
>                 struct ixgbe_hw *hw = &adapter->hw;
> -               e_err(drv, "Detected Tx Unit Hang\n"
> +               e_err(drv, "Detected Tx Unit Hang %s\n"
>                         "  Tx Queue             <%d>\n"
>                         "  TDH, TDT             <%x>, <%x>\n"
>                         "  next_to_use          <%x>\n"
> @@ -1251,13 +1283,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>                         "tx_buffer_info[next_to_clean]\n"
>                         "  time_stamp           <%lx>\n"
>                         "  jiffies              <%lx>\n",
> +                       ixgbe_txring_is_xdp(tx_ring) ? "(XDP)" : "",
>                         tx_ring->queue_index,
>                         IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
>                         IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
>                         tx_ring->next_to_use, i,
>                         tx_ring->tx_buffer_info[i].time_stamp, jiffies);
>
> -               netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
> +               if (!ixgbe_txring_is_xdp(tx_ring))
> +                       netif_stop_subqueue(tx_ring->netdev,
> +                                           tx_ring->queue_index);
>
>                 e_info(probe,
>                        "tx hang %d detected on queue %d, resetting adapter\n",
> @@ -1270,6 +1305,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>                 return true;
>         }
>
> +       if (ixgbe_txring_is_xdp(tx_ring))
> +               return !!budget;
> +
>         netdev_tx_completed_queue(txring_txq(tx_ring),
>                                   total_packets, total_bytes);
>
> @@ -2160,10 +2198,16 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>         return skb;
>  }
>
> -static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
> +static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
> +                              struct ixgbe_adapter *adapter,
> +                              struct ixgbe_ring *tx_ring);
> +
> +static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
> +                        struct ixgbe_ring  *rx_ring,
>                          struct ixgbe_rx_buffer *rx_buffer,
>                          unsigned int size)
>  {
> +       struct ixgbe_ring *xdp_ring;
>         struct bpf_prog *xdp_prog;
>         struct xdp_buff xdp;
>         void *addr;
> @@ -2183,9 +2227,18 @@ static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
>         switch (act) {
>         case XDP_PASS:
>                 return 0;
> +       case XDP_TX:
> +               xdp_ring = adapter->xdp_ring[smp_processor_id()];
> +
> +               /* XDP_TX is not flow controlled */
> +               if (!xdp_ring) {
> +                       WARN_ON(1);
> +                       break;
> +               }
> +               ixgbe_xmit_xdp_ring(&xdp, adapter, xdp_ring);
> +               break;

So just for the sake of future compatibility I would say you might
look at instead just passing a netdev and the XDP buffer.  Then your
transmit routine can make the decision on using smp_processor_id and
grab the xdp_ring it wants.  No point in having us make the decision
in the Rx path.  That way things stay closer to how the transmit path
already works.

Also, shouldn't you return some sort of notification as to whether you
completed the xmit successfully?  It almost seems like maybe we should
fall through to an XDP_ABORTED case if we didn't transmit the frame
since we might end up leaking the memory otherwise.
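
A rough sketch of both suggestions combined, with the revised
ixgbe_xmit_xdp_ring() signature and an int err local assumed:

	case XDP_TX:
		/* ring selection via smp_processor_id() moves into the
		 * xmit routine; this signature is assumed, not final
		 */
		err = ixgbe_xmit_xdp_ring(adapter->netdev, &xdp);
		if (err == NETDEV_TX_OK)
			break;
		/* fall through to XDP_ABORTED so the frame is dropped
		 * and accounted rather than silently leaked
		 */
	case XDP_ABORTED:
		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
		/* fallthrough -- handle aborts by dropping packet */
	case XDP_DROP:
		rx_buffer->pagecnt_bias++; /* give page back */
		break;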

>         default:
>                 bpf_warn_invalid_xdp_action(act);
> -       case XDP_TX:
>         case XDP_ABORTED:
>                 trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>                 /* fallthrough -- handle aborts by dropping packet */
> @@ -2214,8 +2267,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>                                const int budget)
>  {
>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
> -#ifdef IXGBE_FCOE
>         struct ixgbe_adapter *adapter = q_vector->adapter;
> +#ifdef IXGBE_FCOE
>         int ddp_bytes;
>         unsigned int mss = 0;
>  #endif /* IXGBE_FCOE */
> @@ -2248,7 +2301,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>                 rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
>
>                 rcu_read_lock();
> -               consumed = ixgbe_run_xdp(rx_ring, rx_buffer, size);
> +               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
>                 rcu_read_unlock();
>
>                 if (consumed) {

You probably don't need the adapter struct.  If you have the rx_ring
it should have a netdev structure and from that you can get to the
adapter struct via netdev_priv.
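
For example:

	struct ixgbe_adapter *adapter = netdev_priv(rx_ring->netdev);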

> @@ -2914,6 +2967,15 @@ static irqreturn_t ixgbe_msix_other(int irq, void *data)
>                                                        &ring->state))
>                                         reinit_count++;
>                         }
> +
> +                       for (i = 0; i < adapter->num_xdp_queues; i++) {
> +                               struct ixgbe_ring *ring = adapter->xdp_ring[i];
> +
> +                               if (test_and_clear_bit(__IXGBE_TX_FDIR_INIT_DONE,
> +                                                      &ring->state))
> +                                       reinit_count++;
> +                       }
> +
>                         if (reinit_count) {
>                                 /* no more flow director interrupts until after init */
>                                 IXGBE_WRITE_REG(hw, IXGBE_EIMC, IXGBE_EIMC_FLOW_DIR);

We don't run flow director on the XDP queues so there is no need to
add this bit.

> @@ -3429,6 +3491,8 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
>         /* Setup the HW Tx Head and Tail descriptor pointers */
>         for (i = 0; i < adapter->num_tx_queues; i++)
>                 ixgbe_configure_tx_ring(adapter, adapter->tx_ring[i]);
> +       for (i = 0; i < adapter->num_xdp_queues; i++)
> +               ixgbe_configure_tx_ring(adapter, adapter->xdp_ring[i]);
>  }
>
>  static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
> @@ -5598,7 +5662,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
>         }
>
>         /* reset BQL for queue */
> -       netdev_tx_reset_queue(txring_txq(tx_ring));
> +       if (!ixgbe_txring_is_xdp(tx_ring))
> +               netdev_tx_reset_queue(txring_txq(tx_ring));
>
>         /* reset next_to_use and next_to_clean */
>         tx_ring->next_to_use = 0;
> @@ -5627,6 +5692,8 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
>
>         for (i = 0; i < adapter->num_tx_queues; i++)
>                 ixgbe_clean_tx_ring(adapter->tx_ring[i]);
> +       for (i = 0; i < adapter->num_xdp_queues; i++)
> +               ixgbe_clean_tx_ring(adapter->xdp_ring[i]);
>  }
>
>  static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
> @@ -5721,6 +5788,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
>                 u8 reg_idx = adapter->tx_ring[i]->reg_idx;
>                 IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
>         }
> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
> +               u8 reg_idx = adapter->xdp_ring[i]->reg_idx;
> +
> +               IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
> +       }
>
>         /* Disable the Tx DMA engine on 82599 and later MAC */
>         switch (hw->mac.type) {
> @@ -6008,6 +6080,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
>
>         /* set default ring sizes */
>         adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
> +       adapter->xdp_ring_count = IXGBE_DEFAULT_TXD;
>         adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
>
>         /* set default work limits */

I would just use the tx_ring_count value for the XDP rings.  No point
in adding another control unless you want to add the ethtool interface
for it.

> @@ -6089,7 +6162,7 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
>   **/
>  static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
>  {
> -       int i, err = 0;
> +       int i, j = 0, err = 0;
>
>         for (i = 0; i < adapter->num_tx_queues; i++) {
>                 err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
> @@ -6099,12 +6172,23 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
>                 e_err(probe, "Allocation for Tx Queue %u failed\n", i);
>                 goto err_setup_tx;
>         }
> +       for (j = 0; j < adapter->num_xdp_queues; j++) {
> +               err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
> +               if (!err)
> +                       continue;
> +
> +               e_err(probe, "Allocation for Tx Queue %u failed\n", j);
> +               goto err_setup_tx;
> +       }
> +
>
>         return 0;
>  err_setup_tx:
>         /* rewind the index freeing the rings as we go */
>         while (i--)
>                 ixgbe_free_tx_resources(adapter->tx_ring[i]);
> +       while (j--)
> +               ixgbe_free_tx_resources(adapter->xdp_ring[j]);
>         return err;
>  }
>

Just for the sake of symmetry I would prefer to see the XDP rings
freed before the Tx rings.

> @@ -6238,6 +6322,9 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter)
>         for (i = 0; i < adapter->num_tx_queues; i++)
>                 if (adapter->tx_ring[i]->desc)
>                         ixgbe_free_tx_resources(adapter->tx_ring[i]);
> +       for (i = 0; i < adapter->num_xdp_queues; i++)
> +               if (adapter->xdp_ring[i]->desc)
> +                       ixgbe_free_tx_resources(adapter->xdp_ring[i]);
>  }
>
>  /**
> @@ -6656,6 +6743,14 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
>                 bytes += tx_ring->stats.bytes;
>                 packets += tx_ring->stats.packets;
>         }
> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
> +               struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
> +
> +               restart_queue += xdp_ring->tx_stats.restart_queue;
> +               tx_busy += xdp_ring->tx_stats.tx_busy;
> +               bytes += xdp_ring->stats.bytes;
> +               packets += xdp_ring->stats.packets;
> +       }
>         adapter->restart_queue = restart_queue;
>         adapter->tx_busy = tx_busy;
>         netdev->stats.tx_bytes = bytes;

We may want to look at doing a slight refactor on this to convert some
of these stats grabs over to static functions.  We may want to also
group these using structs instead of just doing this as individual u64
stats.

> @@ -6849,6 +6944,9 @@ static void ixgbe_fdir_reinit_subtask(struct ixgbe_adapter *adapter)
>                 for (i = 0; i < adapter->num_tx_queues; i++)
>                         set_bit(__IXGBE_TX_FDIR_INIT_DONE,
>                                 &(adapter->tx_ring[i]->state));
> +               for (i = 0; i < adapter->num_xdp_queues; i++)
> +                       set_bit(__IXGBE_TX_FDIR_INIT_DONE,
> +                               &adapter->xdp_ring[i]->state);
>                 /* re-enable flow director interrupts */
>                 IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_FLOW_DIR);
>         } else {

I wouldn't bother with the flow director initialization for the XDP rings.

Although you may want to set __IXGBE_TX_XPS_INIT_DONE to avoid trying
to configure XPS for the XDP rings.
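
A sketch of that XPS short-circuit, using the existing
__IXGBE_TX_XPS_INIT_DONE ring state bit:

	/* pretend XPS setup is already done so the stack never tries
	 * to configure XPS on the XDP-only rings
	 */
	for (i = 0; i < adapter->num_xdp_queues; i++)
		set_bit(__IXGBE_TX_XPS_INIT_DONE,
			&adapter->xdp_ring[i]->state);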

> @@ -6882,6 +6980,8 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
>         if (netif_carrier_ok(adapter->netdev)) {
>                 for (i = 0; i < adapter->num_tx_queues; i++)
>                         set_check_for_tx_hang(adapter->tx_ring[i]);
> +               for (i = 0; i < adapter->num_xdp_queues; i++)
> +                       set_check_for_tx_hang(adapter->xdp_ring[i]);
>         }
>
>         if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
> @@ -7112,6 +7212,13 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
>                         return true;
>         }
>
> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
> +               struct ixgbe_ring *ring = adapter->xdp_ring[i];
> +
> +               if (ring->next_to_use != ring->next_to_clean)
> +                       return true;
> +       }
> +
>         return false;
>  }
>
> @@ -8072,6 +8179,99 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
>  #endif
>  }
>
> +static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
> +                              struct ixgbe_adapter *adapter,
> +                              struct ixgbe_ring *ring)
> +{
> +       struct ixgbe_tx_buffer *tx_buffer;
> +       union ixgbe_adv_tx_desc *tx_desc;
> +       u32 len, tx_flags = 0;
> +       dma_addr_t dma;
> +       u16 count, i;
> +       u32 cmd_type;
> +
> +       len = xdp->data_end - xdp->data;
> +       count = TXD_USE_COUNT(len);
> +
> +       /* record the location of the first descriptor for this packet */
> +       tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
> +       tx_buffer->skb = NULL;
> +       tx_buffer->bytecount = len;
> +       tx_buffer->gso_segs = 1;
> +       tx_buffer->protocol = 0;
> +
> +#ifdef CONFIG_PCI_IOV
> +       /* Use the l2switch_enable flag - would be false if the DMA
> +        * Tx switch had been disabled.
> +        */
> +       if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
> +               tx_flags |= IXGBE_TX_FLAGS_CC;
> +#endif
> +

I'm pretty sure there is a bug here.  The TX_FLAGS_CC is supposed to
notify the ring that it is supposed to populate the CC bit in olinfo
status.  Also you need to populate a context descriptor when this is
set since the header length recorded in the descriptor is used to
determine the length of the header in the packet.  You can probably
short-cut this and just add a bit of init code that just initializes
the queue by dropping a context descriptor in the ring with an Ethernet
header length of 14 in it, and never bumping the tail.  Then if you
see the TX_FLAGS_CC you just default to context 0 which is already
recorded in the queue and should save you the trouble.
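
A rough sketch of the one-time context descriptor idea, run once at
ring bring-up (field values are modeled on the existing
ixgbe_tx_ctxtdesc() helper and would need checking against the
datasheet):

	struct ixgbe_adv_tx_context_desc *ctxt = IXGBE_TX_CTXTDESC(ring, 0);

	/* record a 14B Ethernet header length in context 0; the tail is
	 * never bumped, so TX_FLAGS_CC frames can simply reference
	 * context index 0
	 */
	ctxt->vlan_macip_lens = cpu_to_le32(ETH_HLEN <<
					    IXGBE_ADVTXD_MACLEN_SHIFT);
	ctxt->seqnum_seed     = 0;
	ctxt->type_tucmd_mlhl = cpu_to_le32(IXGBE_TXD_CMD_DEXT |
					    IXGBE_ADVTXD_DTYP_CTXT);
	ctxt->mss_l4len_idx   = 0;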

> +       tx_buffer->tx_flags = tx_flags;
> +       i = ring->next_to_use;
> +       tx_desc = IXGBE_TX_DESC(ring, i);
> +
> +       dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
> +       if (dma_mapping_error(ring->dev, dma))
> +               goto dma_error;
> +
> +       dma_unmap_len_set(tx_buffer, len, len);
> +       dma_unmap_addr_set(tx_buffer, dma, dma);
> +       tx_buffer->data = xdp->data;
> +       tx_desc->read.buffer_addr = cpu_to_le64(dma);
> +
> +       /* put descriptor type bits */
> +       cmd_type = IXGBE_ADVTXD_DTYP_DATA |
> +                  IXGBE_ADVTXD_DCMD_DEXT |
> +                  IXGBE_ADVTXD_DCMD_IFCS |
> +                  IXGBE_ADVTXD_DCMD_EOP;

The EOP is redundant; we already include it in IXGBE_TXD_CMD, which
you reference below.

> +       cmd_type |= len | IXGBE_TXD_CMD;
> +       tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
> +       tx_desc->read.olinfo_status =
> +               cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
> +
> +       /* With normal buffer flow we would also set the time_stamp. But in
> +        * XDP case we can safely skip it because at the moment it is not
> +        * used anywhere.
> +        */
> +       tx_buffer->time_stamp = jiffies;
> +

We should probably just drop the code from the driver that records
jiffies.  Feel free to submit a patch to drop it and save yourself a
few cycles.

> +       /* Force memory writes to complete before letting h/w know there
> +        * are new descriptors to fetch.  (Only applicable for weak-ordered
> +        * memory model archs, such as IA-64).
> +        *
> +        * We also need this memory barrier to make certain all of the
> +        * status bits have been updated before next_to_watch is written.
> +        */
> +       wmb();
> +
> +       /* set next_to_watch value indicating a packet is present */
> +       i++;
> +       if (i == ring->count)
> +               i = 0;
> +
> +       tx_buffer->next_to_watch = tx_desc;
> +       ring->next_to_use = i;
> +
> +       writel(i, ring->tail);
> +
> +       /* we need this if more than one processor can write to our tail
> +        * at a time, it synchronizes IO on IA64/Altix systems
> +        */
> +       mmiowb();
> +       return NETDEV_TX_OK;
> +dma_error:
> +       if (dma_unmap_len(tx_buffer, len))
> +               dma_unmap_single(ring->dev,
> +                                dma_unmap_addr(tx_buffer, dma),
> +                                dma_unmap_len(tx_buffer, len),
> +                                DMA_TO_DEVICE);

For now, if there is a DMA error there is nothing to unmap since we
only have the one buffer, so we might as well return -ENOMEM directly
instead of jumping out.  As we add the ability to support
scatter-gather for Tx we can look at adding this clean-up, since at
that point there will actually be something to undo.
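
In other words, the error handling could collapse to something like:

	dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
	if (dma_mapping_error(ring->dev, dma))
		return -ENOMEM;	/* nothing mapped yet, nothing to undo */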

> +       dma_unmap_len_set(tx_buffer, len, 0);
> +       ring->next_to_use = i;
> +       return -ENOMEM;
> +}
> +
>  netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
>                           struct ixgbe_adapter *adapter,
>                           struct ixgbe_ring *tx_ring)
> @@ -8363,6 +8563,23 @@ static void ixgbe_netpoll(struct net_device *netdev)
>
>  #endif
>
> +static void ixgbe_get_ring_stats64(struct rtnl_link_stats64 *stats,
> +                                  struct ixgbe_ring *ring)
> +{
> +       u64 bytes, packets;
> +       unsigned int start;
> +
> +       if (ring) {
> +               do {
> +                       start = u64_stats_fetch_begin_irq(&ring->syncp);
> +                       packets = ring->stats.packets;
> +                       bytes   = ring->stats.bytes;
> +               } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
> +               stats->tx_packets += packets;
> +               stats->tx_bytes   += bytes;
> +       }
> +}
> +
>  static void ixgbe_get_stats64(struct net_device *netdev,
>                               struct rtnl_link_stats64 *stats)
>  {
> @@ -8388,18 +8605,13 @@ static void ixgbe_get_stats64(struct net_device *netdev,
>
>         for (i = 0; i < adapter->num_tx_queues; i++) {
>                 struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
> -               u64 bytes, packets;
> -               unsigned int start;
>
> -               if (ring) {
> -                       do {
> -                               start = u64_stats_fetch_begin_irq(&ring->syncp);
> -                               packets = ring->stats.packets;
> -                               bytes   = ring->stats.bytes;
> -                       } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
> -                       stats->tx_packets += packets;
> -                       stats->tx_bytes   += bytes;
> -               }
> +               ixgbe_get_ring_stats64(stats, ring);
> +       }
> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
> +               struct ixgbe_ring *ring = ACCESS_ONCE(adapter->xdp_ring[i]);
> +
> +               ixgbe_get_ring_stats64(stats, ring);
>         }
>         rcu_read_unlock();
>
> @@ -8533,6 +8745,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
>         }
>
>         ixgbe_validate_rtr(adapter, tc);
> +       adapter->num_xdp_queues = adapter->xdp_prog ? nr_cpu_ids : 0;
>
>  #endif /* CONFIG_IXGBE_DCB */
>         ixgbe_init_interrupt_scheme(adapter);

I'm confused about this.  Don't you overwrite this when you call
ixgbe_init_interrupt_scheme?

> @@ -9520,7 +9733,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>  {
>         int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>         struct ixgbe_adapter *adapter = netdev_priv(dev);
> -       struct bpf_prog *old_adapter_prog;
> +       struct bpf_prog *old_prog;
>
>         /* verify ixgbe ring attributes are sufficient for XDP */
>         for (i = 0; i < adapter->num_rx_queues; i++) {
> @@ -9533,12 +9746,26 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>                         return -EINVAL;
>         }
>
> -       old_adapter_prog = xchg(&adapter->xdp_prog, prog);
> +       if (nr_cpu_ids > MAX_XDP_QUEUES)
> +               return -ENOMEM;
> +
> +       old_prog = xchg(&adapter->xdp_prog, prog);
> +
> +       /* If transitioning XDP modes reconfigure rings */
> +       if (!!adapter->xdp_prog != !!old_prog) {
> +               int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
> +
> +               if (err) {
> +                       rcu_assign_pointer(adapter->xdp_prog, old_prog);
> +                       return -EINVAL;
> +               }
> +       }
> +
>         for (i = 0; i < adapter->num_rx_queues; i++)
>                 ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
>
> -       if (old_adapter_prog)
> -               bpf_prog_put(old_adapter_prog);
> +       if (old_prog)
> +               bpf_prog_put(old_prog);
>
>         return 0;
>  }
> @@ -10044,6 +10271,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>         for (i = 0; i < adapter->num_tx_queues; i++)
>                 u64_stats_init(&adapter->tx_ring[i]->syncp);
>
> +       for (i = 0; i < adapter->num_xdp_queues; i++)
> +               u64_stats_init(&adapter->xdp_ring[i]->syncp);
> +
>         /* WOL not supported for all devices */
>         adapter->wol = 0;
>         hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe
  2017-02-25 17:38 ` [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
@ 2017-02-27  1:35   ` Jeff Kirsher
  0 siblings, 0 replies; 14+ messages in thread
From: Jeff Kirsher @ 2017-02-27  1:35 UTC (permalink / raw)
  To: intel-wired-lan

On Sat, 2017-02-25 at 09:38 -0800, John Fastabend wrote:
> There might need to be a couple versions to address feedback, but I
> assume you can put this on your dev_queue for whenever net-next
> opens?

Since Alex has some comments on patch 2 of the series, and he appears
to have some additional patches for igb/ixgbe coming, I will hold off
on this series.

I will monitor future versions, and will apply any future versions to
my dev-queue for testing.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions
  2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions John Fastabend
  2017-02-25 18:18   ` kbuild test robot
@ 2017-02-27 16:22   ` Alexander Duyck
  1 sibling, 0 replies; 14+ messages in thread
From: Alexander Duyck @ 2017-02-27 16:22 UTC (permalink / raw)
  To: intel-wired-lan

On Sat, Feb 25, 2017 at 9:32 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Basic XDP drop support for ixgbe. Uses READ_ONCE/xchg semantics on XDP
> programs instead of rcu primitives as suggested by Daniel Borkmann and
> Alex Duyck.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe.h      |    2
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |  108 +++++++++++++++++++++++++
>  2 files changed, 108 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> index b812913..2d12c24 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
> @@ -273,6 +273,7 @@ struct ixgbe_ring {
>         struct ixgbe_ring *next;        /* pointer to next ring in q_vector */
>         struct ixgbe_q_vector *q_vector; /* backpointer to host q_vector */
>         struct net_device *netdev;      /* netdev ring belongs to */
> +       struct bpf_prog *xdp_prog;
>         struct device *dev;             /* device for DMA mapping */
>         struct ixgbe_fwd_adapter *l2_accel_priv;
>         void *desc;                     /* descriptor ring memory */
> @@ -510,6 +511,7 @@ struct ixgbe_adapter {
>         unsigned long active_vlans[BITS_TO_LONGS(VLAN_N_VID)];
>         /* OS defined structs */
>         struct net_device *netdev;
> +       struct bpf_prog *xdp_prog;
>         struct pci_dev *pdev;
>
>         unsigned long state;
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index e3da397..ec2c38f 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -49,6 +49,9 @@
>  #include <linux/if_macvlan.h>
>  #include <linux/if_bridge.h>
>  #include <linux/prefetch.h>
> +#include <linux/bpf.h>
> +#include <linux/bpf_trace.h>
> +#include <linux/atomic.h>
>  #include <scsi/fc/fc_fcoe.h>
>  #include <net/udp_tunnel.h>
>  #include <net/pkt_cls.h>
> @@ -2051,7 +2054,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
>                 /* hand second half of page back to the ring */
>                 ixgbe_reuse_rx_page(rx_ring, rx_buffer);
>         } else {
> -               if (IXGBE_CB(skb)->dma == rx_buffer->dma) {
> +               if (skb && IXGBE_CB(skb)->dma == rx_buffer->dma) {
>                         /* the page has been released from the ring */
>                         IXGBE_CB(skb)->page_released = true;
>                 } else {

So just an idea here.  What if we were to use the PTR_ERR code for the
skb return value instead of just NULL or an skb pointer?  More on that
idea to follow in the review.

> @@ -2157,6 +2160,42 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>         return skb;
>  }
>
> +static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
> +                        struct ixgbe_rx_buffer *rx_buffer,
> +                        unsigned int size)

Maybe we should pass size via a reference pointer instead of as a
value.  Then we can modify it without having to use it as a return
value.  More on that below.

> +{
> +       struct bpf_prog *xdp_prog;
> +       struct xdp_buff xdp;
> +       void *addr;
> +       u32 act;
> +
> +       xdp_prog = READ_ONCE(rx_ring->xdp_prog);
> +
> +       if (!xdp_prog)
> +               return 0;
> +
> +       addr = page_address(rx_buffer->page) + rx_buffer->page_offset;
> +       xdp.data_hard_start = addr;

So this math is only partly correct.  The data_hard_start should be
adjusted by subtracting NET_SKB_PAD + NET_IP_ALIGN when the BUILD_SKB
flag is set on the ring.
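
As a sketch of the adjustment, reusing the ring_uses_build_skb() test
from the build_skb patches (pad value assumed):

	xdp.data = addr;
	xdp.data_hard_start = ring_uses_build_skb(rx_ring) ?
			      addr - (NET_SKB_PAD + NET_IP_ALIGN) : addr;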

> +       xdp.data = addr;
> +       xdp.data_end = addr + size;
> +
> +       act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +       switch (act) {
> +       case XDP_PASS:
> +               return 0;
> +       default:
> +               bpf_warn_invalid_xdp_action(act);
> +       case XDP_TX:
> +       case XDP_ABORTED:
> +               trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> +               /* fallthrough -- handle aborts by dropping packet */
> +       case XDP_DROP:
> +               rx_buffer->pagecnt_bias++; /* give page back */
> +               break;
> +       }
> +       return size;
> +}
> +

I realized that if we allow for head adjustment we should probably
have rx_buffer->page_offset get updated based on xdp.data if we allow
it to be passed through.  We may also want to pass size as a pointer
instead of returning it.  That way we can return an skb as an ERR_PTR
value if we need to either Tx, drop, or abort.

If we allow XDP_PASS to adjust the page_offset we will need to make
sure in the 4K page case that we are sanitizing the page_offset later.
We probably need an AND/OR combination to make certain we are always
resetting the padding to the correct amount.
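
In code, the sanitize step might look something like this (untested
sketch; assumes truesize is the power-of-two half-page size and
ixgbe_rx_offset() returns the default pad):

#if (PAGE_SIZE < 8192)
	/* XDP may have moved the data pointer, so clear the offset
	 * within the half page, flip halves, and restore the pad
	 */
	rx_buffer->page_offset &= ~(truesize - 1);
	rx_buffer->page_offset ^= truesize;
	rx_buffer->page_offset |= ixgbe_rx_offset(rx_ring);
#endif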

>  /**
>   * ixgbe_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
>   * @q_vector: structure containing interrupt and ring information
> @@ -2187,6 +2226,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>                 struct ixgbe_rx_buffer *rx_buffer;
>                 struct sk_buff *skb;
>                 unsigned int size;
> +               int consumed;
>
>                 /* return some buffers to hardware, one at a time is too slow */
>                 if (cleaned_count >= IXGBE_RX_BUFFER_WRITE) {

If my idea is right we won't need to mess with adding another variable at all.

> @@ -2207,6 +2247,19 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>
>                 rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
>
> +               rcu_read_lock();
> +               consumed = ixgbe_run_xdp(rx_ring, rx_buffer, size);
> +               rcu_read_unlock();
> +
> +               if (consumed) {
> +                       ixgbe_put_rx_buffer(rx_ring, rx_buffer, skb);
> +                       cleaned_count++;
> +                       ixgbe_is_non_eop(rx_ring, rx_desc, skb);
> +                       total_rx_packets++;
> +                       total_rx_bytes += size;
> +                       continue;
> +               }
> +
>                 /* retrieve a buffer from the ring */
>                 if (skb)
>                         ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);


I'm not really sure if we still need the rcu_read_lock/unlock after
changing to the READ_ONCE semantics.  For now I am just dropping it in
my suggestions down below.  It might make sense to move the RCU
locking into ixgbe_run_xdp if it is truly needed.

I'm not really a huge fan of having to break up the receive path like
this.  It seems like a ton of duplication of effort.  I think this
might work much better laid out like:
/* retrieve a buffer from the ring */
if (likely(!skb))
    skb = ixgbe_run_xdp(rx_ring, rx_buffer, &size);

if (IS_ERR(skb)) {
    total_rx_packets++;
    total_rx_bytes += size;
} else if (skb) {
    ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
}

Then all that is left is to make a change to ixgbe_cleanup_headers so
that it will identify that the skb is an error and push us back
through via the continue.
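
i.e. something like (sketch of the suggested change):

static bool ixgbe_cleanup_headers(struct ixgbe_ring *rx_ring,
				  union ixgbe_adv_rx_desc *rx_desc,
				  struct sk_buff *skb)
{
	/* XDP verdicts come back as ERR_PTR values; treat them as
	 * "done with this descriptor" so the caller continues
	 */
	if (IS_ERR(skb))
		return true;

	/* ... existing header fixups ... */

	return false;
}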

From what I can tell, the only thing really preventing us from
handling jumbo frames this way is the fact that we want to track the
first DMA mapping in the skb.  So with a change like the one you
already made, plus enforcing that RSC has to be disabled to support
XDP, we should be good.

> @@ -6106,6 +6159,12 @@ int ixgbe_setup_rx_resources(struct ixgbe_ring *rx_ring)
>         return -ENOMEM;
>  }
>
> +static void ixgbe_setup_xdp_resource(struct ixgbe_adapter *adapter,
> +                                    struct ixgbe_ring *ring)
> +{
> +       xchg(&ring->xdp_prog, adapter->xdp_prog);
> +}
> +
>  /**
>   * ixgbe_setup_all_rx_resources - allocate all queues Rx resources
>   * @adapter: board private structure
> @@ -6122,8 +6181,10 @@ static int ixgbe_setup_all_rx_resources(struct ixgbe_adapter *adapter)
>
>         for (i = 0; i < adapter->num_rx_queues; i++) {
>                 err = ixgbe_setup_rx_resources(adapter->rx_ring[i]);
> -               if (!err)
> +               if (!err) {
> +                       ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
>                         continue;
> +               }
>
>                 e_err(probe, "Allocation for Rx Queue %u failed\n", i);
>                 goto err_setup_rx;

I'd say just fold this into ixgbe_setup_rx_resources instead of
making it its own thing.  It is just a one-liner and we already have
all the same arguments in that function anyway.

We probably also need to update the code in free_rx_resources so that
it resets xdp_prog back to NULL.  That way we can guarantee the
program is released if we just down the interface without bringing it
back up.
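
Something like (untested sketch):

void ixgbe_free_rx_resources(struct ixgbe_ring *rx_ring)
{
	ixgbe_clean_rx_ring(rx_ring);

	/* drop the ring's reference so a downed interface does
	 * not keep the old program alive
	 */
	rx_ring->xdp_prog = NULL;

	/* ... existing buffer/descriptor teardown ... */
}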

> @@ -9455,6 +9516,48 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
>         return features;
>  }
>
> +static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
> +{
> +       int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +       struct bpf_prog *old_adapter_prog;
> +
> +       /* verify ixgbe ring attributes are sufficient for XDP */
> +       for (i = 0; i < adapter->num_rx_queues; i++) {
> +               struct ixgbe_ring *ring = adapter->rx_ring[i];
> +
> +               if (ring_is_rsc_enabled(ring))
> +                       return -EINVAL;
> +
> +               if (frame_size > ixgbe_rx_bufsz(ring))
> +                       return -EINVAL;
> +       }
> +
> +       old_adapter_prog = xchg(&adapter->xdp_prog, prog);
> +       for (i = 0; i < adapter->num_rx_queues; i++)
> +               ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
> +
> +       if (old_adapter_prog)
> +               bpf_prog_put(old_adapter_prog);
> +
> +       return 0;
> +}
> +
> +static int ixgbe_xdp(struct net_device *dev, struct netdev_xdp *xdp)
> +{
> +       struct ixgbe_adapter *adapter = netdev_priv(dev);
> +
> +       switch (xdp->command) {
> +       case XDP_SETUP_PROG:
> +               return ixgbe_xdp_setup(dev, xdp->prog);
> +       case XDP_QUERY_PROG:
> +               xdp->prog_attached = !!(adapter->rx_ring[0]->xdp_prog);
> +               return 0;
> +       default:
> +               return -EINVAL;
> +       }
> +}
> +
>  static const struct net_device_ops ixgbe_netdev_ops = {
>         .ndo_open               = ixgbe_open,
>         .ndo_stop               = ixgbe_close,
> @@ -9500,6 +9603,7 @@ static void ixgbe_fwd_del(struct net_device *pdev, void *priv)
>         .ndo_udp_tunnel_add     = ixgbe_add_udp_tunnel_port,
>         .ndo_udp_tunnel_del     = ixgbe_del_udp_tunnel_port,
>         .ndo_features_check     = ixgbe_features_check,
> +       .ndo_xdp                = ixgbe_xdp,
>  };
>
>  /**
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head
  2017-02-25 17:33 ` [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head John Fastabend
@ 2017-02-27 16:32   ` Alexander Duyck
  2017-03-02  0:23     ` John Fastabend
  0 siblings, 1 reply; 14+ messages in thread
From: Alexander Duyck @ 2017-02-27 16:32 UTC (permalink / raw)
  To: intel-wired-lan

On Sat, Feb 25, 2017 at 9:33 AM, John Fastabend
<john.fastabend@gmail.com> wrote:
> Add adjust_head support for XDP; however, at the moment we are only
> adding IXGBE_SKB_PAD bytes of headroom to align with driver paths.
>
> The infrastructure is such that a follow-on patch can extend
> headroom up to 196B without changing the RX path.

So I am really not a fan of the way this is implemented.  I think we
can probably just get away with passing size as a pointer and
adjusting page_offset in the rx_buffer_info structure.

Then the only extra bit of work needed is some code to sanitize the
page_offset in ixgbe_build_skb and ixgbe_construct_skb in the 4K case,
since we can't just use an XOR anymore and will probably have to do a
combination of XOR, AND, and OR operations to clear and reset the page
offset back to what it is supposed to be.

> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   45 +++++++++++++++++--------
>  1 file changed, 31 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index 227caf8..040e469 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -2031,6 +2031,7 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer)
>  static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
>                               struct ixgbe_rx_buffer *rx_buffer,
>                               struct sk_buff *skb,
> +                             unsigned int headroom,
>                               unsigned int size)
>  {
>  #if (PAGE_SIZE < 8192)
> @@ -2041,7 +2042,8 @@ static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
>                                 SKB_DATA_ALIGN(size);
>  #endif
>         skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
> -                       rx_buffer->page_offset, size, truesize);
> +                       rx_buffer->page_offset - (IXGBE_SKB_PAD - headroom),
> +                       size, truesize);
>  #if (PAGE_SIZE < 8192)
>         rx_buffer->page_offset ^= truesize;
>  #else
> @@ -2114,6 +2116,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
>  static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>                                            struct ixgbe_rx_buffer *rx_buffer,
>                                            union ixgbe_adv_rx_desc *rx_desc,
> +                                          unsigned int headroom,
>                                            unsigned int size)
>  {
>         void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
> @@ -2122,6 +2125,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>  #else
>         unsigned int truesize = SKB_DATA_ALIGN(size);
>  #endif
> +       unsigned int off_page;
>         struct sk_buff *skb;
>
>         /* prefetch first cache line of first page */
> @@ -2135,12 +2139,14 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>         if (unlikely(!skb))
>                 return NULL;
>
> +       off_page = IXGBE_SKB_PAD - headroom;
> +
>         if (size > IXGBE_RX_HDR_SIZE) {
>                 if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
>                         IXGBE_CB(skb)->dma = rx_buffer->dma;
>
>                 skb_add_rx_frag(skb, 0, rx_buffer->page,
> -                               rx_buffer->page_offset,
> +                               rx_buffer->page_offset - off_page,
>                                 size, truesize);
>  #if (PAGE_SIZE < 8192)
>                 rx_buffer->page_offset ^= truesize;
> @@ -2148,7 +2154,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>                 rx_buffer->page_offset += truesize;
>  #endif
>         } else {
> -               memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
> +               memcpy(__skb_put(skb, size), va - off_page,
> +                      ALIGN(size, sizeof(long)));
>                 rx_buffer->pagecnt_bias++;
>         }
>
> @@ -2158,6 +2165,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>  static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>                                        struct ixgbe_rx_buffer *rx_buffer,
>                                        union ixgbe_adv_rx_desc *rx_desc,
> +                                      unsigned int headroom,
>                                        unsigned int size)
>  {
>         void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
> @@ -2181,7 +2189,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>                 return NULL;
>
>         /* update pointers within the skb to store the data */
> -       skb_reserve(skb, IXGBE_SKB_PAD);
> +       skb_reserve(skb, headroom);
>         __skb_put(skb, size);
>
>         /* record DMA address if this is the start of a chain of buffers */
> @@ -2202,10 +2210,14 @@ static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
>                                struct ixgbe_adapter *adapter,
>                                struct ixgbe_ring *tx_ring);
>
> +#define IXGBE_XDP_PASS 0
> +#define IXGBE_XDP_CONSUMED 1
> +

Going back to my original suggestions for patch one, you would have 3
possible return values.  You would return NULL if the packet is meant
to pass, one value wrapped in ERR_PTR if it is meant to be
transmitted, and another value wrapped in ERR_PTR if it is meant to be
dropped.  The only real difference between the 3 is that for NULL you
do nothing, for transmit you are going to need to call your xdp_xmit
function and update the page_offset, and for drop you add one back to
the pagecnt_bias.

I just realized the example I provided for patch one was missing that
bit.  We need to add one back to the pagecnt_bias if we are just
dropping the buffer.
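
Putting the two ideas together, the return side of ixgbe_run_xdp might
look roughly like this (untested sketch; the IXGBE_XDP_TX define and
the error encoding are placeholders, not existing code):

	switch (act) {
	case XDP_PASS:
		return NULL;	/* fall through to the skb path */
	case XDP_TX:
		ixgbe_xmit_xdp_ring(&xdp, adapter, xdp_ring);
		return ERR_PTR(-IXGBE_XDP_TX);
	default:
		bpf_warn_invalid_xdp_action(act);
		/* fallthrough */
	case XDP_ABORTED:
		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
		/* fallthrough -- handle aborts by dropping packet */
	case XDP_DROP:
		rx_buffer->pagecnt_bias++; /* give page back */
		return ERR_PTR(-IXGBE_XDP_CONSUMED);
	}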

>  static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>                          struct ixgbe_ring  *rx_ring,
>                          struct ixgbe_rx_buffer *rx_buffer,
> -                        unsigned int size)
> +                        unsigned int *headroom,
> +                        unsigned int *size)
>  {
>         struct ixgbe_ring *xdp_ring;
>         struct bpf_prog *xdp_prog;
> @@ -2216,17 +2228,19 @@ static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>         xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>
>         if (!xdp_prog)
> -               return 0;
> +               return IXGBE_XDP_PASS;
>
>         addr = page_address(rx_buffer->page) + rx_buffer->page_offset;
> -       xdp.data_hard_start = addr;
> +       xdp.data_hard_start = addr - *headroom;
>         xdp.data = addr;
> -       xdp.data_end = addr + size;
> +       xdp.data_end = addr + *size;
>
>         act = bpf_prog_run_xdp(xdp_prog, &xdp);
>         switch (act) {
>         case XDP_PASS:
> -               return 0;
> +               *headroom = xdp.data - xdp.data_hard_start;
> +               *size = xdp.data_end - xdp.data;
> +               return IXGBE_XDP_PASS;
>         case XDP_TX:
>                 xdp_ring = adapter->xdp_ring[smp_processor_id()];
>
> @@ -2246,7 +2260,7 @@ static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>                 rx_buffer->pagecnt_bias++; /* give page back */
>                 break;
>         }
> -       return size;
> +       return IXGBE_XDP_CONSUMED;
>  }
>
>  /**
> @@ -2275,6 +2289,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>         u16 cleaned_count = ixgbe_desc_unused(rx_ring);
>
>         while (likely(total_rx_packets < budget)) {
> +               unsigned int headroom = ixgbe_rx_offset(rx_ring);
>                 union ixgbe_adv_rx_desc *rx_desc;
>                 struct ixgbe_rx_buffer *rx_buffer;
>                 struct sk_buff *skb;
> @@ -2301,7 +2316,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>                 rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
>
>                 rcu_read_lock();
> -               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
> +               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer,
> +                                        &headroom, &size);
>                 rcu_read_unlock();
>
>                 if (consumed) {
> @@ -2315,13 +2331,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>
>                 /* retrieve a buffer from the ring */
>                 if (skb)
> -                       ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
> +                       ixgbe_add_rx_frag(rx_ring, rx_buffer, skb,
> +                                         headroom, size);
>                 else if (ring_uses_build_skb(rx_ring))
>                         skb = ixgbe_build_skb(rx_ring, rx_buffer,
> -                                             rx_desc, size);
> +                                             rx_desc, headroom, size);
>                 else
>                         skb = ixgbe_construct_skb(rx_ring, rx_buffer,
> -                                                 rx_desc, size);
> +                                                 rx_desc, headroom, size);
>
>                 /* exit if we failed to retrieve a buffer */
>                 if (!skb) {
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action
  2017-02-25 22:01   ` Alexander Duyck
@ 2017-03-01 23:15     ` John Fastabend
  2017-03-02  0:07       ` John Fastabend
  0 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2017-03-01 23:15 UTC (permalink / raw)
  To: intel-wired-lan

On 17-02-25 02:01 PM, Alexander Duyck wrote:
> On Sat, Feb 25, 2017 at 9:32 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Add support for XDP_TX action.
>>
>> A couple design choices were made here. First I use a new ring
>> pointer structure xdp_ring[] in the adapter struct instead of
>> pushing the newly allocated xdp TX rings into the tx_ring[]
>> structure. This means we have to duplicate loops around rings
>> in places we want to initialize both TX rings and XDP rings.
>> But by making it explicit it is obvious when we are using XDP
>> rings and when we are using TX rings. Further we don't have
>> to do ring arithmetic, which is error prone. As a proof point
>> for doing this my first patches used only a single ring structure
>> and introduced bugs in FCoE code and macvlan code paths.
>>
>> Second I am aware this is not the most optimized version of
>> this code possible. I want to get baseline support in using
>> the most readable format possible and then once this series
>> is included I will optimize the TX path in another series
>> of patches.
> 
> Comments inline below.  There are still a few items to iron out but
> this looks pretty good.  I'll try to get to the other two patches
> tomorrow.
> 
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   12 +
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   29 ++
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   85 +++++-
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  314 +++++++++++++++++++---
>>  4 files changed, 386 insertions(+), 54 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> index 2d12c24..571a072 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>> @@ -190,7 +190,10 @@ struct vf_macvlans {
>>  struct ixgbe_tx_buffer {
>>         union ixgbe_adv_tx_desc *next_to_watch;
>>         unsigned long time_stamp;
>> -       struct sk_buff *skb;
>> +       union {
>> +               struct sk_buff *skb;
>> +               void *data; /* XDP uses addr ptr on irq_clean */
>> +       };
>>         unsigned int bytecount;
>>         unsigned short gso_segs;
>>         __be16 protocol;
> 
> This actually has me wondering if we couldn't short-cut things one
> step further and just store a flag in the tx_flags portion of the
> tx_buffer to record if this is an skb or a data pointer.  I'm
> wondering if we couldn't optimize the standard transmit path to just
> take a reference on the skb->head, and drop the sk_buff itself for
> small packets since in most cases there isn't anything we really need
> to hold onto anyway.  I might have to look into that.

Perhaps.  I think this is probably a follow-on series if it's
possible.  At the moment I like how simple this is.

> >> @@ -308,6 +311,7 @@ struct ixgbe_ring {
>>         };
>>
>>         u8 dcb_tc;
>> +       bool xdp_ring;
>>         struct ixgbe_queue_stats stats;
>>         struct u64_stats_sync syncp;
>>         union {
> 
> Instead of adding a bool I would rather have this added to the ring
> state as a single bit.  That way we don't have to modify the ring
> structure here at all.

Sure.
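
Something along these lines (untested; the bit name is a placeholder):

/* in ixgbe.h, next to the other ring state bits */
enum ixgbe_ring_state_t {
	/* ... existing bits ... */
	__IXGBE_TX_XDP_RING,
};

#define ring_is_xdp(ring) \
	test_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
#define set_ring_xdp(ring) \
	set_bit(__IXGBE_TX_XDP_RING, &(ring)->state)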

> 
>> @@ -335,6 +339,7 @@ enum ixgbe_ring_f_enum {
>>  #define IXGBE_MAX_FCOE_INDICES         8
>>  #define MAX_RX_QUEUES                  (IXGBE_MAX_FDIR_INDICES + 1)
>>  #define MAX_TX_QUEUES                  (IXGBE_MAX_FDIR_INDICES + 1)
>> +#define MAX_XDP_QUEUES                 (IXGBE_MAX_FDIR_INDICES + 1)
>>  #define IXGBE_MAX_L2A_QUEUES           4
>>  #define IXGBE_BAD_L2A_QUEUE            3
>>  #define IXGBE_MAX_MACVLANS             31
>> @@ -578,6 +583,10 @@ struct ixgbe_adapter {
>>         __be16 vxlan_port;
>>         __be16 geneve_port;
>>
>> +       /* XDP */
>> +       int num_xdp_queues;
>> +       struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES];
>> +
>>         /* TX */
>>         struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
>>
>> @@ -624,6 +633,7 @@ struct ixgbe_adapter {
>>
>>         u64 tx_busy;
>>         unsigned int tx_ring_count;
>> +       unsigned int xdp_ring_count;
>>         unsigned int rx_ring_count;
>>
>>         u32 link_speed;
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>> index d3e02ac..51efd0a 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>> @@ -1031,7 +1031,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>>  {
>>         struct ixgbe_adapter *adapter = netdev_priv(netdev);
>>         struct ixgbe_ring *temp_ring;
>> -       int i, err = 0;
>> +       int i, j, err = 0;
>>         u32 new_rx_count, new_tx_count;
>>
>>         if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
>> @@ -1057,15 +1057,19 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>>         if (!netif_running(adapter->netdev)) {
>>                 for (i = 0; i < adapter->num_tx_queues; i++)
>>                         adapter->tx_ring[i]->count = new_tx_count;
>> +               for (i = 0; i < adapter->num_xdp_queues; i++)
>> +                       adapter->xdp_ring[i]->count = new_tx_count;
>>                 for (i = 0; i < adapter->num_rx_queues; i++)
>>                         adapter->rx_ring[i]->count = new_rx_count;
>>                 adapter->tx_ring_count = new_tx_count;
>> +               adapter->xdp_ring_count = new_tx_count;
>>                 adapter->rx_ring_count = new_rx_count;
>>                 goto clear_reset;
>>         }
>>
>>         /* allocate temporary buffer to store rings in */
>> -       i = max_t(int, adapter->num_tx_queues, adapter->num_rx_queues);
>> +       i = max_t(int, adapter->num_tx_queues + adapter->num_xdp_queues,
>> +                 adapter->num_rx_queues);
> 
> You only need the maximum of the 3 values.  You don't need to
> add num_tx_queues to num_xdp_queues.
> 

This is a hold-over from when I pushed the xdp and tx queues into the
same ring[] array.

>>         temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));
>>
>>         if (!temp_ring) {
>> @@ -1097,12 +1101,33 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>>                         }
>>                 }
>>
>> +               for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
>> +                       memcpy(&temp_ring[i], adapter->xdp_ring[j],
>> +                              sizeof(struct ixgbe_ring));
>> +
>> +                       temp_ring[i].count = new_tx_count;
>> +                       err = ixgbe_setup_tx_resources(&temp_ring[i]);
>> +                       if (err) {
>> +                               while (i) {
>> +                                       i--;
>> +                                       ixgbe_free_tx_resources(&temp_ring[i]);
>> +                               }
>> +                               goto err_setup;
>> +                       }
>> +               }
>> +
>>                 for (i = 0; i < adapter->num_tx_queues; i++) {
>>                         ixgbe_free_tx_resources(adapter->tx_ring[i]);
>>
>>                         memcpy(adapter->tx_ring[i], &temp_ring[i],
>>                                sizeof(struct ixgbe_ring));
>>                 }
>> +               for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
>> +                       ixgbe_free_tx_resources(adapter->xdp_ring[j]);
>> +
>> +                       memcpy(adapter->xdp_ring[j], &temp_ring[i],
>> +                              sizeof(struct ixgbe_ring));
>> +               }
>>
>>                 adapter->tx_ring_count = new_tx_count;
>>         }
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>> index 1b8be7d..d9a4916 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>> @@ -73,6 +73,11 @@ static bool ixgbe_cache_ring_dcb_sriov(struct ixgbe_adapter *adapter)
>>                         reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>>                 adapter->tx_ring[i]->reg_idx = reg_idx;
>>         }
>> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
>> +               if ((reg_idx & ~vmdq->mask) >= tcs)
>> +                       reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>> +       }
>>
>>  #ifdef IXGBE_FCOE
>>         /* nothing to do if FCoE is disabled */
>> @@ -248,6 +253,12 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
>>                 adapter->tx_ring[i]->reg_idx = reg_idx;
>>         }
>>
>> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
>> +               if ((reg_idx & rss->mask) >= rss->indices)
>> +                       reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>> +       }
>> +
>>  #ifdef IXGBE_FCOE
>>         /* FCoE uses a linear block of queues so just assigning 1:1 */
>>         for (; i < adapter->num_tx_queues; i++, reg_idx++)
> 
> One thing we may want to look at doing for the SR-IOV cases would be
> to steal XDP queues from an adjacent VMDq pool.  It would add a
> dependency on us limiting ourselves by one additional VF, but it
> would make the logic much simpler: each VMDq pool is the same size,
> so the Tx/Rx limit in one pool would be the same as the next, and you
> could do twice the number of Tx queues as Rx queues.

Follow-on patches?

> 
>> @@ -267,12 +278,14 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
>>   **/
>>  static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
>>  {
>> -       int i;
>> +       int i, reg_idx;
>>
>>         for (i = 0; i < adapter->num_rx_queues; i++)
>>                 adapter->rx_ring[i]->reg_idx = i;
>> -       for (i = 0; i < adapter->num_tx_queues; i++)
>> -               adapter->tx_ring[i]->reg_idx = i;
>> +       for (i = 0, reg_idx = 0; i < adapter->num_tx_queues; i++, reg_idx++)
>> +               adapter->tx_ring[i]->reg_idx = reg_idx;
>> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++)
>> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>>
>>         return true;
>>  }
>> @@ -308,6 +321,14 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
>>         ixgbe_cache_ring_rss(adapter);
>>  }
>>
>> +static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
>> +{
>> +       if (nr_cpu_ids > MAX_XDP_QUEUES)
>> +               return 0;
>> +
>> +       return adapter->xdp_prog ? nr_cpu_ids : 0;
>> +}
>> +
>>  #define IXGBE_RSS_64Q_MASK     0x3F
>>  #define IXGBE_RSS_16Q_MASK     0xF
>>  #define IXGBE_RSS_8Q_MASK      0x7
>> @@ -382,6 +403,7 @@ static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
>>         adapter->num_rx_queues_per_pool = tcs;
>>
>>         adapter->num_tx_queues = vmdq_i * tcs;
>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>         adapter->num_rx_queues = vmdq_i * tcs;
>>
>>  #ifdef IXGBE_FCOE
>> @@ -479,6 +501,7 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
>>                 netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
>>
>>         adapter->num_tx_queues = rss_i * tcs;
>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>         adapter->num_rx_queues = rss_i * tcs;
>>
>>         return true;
>> @@ -549,6 +572,7 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
>>
>>         adapter->num_rx_queues = vmdq_i * rss_i;
>>         adapter->num_tx_queues = vmdq_i * rss_i;
>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>
>>         /* disable ATR as it is not supported when VMDq is enabled */
>>         adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
> 
> I'm pretty sure we are currently looking at an overly simplified
> setup.  I suspect we need to take queue limits into account for DCB
> and SR-IOV.  For example, what TC are we supposed to be taking Tx
> queues from in the case of DCB?  Really, in order for us to support
> XDP in the DCB case we need rss_i to be equal to or less than half
> the maximum possible value.
> 

This is probably a gap in XDP in general. For example, how would XDP
even push a packet at a traffic class? At least we should ensure rss_i
doesn't break anything, but I'm not sure how to coexist with DCB
reasonably at the moment.

> And as I mentioned before, we need to make certain, in the case of
> SR-IOV with or without DCB, that we have one pool set aside so that
> we can steal queues from it to support XDP.
> 

SR-IOV + XDP should work; I'm a bit more skeptical about DCB at the
moment.  I'm thinking of adding a check on rss_i to ensure we can
claim the queues (the "less than half" comment above) and then sorting
out the 'xdp' pool concept as a follow-on series.

>> @@ -669,6 +693,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>>  #endif /* IXGBE_FCOE */
>>         adapter->num_rx_queues = rss_i;
>>         adapter->num_tx_queues = rss_i;
>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>
>>         return true;
>>  }
> 
> So this is the one allocation that is safe for now since the rss_i has
> an upper limit of 64.
> 
>> @@ -689,6 +714,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
>>         /* Start with base case */
>>         adapter->num_rx_queues = 1;
>>         adapter->num_tx_queues = 1;
>> +       adapter->num_xdp_queues = 0;
>>         adapter->num_rx_pools = adapter->num_rx_queues;
>>         adapter->num_rx_queues_per_pool = 1;
>>
>> @@ -720,7 +746,8 @@ static int ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter)
>>         int i, vectors, vector_threshold;
>>
>>         /* We start by asking for one vector per queue pair */
>> -       vectors = max(adapter->num_rx_queues, adapter->num_tx_queues);
>> +       vectors = max(adapter->num_rx_queues,
>> +                     adapter->num_tx_queues + adapter->num_xdp_queues);
>>
>>         /* It is easy to be greedy for MSI-X vectors. However, it really
>>          * doesn't do much good if we have a lot more vectors than CPUs. We'll
> 
> This might be overkill on the vectors.  You can stack regular Rx
> queues, Tx queues, and XDP queues.  I would say take the maximum of
> all 3 instead of adding Tx and XDP queues.

sure.
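
i.e. something like (sketch, using the max3() helper from
linux/kernel.h):

	/* one vector can service an Rx ring, a Tx ring and an XDP ring */
	vectors = max3(adapter->num_rx_queues,
		       adapter->num_tx_queues,
		       adapter->num_xdp_queues);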

> 
>> @@ -800,6 +827,8 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
>>   * @v_idx: index of vector in adapter struct
>>   * @txr_count: total number of Tx rings to allocate
>>   * @txr_idx: index of first Tx ring to allocate
>> + * @xdp_count: total number of XDP rings to allocate
>> + * @xdp_idx: index of first XDP ring to allocate
>>   * @rxr_count: total number of Rx rings to allocate
>>   * @rxr_idx: index of first Rx ring to allocate
>>   *
>> @@ -808,6 +837,7 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
>>  static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>>                                 int v_count, int v_idx,
>>                                 int txr_count, int txr_idx,
>> +                               int xdp_count, int xdp_idx,
>>                                 int rxr_count, int rxr_idx)
>>  {
>>         struct ixgbe_q_vector *q_vector;
>> @@ -909,6 +939,33 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>>                 ring++;
>>         }
>>
>> +       while (xdp_count) {
>> +               /* assign generic ring traits */
>> +               ring->dev = &adapter->pdev->dev;
>> +               ring->netdev = adapter->netdev;
> 
> Just a thought on all this.  I don't know if we should be setting
> ring->netdev.  We don't really have a netdevice associated with the
> XDP rings the way we do with an Rx ring.
> 
> If we were to leave ring->netdev as NULL it might help us to find any
> and all places where we might have overlooked things and have XDP
> trying to tweak netdev queues.  In addition it would help to short
> circuit the flag checks in the Tx clean-up path as we could just check
> for tx_ring->netdev instead of having to check for an XDP boolean
> value.
> 

Hmm, not sure about this. When we dump debug statements, for example
on a hung queue, it's helpful to have an idea what port the queues are
from. Although I guess we could print one of the other identifier
strings instead.

>> +               /* configure backlink on ring */
>> +               ring->q_vector = q_vector;
>> +
>> +               /* update q_vector Tx values */
>> +               ixgbe_add_ring(ring, &q_vector->tx);
>> +
>> +               /* apply Tx specific ring traits */
>> +               ring->count = adapter->xdp_ring_count;
>> +               ring->queue_index = xdp_idx;
>> +               ring->xdp_ring = true;
>> +
>> +               /* assign ring to adapter */
>> +               adapter->xdp_ring[xdp_idx] = ring;
>> +
>> +               /* update count and index */
>> +               xdp_count--;
>> +               xdp_idx += v_count;
>> +
>> +               /* push pointer to next ring */
>> +               ring++;
>> +       }
>> +
>>         while (rxr_count) {
>>                 /* assign generic ring traits */
>>                 ring->dev = &adapter->pdev->dev;
>> @@ -1002,17 +1059,18 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>>         int q_vectors = adapter->num_q_vectors;
>>         int rxr_remaining = adapter->num_rx_queues;
>>         int txr_remaining = adapter->num_tx_queues;
>> -       int rxr_idx = 0, txr_idx = 0, v_idx = 0;
>> +       int xdp_remaining = adapter->num_xdp_queues;
>> +       int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
>>         int err;
>>
>>         /* only one q_vector if MSI-X is disabled. */
>>         if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
>>                 q_vectors = 1;
>>
>> -       if (q_vectors >= (rxr_remaining + txr_remaining)) {
>> +       if (q_vectors >= (rxr_remaining + txr_remaining + xdp_remaining)) {
>>                 for (; rxr_remaining; v_idx++) {
>>                         err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
>> -                                                  0, 0, 1, rxr_idx);
>> +                                                  0, 0, 0, 0, 1, rxr_idx);
>>
>>                         if (err)
>>                                 goto err_out;
>> @@ -1026,8 +1084,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>>         for (; v_idx < q_vectors; v_idx++) {
>>                 int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_idx);
>>                 int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_idx);
>> +               int xqpv = DIV_ROUND_UP(xdp_remaining, q_vectors - v_idx);
>> +
>>                 err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
>>                                            tqpv, txr_idx,
>> +                                          xqpv, xdp_idx,
>>                                            rqpv, rxr_idx);
>>
>>                 if (err)
>> @@ -1036,14 +1097,17 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>>                 /* update counts and index */
>>                 rxr_remaining -= rqpv;
>>                 txr_remaining -= tqpv;
>> +               xdp_remaining -= xqpv;
>>                 rxr_idx++;
>>                 txr_idx++;
>> +               xdp_idx++;
>>         }
>>
>>         return 0;
>>
>>  err_out:
>>         adapter->num_tx_queues = 0;
>> +       adapter->num_xdp_queues = 0;
>>         adapter->num_rx_queues = 0;
>>         adapter->num_q_vectors = 0;
>>
>> @@ -1066,6 +1130,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
>>         int v_idx = adapter->num_q_vectors;
>>
>>         adapter->num_tx_queues = 0;
>> +       adapter->num_xdp_queues = 0;
>>         adapter->num_rx_queues = 0;
>>         adapter->num_q_vectors = 0;
>>
>> @@ -1172,9 +1237,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
>>
>>         ixgbe_cache_ring_register(adapter);
>>
>> -       e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u\n",
>> +       e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u XDP Queue count = %u\n",
>>                    (adapter->num_rx_queues > 1) ? "Enabled" : "Disabled",
>> -                  adapter->num_rx_queues, adapter->num_tx_queues);
>> +                  adapter->num_rx_queues,
>> +                  adapter->num_tx_queues, adapter->num_xdp_queues);
>>
>>         set_bit(__IXGBE_DOWN, &adapter->state);
>>
> 
> I think I would have preferred to see us leave tx_queues and
> rx_queues on the same line and add num_xdp_queues on a line by itself.

Really :/ It seems a bit cleaner to have them all on one line IMO;
I'm fairly sure it fits on an 80 char screen.

> 
>> @@ -1195,6 +1261,7 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
>>  void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
>>  {
>>         adapter->num_tx_queues = 0;
>> +       adapter->num_xdp_queues = 0;
>>         adapter->num_rx_queues = 0;
>>
>>         ixgbe_free_q_vectors(adapter);
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> index ec2c38f..227caf8 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -594,6 +594,19 @@ static void ixgbe_regdump(struct ixgbe_hw *hw, struct ixgbe_reg_info *reginfo)
>>
>>  }
>>
>> +static void ixgbe_print_buffer(struct ixgbe_ring *ring, int n)
>> +{
>> +       struct ixgbe_tx_buffer *tx_buffer;
>> +
>> +       tx_buffer = &ring->tx_buffer_info[ring->next_to_clean];
>> +       pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
>> +                  n, ring->next_to_use, ring->next_to_clean,
>> +                  (u64)dma_unmap_addr(tx_buffer, dma),
>> +                  dma_unmap_len(tx_buffer, len),
>> +                  tx_buffer->next_to_watch,
>> +                  (u64)tx_buffer->time_stamp);
>> +}
>> +
>>  /*
>>   * ixgbe_dump - Print registers, tx-rings and rx-rings
>>   */
>> @@ -603,7 +616,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>>         struct ixgbe_hw *hw = &adapter->hw;
>>         struct ixgbe_reg_info *reginfo;
>>         int n = 0;
>> -       struct ixgbe_ring *tx_ring;
>> +       struct ixgbe_ring *ring;
>>         struct ixgbe_tx_buffer *tx_buffer;
>>         union ixgbe_adv_tx_desc *tx_desc;
>>         struct my_u0 { u64 a; u64 b; } *u0;
>> @@ -644,14 +657,13 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>>                 "Queue [NTU] [NTC] [bi(ntc)->dma  ]",
>>                 "leng", "ntw", "timestamp");
>>         for (n = 0; n < adapter->num_tx_queues; n++) {
>> -               tx_ring = adapter->tx_ring[n];
>> -               tx_buffer = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
>> -               pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
>> -                          n, tx_ring->next_to_use, tx_ring->next_to_clean,
>> -                          (u64)dma_unmap_addr(tx_buffer, dma),
>> -                          dma_unmap_len(tx_buffer, len),
>> -                          tx_buffer->next_to_watch,
>> -                          (u64)tx_buffer->time_stamp);
>> +               ring = adapter->tx_ring[n];
>> +               ixgbe_print_buffer(ring, n);
>> +       }
>> +
>> +       for (n = 0; n < adapter->num_xdp_queues; n++) {
>> +               ring = adapter->xdp_ring[n];
>> +               ixgbe_print_buffer(ring, n);
>>         }
>>
>>         /* Print TX Rings */
>> @@ -696,28 +708,28 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>>          */
>>
>>         for (n = 0; n < adapter->num_tx_queues; n++) {
>> -               tx_ring = adapter->tx_ring[n];
>> +               ring = adapter->tx_ring[n];
>>                 pr_info("------------------------------------\n");
>> -               pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
>> +               pr_info("TX QUEUE INDEX = %d\n", ring->queue_index);
>>                 pr_info("------------------------------------\n");
>>                 pr_info("%s%s    %s              %s        %s          %s\n",
>>                         "T [desc]     [address 63:0  ] ",
>>                         "[PlPOIdStDDt Ln] [bi->dma       ] ",
>>                         "leng", "ntw", "timestamp", "bi->skb");
>>
>> -               for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
>> -                       tx_desc = IXGBE_TX_DESC(tx_ring, i);
>> -                       tx_buffer = &tx_ring->tx_buffer_info[i];
>> +               for (i = 0; ring->desc && (i < ring->count); i++) {
>> +                       tx_desc = IXGBE_TX_DESC(ring, i);
>> +                       tx_buffer = &ring->tx_buffer_info[i];
>>                         u0 = (struct my_u0 *)tx_desc;
>>                         if (dma_unmap_len(tx_buffer, len) > 0) {
>>                                 const char *ring_desc;
>>
>> -                               if (i == tx_ring->next_to_use &&
>> -                                   i == tx_ring->next_to_clean)
>> +                               if (i == ring->next_to_use &&
>> +                                   i == ring->next_to_clean)
>>                                         ring_desc = " NTC/U";
>> -                               else if (i == tx_ring->next_to_use)
>> +                               else if (i == ring->next_to_use)
>>                                         ring_desc = " NTU";
>> -                               else if (i == tx_ring->next_to_clean)
>> +                               else if (i == ring->next_to_clean)
>>                                         ring_desc = " NTC";
>>                                 else
>>                                         ring_desc = "";
>> @@ -987,6 +999,10 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>                 clear_bit(__IXGBE_HANG_CHECK_ARMED,
>>                           &adapter->tx_ring[i]->state);
>> +
>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>> +               clear_bit(__IXGBE_HANG_CHECK_ARMED,
>> +                         &adapter->xdp_ring[i]->state);
>>  }
>>
>>  static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
>> @@ -1031,6 +1047,14 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
>>                 if (xoff[tc])
>>                         clear_bit(__IXGBE_HANG_CHECK_ARMED, &tx_ring->state);
>>         }
>> +
>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>> +               struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
>> +
>> +               tc = xdp_ring->dcb_tc;
>> +               if (xoff[tc])
>> +                       clear_bit(__IXGBE_HANG_CHECK_ARMED, &xdp_ring->state);
>> +       }
>>  }
>>
>>  static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
>> @@ -1137,6 +1161,11 @@ static int ixgbe_tx_maxrate(struct net_device *netdev,
>>         return 0;
>>  }
>>
>> +static bool ixgbe_txring_is_xdp(struct ixgbe_ring *tx_ring)
>> +{
>> +       return tx_ring->xdp_ring;
>> +}
>> +
>>  /**
>>   * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
>>   * @q_vector: structure containing interrupt and ring information
>> @@ -1182,7 +1211,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>                 total_packets += tx_buffer->gso_segs;
>>
>>                 /* free the skb */
>> -               napi_consume_skb(tx_buffer->skb, napi_budget);
>> +               if (ixgbe_txring_is_xdp(tx_ring))
>> +                       page_frag_free(tx_buffer->data);
>> +               else
>> +                       napi_consume_skb(tx_buffer->skb, napi_budget);
>>
>>                 /* unmap skb header data */
>>                 dma_unmap_single(tx_ring->dev,
>> @@ -1243,7 +1275,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>         if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
>>                 /* schedule immediate reset if we believe we hung */
>>                 struct ixgbe_hw *hw = &adapter->hw;
>> -               e_err(drv, "Detected Tx Unit Hang\n"
>> +               e_err(drv, "Detected Tx Unit Hang %s\n"
>>                         "  Tx Queue             <%d>\n"
>>                         "  TDH, TDT             <%x>, <%x>\n"
>>                         "  next_to_use          <%x>\n"
>> @@ -1251,13 +1283,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>                         "tx_buffer_info[next_to_clean]\n"
>>                         "  time_stamp           <%lx>\n"
>>                         "  jiffies              <%lx>\n",
>> +                       ixgbe_txring_is_xdp(tx_ring) ? "(XDP)" : "",
>>                         tx_ring->queue_index,
>>                         IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
>>                         IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
>>                         tx_ring->next_to_use, i,
>>                         tx_ring->tx_buffer_info[i].time_stamp, jiffies);
>>
>> -               netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
>> +               if (!ixgbe_txring_is_xdp(tx_ring))
>> +                       netif_stop_subqueue(tx_ring->netdev,
>> +                                           tx_ring->queue_index);
>>
>>                 e_info(probe,
>>                        "tx hang %d detected on queue %d, resetting adapter\n",
>> @@ -1270,6 +1305,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>                 return true;
>>         }
>>
>> +       if (ixgbe_txring_is_xdp(tx_ring))
>> +               return !!budget;
>> +
>>         netdev_tx_completed_queue(txring_txq(tx_ring),
>>                                   total_packets, total_bytes);
>>
>> @@ -2160,10 +2198,16 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>>         return skb;
>>  }
>>
>> -static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
>> +static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
>> +                              struct ixgbe_adapter *adapter,
>> +                              struct ixgbe_ring *tx_ring);
>> +
>> +static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>> +                        struct ixgbe_ring  *rx_ring,
>>                          struct ixgbe_rx_buffer *rx_buffer,
>>                          unsigned int size)
>>  {
>> +       struct ixgbe_ring *xdp_ring;
>>         struct bpf_prog *xdp_prog;
>>         struct xdp_buff xdp;
>>         void *addr;
>> @@ -2183,9 +2227,18 @@ static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
>>         switch (act) {
>>         case XDP_PASS:
>>                 return 0;
>> +       case XDP_TX:
>> +               xdp_ring = adapter->xdp_ring[smp_processor_id()];
>> +
>> +               /* XDP_TX is not flow controlled */
>> +               if (!xdp_ring) {
>> +                       WARN_ON(1);
>> +                       break;
>> +               }
>> +               ixgbe_xmit_xdp_ring(&xdp, adapter, xdp_ring);
>> +               break;
> 
> So just for the sake of future compatibility I would say you might
> look at just passing a netdev and the XDP buffer instead.  Then your
> transmit routine can make the decision on using smp_processor_id and
> grab the xdp_ring it wants.  No point in having us make the decision
> in the Rx path.  That way things stay closer to how the transmit path
> already works.
> 
> Also shouldn't you return some sort of notification as to whether
> you completed the xmit successfully?  It almost seems like we should
> fall through to an XDP_ABORTED case if we didn't transmit the frame,
> since we might end up leaking the memory otherwise.

Yep. Fixing.
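
Roughly (untested sketch; the error returns are placeholders so the
caller can fall through to the XDP_ABORTED handling on failure):

static int ixgbe_xmit_xdp_ring(struct ixgbe_adapter *adapter,
			       struct xdp_buff *xdp)
{
	struct ixgbe_ring *ring = adapter->xdp_ring[smp_processor_id()];

	if (unlikely(!ring))
		return -ENXIO;

	if (ixgbe_desc_unused(ring) < 1)
		return -EBUSY;	/* ring full, caller drops the frame */

	/* ... map the buffer and post the descriptor ... */

	return 0;
}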

> 
>>         default:
>>                 bpf_warn_invalid_xdp_action(act);
>> -       case XDP_TX:
>>         case XDP_ABORTED:
>>                 trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>>                 /* fallthrough -- handle aborts by dropping packet */
>> @@ -2214,8 +2267,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>                                const int budget)
>>  {
>>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>> -#ifdef IXGBE_FCOE
>>         struct ixgbe_adapter *adapter = q_vector->adapter;
>> +#ifdef IXGBE_FCOE
>>         int ddp_bytes;
>>         unsigned int mss = 0;
>>  #endif /* IXGBE_FCOE */
>> @@ -2248,7 +2301,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>                 rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
>>
>>                 rcu_read_lock();
>> -               consumed = ixgbe_run_xdp(rx_ring, rx_buffer, size);
>> +               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
>>                 rcu_read_unlock();
>>
>>                 if (consumed) {
> 
> You probably don't need the adapter struct.  If you have the rx_ring
> it should have a netdev structure and from that you can get to the
> adapter struct via netdev_priv.

It's easy to pass it through the function though and avoid the
netdev_priv(rx_ring->netdev) line.

> 
>> @@ -2914,6 +2967,15 @@ static irqreturn_t ixgbe_msix_other(int irq, void *data)
>>                                                        &ring->state))
>>                                         reinit_count++;
>>                         }
>> +
>> +                       for (i = 0; i < adapter->num_xdp_queues; i++) {
>> +                               struct ixgbe_ring *ring = adapter->xdp_ring[i];
>> +
>> +                               if (test_and_clear_bit(__IXGBE_TX_FDIR_INIT_DONE,
>> +                                                      &ring->state))
>> +                                       reinit_count++;
>> +                       }
>> +
>>                         if (reinit_count) {
>>                                 /* no more flow director interrupts until after init */
>>                                 IXGBE_WRITE_REG(hw, IXGBE_EIMC, IXGBE_EIMC_FLOW_DIR);
> 
> We don't run flow director on the XDP queues so there is no need to
> add this bit.

Yep cut'n'paste error.

> 
>> @@ -3429,6 +3491,8 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
>>         /* Setup the HW Tx Head and Tail descriptor pointers */
>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>                 ixgbe_configure_tx_ring(adapter, adapter->tx_ring[i]);
>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>> +               ixgbe_configure_tx_ring(adapter, adapter->xdp_ring[i]);
>>  }
>>
>>  static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
>> @@ -5598,7 +5662,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
>>         }
>>
>>         /* reset BQL for queue */
>> -       netdev_tx_reset_queue(txring_txq(tx_ring));
>> +       if (!ixgbe_txring_is_xdp(tx_ring))
>> +               netdev_tx_reset_queue(txring_txq(tx_ring));
>>
>>         /* reset next_to_use and next_to_clean */
>>         tx_ring->next_to_use = 0;
>> @@ -5627,6 +5692,8 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
>>
>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>                 ixgbe_clean_tx_ring(adapter->tx_ring[i]);
>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>> +               ixgbe_clean_tx_ring(adapter->xdp_ring[i]);
>>  }
>>
>>  static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
>> @@ -5721,6 +5788,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
>>                 u8 reg_idx = adapter->tx_ring[i]->reg_idx;
>>                 IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
>>         }
>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>> +               u8 reg_idx = adapter->xdp_ring[i]->reg_idx;
>> +
>> +               IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
>> +       }
>>
>>         /* Disable the Tx DMA engine on 82599 and later MAC */
>>         switch (hw->mac.type) {
>> @@ -6008,6 +6080,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
>>
>>         /* set default ring sizes */
>>         adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
>> +       adapter->xdp_ring_count = IXGBE_DEFAULT_TXD;
>>         adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
>>
>>         /* set default work limits */
> 
> I would just use the tx_ring_count value for the XDP rings.  No point
> in adding another control unless you want to add the ethtool interface
> for it.
> 

I liked that it was explicit. But it's only ever used once so I'll
remove it.

>> @@ -6089,7 +6162,7 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
>>   **/
>>  static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
>>  {
>> -       int i, err = 0;
>> +       int i, j = 0, err = 0;
>>
>>         for (i = 0; i < adapter->num_tx_queues; i++) {
>>                 err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
>> @@ -6099,12 +6172,23 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
>>                 e_err(probe, "Allocation for Tx Queue %u failed\n", i);
>>                 goto err_setup_tx;
>>         }
>> +       for (j = 0; j < adapter->num_xdp_queues; j++) {
>> +               err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
>> +               if (!err)
>> +                       continue;
>> +
>> +               e_err(probe, "Allocation for Tx Queue %u failed\n", j);
>> +               goto err_setup_tx;
>> +       }
>> +
>>
>>         return 0;
>>  err_setup_tx:
>>         /* rewind the index freeing the rings as we go */
>>         while (i--)
>>                 ixgbe_free_tx_resources(adapter->tx_ring[i]);
>> +       while (j--)
>> +               ixgbe_free_tx_resources(adapter->xdp_ring[j]);
>>         return err;
>>  }
>>
> 
> Just for sake of symmetry I would prefer to see xdp rings freed before
> the Tx rings.

OK.

> 
>> @@ -6238,6 +6322,9 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter)
>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>                 if (adapter->tx_ring[i]->desc)
>>                         ixgbe_free_tx_resources(adapter->tx_ring[i]);
>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>> +               if (adapter->xdp_ring[i]->desc)
>> +                       ixgbe_free_tx_resources(adapter->xdp_ring[i]);
>>  }
>>
>>  /**
>> @@ -6656,6 +6743,14 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
>>                 bytes += tx_ring->stats.bytes;
>>                 packets += tx_ring->stats.packets;
>>         }
>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>> +               struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
>> +
>> +               restart_queue += xdp_ring->tx_stats.restart_queue;
>> +               tx_busy += xdp_ring->tx_stats.tx_busy;
>> +               bytes += xdp_ring->stats.bytes;
>> +               packets += xdp_ring->stats.packets;
>> +       }
>>         adapter->restart_queue = restart_queue;
>>         adapter->tx_busy = tx_busy;
>>         netdev->stats.tx_bytes = bytes;
> 
> We may want to look at doing a slight refactor on this to convert
> some of these stats grabs over to static functions.  We may also want
> to group these using structs instead of just doing this as individual
> u64 stats.

Agreed, but that is a follow-on patch and not really XDP specific.

> 
>> @@ -6849,6 +6944,9 @@ static void ixgbe_fdir_reinit_subtask(struct ixgbe_adapter *adapter)
>>                 for (i = 0; i < adapter->num_tx_queues; i++)
>>                         set_bit(__IXGBE_TX_FDIR_INIT_DONE,
>>                                 &(adapter->tx_ring[i]->state));
>> +               for (i = 0; i < adapter->num_xdp_queues; i++)
>> +                       set_bit(__IXGBE_TX_FDIR_INIT_DONE,
>> +                               &adapter->xdp_ring[i]->state);
>>                 /* re-enable flow director interrupts */
>>                 IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_FLOW_DIR);
>>         } else {
> 
> I wouldn't bother with the flow director initialization for the XDP rings.
> 
> Although you may want to set __IXGBE_TX_XPS_INIT_DONE to avoid trying
> to configure XPS for the XDP rings.

So I'll leave this block then.
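
For the XPS side, a minimal sketch of what that could look like at ring
bring-up, reusing the existing __IXGBE_TX_XPS_INIT_DONE flag (where exactly
the loop lands is still open):

	/* sketch: pre-mark XDP rings as XPS-initialized so the stack
	 * never tries to build XPS maps for them
	 */
	for (i = 0; i < adapter->num_xdp_queues; i++)
		set_bit(__IXGBE_TX_XPS_INIT_DONE,
			&adapter->xdp_ring[i]->state);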

> 
>> @@ -6882,6 +6980,8 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
>>         if (netif_carrier_ok(adapter->netdev)) {
>>                 for (i = 0; i < adapter->num_tx_queues; i++)
>>                         set_check_for_tx_hang(adapter->tx_ring[i]);
>> +               for (i = 0; i < adapter->num_xdp_queues; i++)
>> +                       set_check_for_tx_hang(adapter->xdp_ring[i]);
>>         }
>>
>>         if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
>> @@ -7112,6 +7212,13 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
>>                         return true;
>>         }
>>
>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>> +               struct ixgbe_ring *ring = adapter->xdp_ring[i];
>> +
>> +               if (ring->next_to_use != ring->next_to_clean)
>> +                       return true;
>> +       }
>> +
>>         return false;
>>  }
>>
>> @@ -8072,6 +8179,99 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
>>  #endif
>>  }
>>
>> +static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
>> +                              struct ixgbe_adapter *adapter,
>> +                              struct ixgbe_ring *ring)
>> +{
>> +       struct ixgbe_tx_buffer *tx_buffer;
>> +       union ixgbe_adv_tx_desc *tx_desc;
>> +       u32 len, tx_flags = 0;
>> +       dma_addr_t dma;
>> +       u16 count, i;
>> +       u32 cmd_type;
>> +
>> +       len = xdp->data_end - xdp->data;
>> +       count = TXD_USE_COUNT(len);
>> +
>> +       /* record the location of the first descriptor for this packet */
>> +       tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
>> +       tx_buffer->skb = NULL;
>> +       tx_buffer->bytecount = len;
>> +       tx_buffer->gso_segs = 1;
>> +       tx_buffer->protocol = 0;
>> +
>> +#ifdef CONFIG_PCI_IOV
>> +       /* Use the l2switch_enable flag - would be false if the DMA
>> +        * Tx switch had been disabled.
>> +        */
>> +       if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
>> +               tx_flags |= IXGBE_TX_FLAGS_CC;
>> +#endif
>> +
> 
> I'm pretty sure there is a bug here.  The TX_FLAGS_CC is supposed to
> notify the ring that it is supposed to populate the CC bit in olinfo
> status.  Also you need to populate a context descriptor when this is
> set since the header length recorded in the descriptor is used to
> determine the length of the header in the packet.  You can probably
> short-cut this and just add a bit of init code that just initializes
> the queue by dropping a context descriptor in the ring with an Ethernet
> header length of 14 in it, and never bumping the tail.  Then if you
> see the TX_FLAGS_CC you just default to context 0 which is already
> recorded in the queue and should save you the trouble.

Right, this is a bug. For this patch I'll put the fix inline and then
optimize it later with some other optimizations.
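
Roughly along these lines, reusing the driver's existing
ixgbe_tx_ctxtdesc() helper; the init hook and the olinfo_status variable
are assumptions here, so treat it as a sketch of the idea:

	/* sketch: at XDP ring bring-up, seed context 0 with a plain
	 * 14-byte L2 header length; tail is not written here, the
	 * hardware picks this descriptor up when the first real
	 * packet bumps the tail
	 */
	if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
		ixgbe_tx_ctxtdesc(xdp_ring,
				  ETH_HLEN << IXGBE_ADVTXD_MACLEN_SHIFT,
				  0, 0, 0);

	/* then in ixgbe_xmit_xdp_ring(), reference context 0 via CC */
	if (tx_flags & IXGBE_TX_FLAGS_CC)
		olinfo_status |= IXGBE_ADVTXD_CC;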

> 
>> +       tx_buffer->tx_flags = tx_flags;
>> +       i = ring->next_to_use;
>> +       tx_desc = IXGBE_TX_DESC(ring, i);
>> +
>> +       dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
>> +       if (dma_mapping_error(ring->dev, dma))
>> +               goto dma_error;
>> +
>> +       dma_unmap_len_set(tx_buffer, len, len);
>> +       dma_unmap_addr_set(tx_buffer, dma, dma);
>> +       tx_buffer->data = xdp->data;
>> +       tx_desc->read.buffer_addr = cpu_to_le64(dma);
>> +
>> +       /* put descriptor type bits */
>> +       cmd_type = IXGBE_ADVTXD_DTYP_DATA |
>> +                  IXGBE_ADVTXD_DCMD_DEXT |
>> +                  IXGBE_ADVTXD_DCMD_IFCS |
>> +                  IXGBE_ADVTXD_DCMD_EOP;
> 
> The EOP is redundant, we already included it in IXGBE_TXD_CMD which
> you reference below.
> 

Thanks

>> +       cmd_type |= len | IXGBE_TXD_CMD;
>> +       tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
>> +       tx_desc->read.olinfo_status =
>> +               cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
>> +
>> +       /* With normal buffer flow we would also set the time_stamp. But in
>> +        * XDP case we can safely skip it because at the moment it is not
>> +        * used anywhere.
>> +        */
>> +       tx_buffer->time_stamp = jiffies;
>> +
> 
> We should probably just drop the code from the driver that records
> jiffies.  Feel free to submit a patch to drop it and save yourself a
> few cycles.
> 

Sure.

>> +       /* Force memory writes to complete before letting h/w know there
>> +        * are new descriptors to fetch.  (Only applicable for weak-ordered
>> +        * memory model archs, such as IA-64).
>> +        *
>> +        * We also need this memory barrier to make certain all of the
>> +        * status bits have been updated before next_to_watch is written.
>> +        */
>> +       wmb();
>> +
>> +       /* set next_to_watch value indicating a packet is present */
>> +       i++;
>> +       if (i == ring->count)
>> +               i = 0;
>> +
>> +       tx_buffer->next_to_watch = tx_desc;
>> +       ring->next_to_use = i;
>> +
>> +       writel(i, ring->tail);
>> +
>> +       /* we need this if more than one processor can write to our tail
>> +        * at a time, it synchronizes IO on IA64/Altix systems
>> +        */
>> +       mmiowb();
>> +       return NETDEV_TX_OK;
>> +dma_error:
>> +       if (dma_unmap_len(tx_buffer, len))
>> +               dma_unmap_single(ring->dev,
>> +                                dma_unmap_addr(tx_buffer, dma),
>> +                                dma_unmap_len(tx_buffer, len),
>> +                                DMA_TO_DEVICE);
> 
> For now if there is a DMA error there is nothing to unmap since we
> only have the one buffer.  We might just look at returning -ENOMEM
> directly instead of jumping out for now.  As we add the ability to
> support scatter-gather for Tx then we can look at adding this clean-up
> since for now there is nothing to update.

Removing it.
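
i.e. just:

	dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
	if (dma_mapping_error(ring->dev, dma))
		return -ENOMEM;	/* nothing has been mapped yet */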

> 
>> +       dma_unmap_len_set(tx_buffer, len, 0);
>> +       ring->next_to_use = i;
>> +       return -ENOMEM;
>> +}
>> +
>>  netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
>>                           struct ixgbe_adapter *adapter,
>>                           struct ixgbe_ring *tx_ring)
>> @@ -8363,6 +8563,23 @@ static void ixgbe_netpoll(struct net_device *netdev)
>>
>>  #endif
>>
>> +static void ixgbe_get_ring_stats64(struct rtnl_link_stats64 *stats,
>> +                                  struct ixgbe_ring *ring)
>> +{
>> +       u64 bytes, packets;
>> +       unsigned int start;
>> +
>> +       if (ring) {
>> +               do {
>> +                       start = u64_stats_fetch_begin_irq(&ring->syncp);
>> +                       packets = ring->stats.packets;
>> +                       bytes   = ring->stats.bytes;
>> +               } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
>> +               stats->tx_packets += packets;
>> +               stats->tx_bytes   += bytes;
>> +       }
>> +}
>> +
>>  static void ixgbe_get_stats64(struct net_device *netdev,
>>                               struct rtnl_link_stats64 *stats)
>>  {
>> @@ -8388,18 +8605,13 @@ static void ixgbe_get_stats64(struct net_device *netdev,
>>
>>         for (i = 0; i < adapter->num_tx_queues; i++) {
>>                 struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
>> -               u64 bytes, packets;
>> -               unsigned int start;
>>
>> -               if (ring) {
>> -                       do {
>> -                               start = u64_stats_fetch_begin_irq(&ring->syncp);
>> -                               packets = ring->stats.packets;
>> -                               bytes   = ring->stats.bytes;
>> -                       } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
>> -                       stats->tx_packets += packets;
>> -                       stats->tx_bytes   += bytes;
>> -               }
>> +               ixgbe_get_ring_stats64(stats, ring);
>> +       }
>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>> +               struct ixgbe_ring *ring = ACCESS_ONCE(adapter->xdp_ring[i]);
>> +
>> +               ixgbe_get_ring_stats64(stats, ring);
>>         }
>>         rcu_read_unlock();
>>
>> @@ -8533,6 +8745,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
>>         }
>>
>>         ixgbe_validate_rtr(adapter, tc);
>> +       adapter->num_xdp_queues = adapter->xdp_prog ? nr_cpu_ids : 0;
>>
>>  #endif /* CONFIG_IXGBE_DCB */
>>         ixgbe_init_interrupt_scheme(adapter);
> 
> I'm confused about this.  Don't you overwrite this when you call
> ixgbe_init_interrupt_scheme?
> 

Stale code, nice catch. Originally I was doing the init directly in setup_tc.

>> @@ -9520,7 +9733,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>>  {
>>         int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>> -       struct bpf_prog *old_adapter_prog;
>> +       struct bpf_prog *old_prog;
>>
>>         /* verify ixgbe ring attributes are sufficient for XDP */
>>         for (i = 0; i < adapter->num_rx_queues; i++) {
>> @@ -9533,12 +9746,26 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>>                         return -EINVAL;
>>         }
>>
>> -       old_adapter_prog = xchg(&adapter->xdp_prog, prog);
>> +       if (nr_cpu_ids > MAX_XDP_QUEUES)
>> +               return -ENOMEM;
>> +
>> +       old_prog = xchg(&adapter->xdp_prog, prog);
>> +
>> +       /* If transitioning XDP modes reconfigure rings */
>> +       if (!!adapter->xdp_prog != !!old_prog) {
>> +               int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
>> +
>> +               if (err) {
>> +                       rcu_assign_pointer(adapter->xdp_prog, old_prog);
>> +                       return -EINVAL;
>> +               }
>> +       }
>> +
>>         for (i = 0; i < adapter->num_rx_queues; i++)
>>                 ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
>>
>> -       if (old_adapter_prog)
>> -               bpf_prog_put(old_adapter_prog);
>> +       if (old_prog)
>> +               bpf_prog_put(old_prog);
>>
>>         return 0;
>>  }
>> @@ -10044,6 +10271,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>                 u64_stats_init(&adapter->tx_ring[i]->syncp);
>>
>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>> +               u64_stats_init(&adapter->xdp_ring[i]->syncp);
>> +
>>         /* WOL not supported for all devices */
>>         adapter->wol = 0;
>>         hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action
  2017-03-01 23:15     ` John Fastabend
@ 2017-03-02  0:07       ` John Fastabend
  0 siblings, 0 replies; 14+ messages in thread
From: John Fastabend @ 2017-03-02  0:07 UTC (permalink / raw)
  To: intel-wired-lan

On 17-03-01 03:15 PM, John Fastabend wrote:
> On 17-02-25 02:01 PM, Alexander Duyck wrote:
>> On Sat, Feb 25, 2017 at 9:32 AM, John Fastabend
>> <john.fastabend@gmail.com> wrote:
>>> Add support for XDP_TX action.
>>>
>>> A couple of design choices were made here. First, I use a new ring
>>> pointer structure xdp_ring[] in the adapter struct instead of
>>> pushing the newly allocated xdp TX rings into the tx_ring[]
>>> structure. This means we have to duplicate loops around rings
>>> in places we want to initialize both TX rings and XDP rings.
>>> But by making it explicit it is obvious when we are using XDP
>>> rings and when we are using TX rings. Further we don't have
>>> to do ring arithmetic, which is error prone. As a proof point
>>> for doing this my first patches used only a single ring structure
>>> and introduced bugs in FCoE code and macvlan code paths.
>>>
>>> Second, I am aware this is not the most optimized version of
>>> this code possible. I want to get baseline support in using
>>> the most readable format possible and then once this series
>>> is included I will optimize the TX path in another series
>>> of patches.
>>
>> Comments inline below.  There are still a few items to iron out but
>> this looks pretty good.  I'll try to get to the other two patches
>> tomorrow.
>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Just as an organizational point, I think I'll disable XDP in this
patch if SR-IOV or DCB is enabled, then push the feature in a 4th
and 5th patch.
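
A rough sketch of the guard in ixgbe_xdp_setup(); the flag names are the
existing ones from ixgbe.h, but treat this as an outline rather than the
final patch:

	/* sketch: refuse to attach a program while SR-IOV or DCB owns
	 * the queue layout; support returns in the follow-up patches
	 */
	if (adapter->flags & (IXGBE_FLAG_SRIOV_ENABLED |
			      IXGBE_FLAG_DCB_ENABLED))
		return -EINVAL;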

>>> ---
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe.h         |   12 +
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c |   29 ++
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c     |   85 +++++-
>>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    |  314 +++++++++++++++++++---
>>>  4 files changed, 386 insertions(+), 54 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe.h b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>>> index 2d12c24..571a072 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe.h
>>> @@ -190,7 +190,10 @@ struct vf_macvlans {
>>>  struct ixgbe_tx_buffer {
>>>         union ixgbe_adv_tx_desc *next_to_watch;
>>>         unsigned long time_stamp;
>>> -       struct sk_buff *skb;
>>> +       union {
>>> +               struct sk_buff *skb;
>>> +               void *data; /* XDP uses addr ptr on irq_clean */
>>> +       };
>>>         unsigned int bytecount;
>>>         unsigned short gso_segs;
>>>         __be16 protocol;
>>
>> This actually has me wondering if we couldn't short-cut things one
>> step further and just store a flag in the tx_flags portion of the
>> tx_buffer to record if this is an skb or a data pointer.  I'm
>> wondering if we couldn't optimize the standard transmit path to just
>> take a reference on the skb->head, and drop the sk_buff itself for
>> small packets since in most cases there isn't anything we really need
>> to hold onto anyway.  I might have to look into that.
> 
> Perhaps. I think this is probably a follow-on series, if it's possible.
> At the moment I like how simple this is.
> 
>>>> @@ -308,6 +311,7 @@ struct ixgbe_ring {
>>>         };
>>>
>>>         u8 dcb_tc;
>>> +       bool xdp_ring;
>>>         struct ixgbe_queue_stats stats;
>>>         struct u64_stats_sync syncp;
>>>         union {
>>
>> Instead of adding a bool I would rather have this added to the ring
>> state as a single bit.  That way we don't have to modify the ring
>> structure here at all.
> 
> Sure.
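> 
> Something like this, mirroring the existing state flags; the bit name
> below is hypothetical:
> 
> 	/* sketch: assumes a new __IXGBE_TX_XDP_RING entry in
> 	 * enum ixgbe_ring_state_t
> 	 */
> 	#define ring_is_xdp(ring) \
> 		test_bit(__IXGBE_TX_XDP_RING, &(ring)->state)
> 	#define set_ring_xdp(ring) \
> 		set_bit(__IXGBE_TX_XDP_RING, &(ring)->state)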
> 
>>
>>> @@ -335,6 +339,7 @@ enum ixgbe_ring_f_enum {
>>>  #define IXGBE_MAX_FCOE_INDICES         8
>>>  #define MAX_RX_QUEUES                  (IXGBE_MAX_FDIR_INDICES + 1)
>>>  #define MAX_TX_QUEUES                  (IXGBE_MAX_FDIR_INDICES + 1)
>>> +#define MAX_XDP_QUEUES                 (IXGBE_MAX_FDIR_INDICES + 1)
>>>  #define IXGBE_MAX_L2A_QUEUES           4
>>>  #define IXGBE_BAD_L2A_QUEUE            3
>>>  #define IXGBE_MAX_MACVLANS             31
>>> @@ -578,6 +583,10 @@ struct ixgbe_adapter {
>>>         __be16 vxlan_port;
>>>         __be16 geneve_port;
>>>
>>> +       /* XDP */
>>> +       int num_xdp_queues;
>>> +       struct ixgbe_ring *xdp_ring[MAX_XDP_QUEUES];
>>> +
>>>         /* TX */
>>>         struct ixgbe_ring *tx_ring[MAX_TX_QUEUES] ____cacheline_aligned_in_smp;
>>>
>>> @@ -624,6 +633,7 @@ struct ixgbe_adapter {
>>>
>>>         u64 tx_busy;
>>>         unsigned int tx_ring_count;
>>> +       unsigned int xdp_ring_count;
>>>         unsigned int rx_ring_count;
>>>
>>>         u32 link_speed;
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>>> index d3e02ac..51efd0a 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c
>>> @@ -1031,7 +1031,7 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>>>  {
>>>         struct ixgbe_adapter *adapter = netdev_priv(netdev);
>>>         struct ixgbe_ring *temp_ring;
>>> -       int i, err = 0;
>>> +       int i, j, err = 0;
>>>         u32 new_rx_count, new_tx_count;
>>>
>>>         if ((ring->rx_mini_pending) || (ring->rx_jumbo_pending))
>>> @@ -1057,15 +1057,19 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>>>         if (!netif_running(adapter->netdev)) {
>>>                 for (i = 0; i < adapter->num_tx_queues; i++)
>>>                         adapter->tx_ring[i]->count = new_tx_count;
>>> +               for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +                       adapter->xdp_ring[i]->count = new_tx_count;
>>>                 for (i = 0; i < adapter->num_rx_queues; i++)
>>>                         adapter->rx_ring[i]->count = new_rx_count;
>>>                 adapter->tx_ring_count = new_tx_count;
>>> +               adapter->xdp_ring_count = new_tx_count;
>>>                 adapter->rx_ring_count = new_rx_count;
>>>                 goto clear_reset;
>>>         }
>>>
>>>         /* allocate temporary buffer to store rings in */
>>> -       i = max_t(int, adapter->num_tx_queues, adapter->num_rx_queues);
>>> +       i = max_t(int, adapter->num_tx_queues + adapter->num_xdp_queues,
>>> +                 adapter->num_rx_queues);
>>
>> You only need the maximum of one of the 3 values.  You don't need to
>> add num_tx_queues to num_xdp_queues.
>>
> 
> A hold-over from when I pushed the XDP and Tx queues into the same ring[] array.
> 
>>>         temp_ring = vmalloc(i * sizeof(struct ixgbe_ring));
>>>
>>>         if (!temp_ring) {
>>> @@ -1097,12 +1101,33 @@ static int ixgbe_set_ringparam(struct net_device *netdev,
>>>                         }
>>>                 }
>>>
>>> +               for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
>>> +                       memcpy(&temp_ring[i], adapter->xdp_ring[j],
>>> +                              sizeof(struct ixgbe_ring));
>>> +
>>> +                       temp_ring[i].count = new_tx_count;
>>> +                       err = ixgbe_setup_tx_resources(&temp_ring[i]);
>>> +                       if (err) {
>>> +                               while (i) {
>>> +                                       i--;
>>> +                                       ixgbe_free_tx_resources(&temp_ring[i]);
>>> +                               }
>>> +                               goto err_setup;
>>> +                       }
>>> +               }
>>> +
>>>                 for (i = 0; i < adapter->num_tx_queues; i++) {
>>>                         ixgbe_free_tx_resources(adapter->tx_ring[i]);
>>>
>>>                         memcpy(adapter->tx_ring[i], &temp_ring[i],
>>>                                sizeof(struct ixgbe_ring));
>>>                 }
>>> +               for (j = 0; j < adapter->num_xdp_queues; i++, j++) {
>>> +                       ixgbe_free_tx_resources(adapter->xdp_ring[j]);
>>> +
>>> +                       memcpy(adapter->xdp_ring[j], &temp_ring[i],
>>> +                              sizeof(struct ixgbe_ring));
>>> +               }
>>>
>>>                 adapter->tx_ring_count = new_tx_count;
>>>         }
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> index 1b8be7d..d9a4916 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c
>>> @@ -73,6 +73,11 @@ static bool ixgbe_cache_ring_dcb_sriov(struct ixgbe_adapter *adapter)
>>>                         reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>>>                 adapter->tx_ring[i]->reg_idx = reg_idx;
>>>         }
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
>>> +               if ((reg_idx & ~vmdq->mask) >= tcs)
>>> +                       reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>>> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>>> +       }
>>>
>>>  #ifdef IXGBE_FCOE
>>>         /* nothing to do if FCoE is disabled */
>>> @@ -248,6 +253,12 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
>>>                 adapter->tx_ring[i]->reg_idx = reg_idx;
>>>         }
>>>
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++) {
>>> +               if ((reg_idx & rss->mask) >= rss->indices)
>>> +                       reg_idx = __ALIGN_MASK(reg_idx, ~vmdq->mask);
>>> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>>> +       }
>>> +
>>>  #ifdef IXGBE_FCOE
>>>         /* FCoE uses a linear block of queues so just assigning 1:1 */
>>>         for (; i < adapter->num_tx_queues; i++, reg_idx++)
>>
>> One thing we may want to look at doing for the SR-IOV cases would be
>> to steal XDP queues from an adjacent VMDq pool.  It would put a
>> dependency on us limiting ourselves by one additional VF, but it would
>> make the logic much simpler since each VMDq pool is the same size so
>> the Tx/Rx limit in one pool would be the same as the next so you could
>> do twice the number of Tx queues as Rx queues.
> 
> Follow-on patches?
> 
>>
>>> @@ -267,12 +278,14 @@ static bool ixgbe_cache_ring_sriov(struct ixgbe_adapter *adapter)
>>>   **/
>>>  static bool ixgbe_cache_ring_rss(struct ixgbe_adapter *adapter)
>>>  {
>>> -       int i;
>>> +       int i, reg_idx;
>>>
>>>         for (i = 0; i < adapter->num_rx_queues; i++)
>>>                 adapter->rx_ring[i]->reg_idx = i;
>>> -       for (i = 0; i < adapter->num_tx_queues; i++)
>>> -               adapter->tx_ring[i]->reg_idx = i;
>>> +       for (i = 0, reg_idx = 0; i < adapter->num_tx_queues; i++, reg_idx++)
>>> +               adapter->tx_ring[i]->reg_idx = reg_idx;
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++, reg_idx++)
>>> +               adapter->xdp_ring[i]->reg_idx = reg_idx;
>>>
>>>         return true;
>>>  }
>>> @@ -308,6 +321,14 @@ static void ixgbe_cache_ring_register(struct ixgbe_adapter *adapter)
>>>         ixgbe_cache_ring_rss(adapter);
>>>  }
>>>
>>> +static int ixgbe_xdp_queues(struct ixgbe_adapter *adapter)
>>> +{
>>> +       if (nr_cpu_ids > MAX_XDP_QUEUES)
>>> +               return 0;
>>> +
>>> +       return adapter->xdp_prog ? nr_cpu_ids : 0;
>>> +}
>>> +
>>>  #define IXGBE_RSS_64Q_MASK     0x3F
>>>  #define IXGBE_RSS_16Q_MASK     0xF
>>>  #define IXGBE_RSS_8Q_MASK      0x7
>>> @@ -382,6 +403,7 @@ static bool ixgbe_set_dcb_sriov_queues(struct ixgbe_adapter *adapter)
>>>         adapter->num_rx_queues_per_pool = tcs;
>>>
>>>         adapter->num_tx_queues = vmdq_i * tcs;
>>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>>         adapter->num_rx_queues = vmdq_i * tcs;
>>>
>>>  #ifdef IXGBE_FCOE
>>> @@ -479,6 +501,7 @@ static bool ixgbe_set_dcb_queues(struct ixgbe_adapter *adapter)
>>>                 netdev_set_tc_queue(dev, i, rss_i, rss_i * i);
>>>
>>>         adapter->num_tx_queues = rss_i * tcs;
>>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>>         adapter->num_rx_queues = rss_i * tcs;
>>>
>>>         return true;
>>> @@ -549,6 +572,7 @@ static bool ixgbe_set_sriov_queues(struct ixgbe_adapter *adapter)
>>>
>>>         adapter->num_rx_queues = vmdq_i * rss_i;
>>>         adapter->num_tx_queues = vmdq_i * rss_i;
>>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>>
>>>         /* disable ATR as it is not supported when VMDq is enabled */
>>>         adapter->flags &= ~IXGBE_FLAG_FDIR_HASH_CAPABLE;
>>
>> I'm pretty sure we are currently looking at an overly simplified
>> setup.  I suspect we need to take queue limits into account for DCB
>> and SR-IOV.  For example what TC are we supposed to be taking Tx
>> queues from in the case of DCB? Really in order for us to support XDP
>> in the DCB case we need rss_i to be equal to or less than half the
>> maximum possible value.
>>
> 
> This is probably a gap in XDP in general. For example, how would XDP even
> push a packet at a traffic class? At the least we should ensure rss_i doesn't
> break anything, but I'm not sure how to coexist with DCB reasonably at the
> moment.
> 
>> And as I mentioned before we need to make certain in the case of
>> SR-IOV w/ DCB or without that we have one pool that we set aside so
>> that we can steal queues from it to support XDP.
>>
> 
> SR-IOV + XDP should work; I'm a bit more skeptical about DCB at the
> moment. I'm thinking of adding a check on rss_i to ensure we can claim
> the queues (the "less than half" comment above) and then sorting out the
> 'xdp' pool concept as a follow-on series.
> 
>>> @@ -669,6 +693,7 @@ static bool ixgbe_set_rss_queues(struct ixgbe_adapter *adapter)
>>>  #endif /* IXGBE_FCOE */
>>>         adapter->num_rx_queues = rss_i;
>>>         adapter->num_tx_queues = rss_i;
>>> +       adapter->num_xdp_queues = ixgbe_xdp_queues(adapter);
>>>
>>>         return true;
>>>  }
>>
>> So this is the one allocation that is safe for now since the rss_i has
>> an upper limit of 64.
>>
>>> @@ -689,6 +714,7 @@ static void ixgbe_set_num_queues(struct ixgbe_adapter *adapter)
>>>         /* Start with base case */
>>>         adapter->num_rx_queues = 1;
>>>         adapter->num_tx_queues = 1;
>>> +       adapter->num_xdp_queues = 0;
>>>         adapter->num_rx_pools = adapter->num_rx_queues;
>>>         adapter->num_rx_queues_per_pool = 1;
>>>
>>> @@ -720,7 +746,8 @@ static int ixgbe_acquire_msix_vectors(struct ixgbe_adapter *adapter)
>>>         int i, vectors, vector_threshold;
>>>
>>>         /* We start by asking for one vector per queue pair */
>>> -       vectors = max(adapter->num_rx_queues, adapter->num_tx_queues);
>>> +       vectors = max(adapter->num_rx_queues,
>>> +                     adapter->num_tx_queues + adapter->num_xdp_queues);
>>>
>>>         /* It is easy to be greedy for MSI-X vectors. However, it really
>>>          * doesn't do much good if we have a lot more vectors than CPUs. We'll
>>
>> This might be overkill on the vectors.  You can stack regular Rx
>> queues, Tx queues, and xdp_queues.  I would say take the maximum of
>> all 3 instead of adding Tx and XDP queues.
> 
> Sure.
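> 
> Something like this, assuming the kernel's max3() helper:
> 
> 	/* Rx, Tx and XDP rings can stack on one vector, so ask for
> 	 * one vector per queue of the largest of the three counts
> 	 */
> 	vectors = max3(adapter->num_rx_queues, adapter->num_tx_queues,
> 		       adapter->num_xdp_queues);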
> 
>>
>>> @@ -800,6 +827,8 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
>>>   * @v_idx: index of vector in adapter struct
>>>   * @txr_count: total number of Tx rings to allocate
>>>   * @txr_idx: index of first Tx ring to allocate
>>> + * @xdp_count: total number of XDP rings to allocate
>>> + * @xdp_idx: index of first XDP ring to allocate
>>>   * @rxr_count: total number of Rx rings to allocate
>>>   * @rxr_idx: index of first Rx ring to allocate
>>>   *
>>> @@ -808,6 +837,7 @@ static void ixgbe_add_ring(struct ixgbe_ring *ring,
>>>  static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>>>                                 int v_count, int v_idx,
>>>                                 int txr_count, int txr_idx,
>>> +                               int xdp_count, int xdp_idx,
>>>                                 int rxr_count, int rxr_idx)
>>>  {
>>>         struct ixgbe_q_vector *q_vector;
>>> @@ -909,6 +939,33 @@ static int ixgbe_alloc_q_vector(struct ixgbe_adapter *adapter,
>>>                 ring++;
>>>         }
>>>
>>> +       while (xdp_count) {
>>> +               /* assign generic ring traits */
>>> +               ring->dev = &adapter->pdev->dev;
>>> +               ring->netdev = adapter->netdev;
>>
>> Just a thought on all this.  I don't know if we should be setting
>> ring->netdev.  We don't really have a netdevice associated with the
>> XDP rings.  We have an Rx ring.
>>
>> If we were to leave ring->netdev as NULL it might help us to find any
>> and all places where we might have overlooked things and have XDP
>> trying to tweak netdev queues.  In addition it would help to short
>> circuit the flag checks in the Tx clean-up path as we could just check
>> for tx_ring->netdev instead of having to check for an XDP boolean
>> value.
>>
> 
> Hmm, not sure about this. When we dump debug statements, for example on
> a hung queue, it's helpful to have an idea what port the queues are from.
> Although I guess we could print one of the other identifier strings instead.
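> 
> For reference, the short-circuit Alex describes would look roughly like
> this in the clean-up path (a sketch):
> 
> 	/* sketch: with ring->netdev left NULL on XDP rings, branch on
> 	 * the pointer instead of a dedicated ring flag
> 	 */
> 	if (tx_ring->netdev)
> 		napi_consume_skb(tx_buffer->skb, napi_budget);
> 	else
> 		page_frag_free(tx_buffer->data);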
> 
>>> +               /* configure backlink on ring */
>>> +               ring->q_vector = q_vector;
>>> +
>>> +               /* update q_vector Tx values */
>>> +               ixgbe_add_ring(ring, &q_vector->tx);
>>> +
>>> +               /* apply Tx specific ring traits */
>>> +               ring->count = adapter->xdp_ring_count;
>>> +               ring->queue_index = xdp_idx;
>>> +               ring->xdp_ring = true;
>>> +
>>> +               /* assign ring to adapter */
>>> +               adapter->xdp_ring[xdp_idx] = ring;
>>> +
>>> +               /* update count and index */
>>> +               xdp_count--;
>>> +               xdp_idx += v_count;
>>> +
>>> +               /* push pointer to next ring */
>>> +               ring++;
>>> +       }
>>> +
>>>         while (rxr_count) {
>>>                 /* assign generic ring traits */
>>>                 ring->dev = &adapter->pdev->dev;
>>> @@ -1002,17 +1059,18 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>>>         int q_vectors = adapter->num_q_vectors;
>>>         int rxr_remaining = adapter->num_rx_queues;
>>>         int txr_remaining = adapter->num_tx_queues;
>>> -       int rxr_idx = 0, txr_idx = 0, v_idx = 0;
>>> +       int xdp_remaining = adapter->num_xdp_queues;
>>> +       int rxr_idx = 0, txr_idx = 0, xdp_idx = 0, v_idx = 0;
>>>         int err;
>>>
>>>         /* only one q_vector if MSI-X is disabled. */
>>>         if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED))
>>>                 q_vectors = 1;
>>>
>>> -       if (q_vectors >= (rxr_remaining + txr_remaining)) {
>>> +       if (q_vectors >= (rxr_remaining + txr_remaining + xdp_remaining)) {
>>>                 for (; rxr_remaining; v_idx++) {
>>>                         err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
>>> -                                                  0, 0, 1, rxr_idx);
>>> +                                                  0, 0, 0, 0, 1, rxr_idx);
>>>
>>>                         if (err)
>>>                                 goto err_out;
>>> @@ -1026,8 +1084,11 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>>>         for (; v_idx < q_vectors; v_idx++) {
>>>                 int rqpv = DIV_ROUND_UP(rxr_remaining, q_vectors - v_idx);
>>>                 int tqpv = DIV_ROUND_UP(txr_remaining, q_vectors - v_idx);
>>> +               int xqpv = DIV_ROUND_UP(xdp_remaining, q_vectors - v_idx);
>>> +
>>>                 err = ixgbe_alloc_q_vector(adapter, q_vectors, v_idx,
>>>                                            tqpv, txr_idx,
>>> +                                          xqpv, xdp_idx,
>>>                                            rqpv, rxr_idx);
>>>
>>>                 if (err)
>>> @@ -1036,14 +1097,17 @@ static int ixgbe_alloc_q_vectors(struct ixgbe_adapter *adapter)
>>>                 /* update counts and index */
>>>                 rxr_remaining -= rqpv;
>>>                 txr_remaining -= tqpv;
>>> +               xdp_remaining -= xqpv;
>>>                 rxr_idx++;
>>>                 txr_idx++;
>>> +               xdp_idx++;
>>>         }
>>>
>>>         return 0;
>>>
>>>  err_out:
>>>         adapter->num_tx_queues = 0;
>>> +       adapter->num_xdp_queues = 0;
>>>         adapter->num_rx_queues = 0;
>>>         adapter->num_q_vectors = 0;
>>>
>>> @@ -1066,6 +1130,7 @@ static void ixgbe_free_q_vectors(struct ixgbe_adapter *adapter)
>>>         int v_idx = adapter->num_q_vectors;
>>>
>>>         adapter->num_tx_queues = 0;
>>> +       adapter->num_xdp_queues = 0;
>>>         adapter->num_rx_queues = 0;
>>>         adapter->num_q_vectors = 0;
>>>
>>> @@ -1172,9 +1237,10 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
>>>
>>>         ixgbe_cache_ring_register(adapter);
>>>
>>> -       e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u\n",
>>> +       e_dev_info("Multiqueue %s: Rx Queue count = %u, Tx Queue count = %u XDP Queue count = %u\n",
>>>                    (adapter->num_rx_queues > 1) ? "Enabled" : "Disabled",
>>> -                  adapter->num_rx_queues, adapter->num_tx_queues);
>>> +                  adapter->num_rx_queues,
>>> +                  adapter->num_tx_queues, adapter->num_xdp_queues);
>>>
>>>         set_bit(__IXGBE_DOWN, &adapter->state);
>>>
>>
>> I think I would have preferred to see us leave tx_queues, and
>> rx_queues on the same line and add num_xdp_queues on a line by itself.
> 
> Really? :/ It seems a bit cleaner to have them all on one line IMO, and I'm
> fairly sure it fits on an 80-char screen.
> 
>>
>>> @@ -1195,6 +1261,7 @@ int ixgbe_init_interrupt_scheme(struct ixgbe_adapter *adapter)
>>>  void ixgbe_clear_interrupt_scheme(struct ixgbe_adapter *adapter)
>>>  {
>>>         adapter->num_tx_queues = 0;
>>> +       adapter->num_xdp_queues = 0;
>>>         adapter->num_rx_queues = 0;
>>>
>>>         ixgbe_free_q_vectors(adapter);
>>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> index ec2c38f..227caf8 100644
>>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>>> @@ -594,6 +594,19 @@ static void ixgbe_regdump(struct ixgbe_hw *hw, struct ixgbe_reg_info *reginfo)
>>>
>>>  }
>>>
>>> +static void ixgbe_print_buffer(struct ixgbe_ring *ring, int n)
>>> +{
>>> +       struct ixgbe_tx_buffer *tx_buffer;
>>> +
>>> +       tx_buffer = &ring->tx_buffer_info[ring->next_to_clean];
>>> +       pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
>>> +                  n, ring->next_to_use, ring->next_to_clean,
>>> +                  (u64)dma_unmap_addr(tx_buffer, dma),
>>> +                  dma_unmap_len(tx_buffer, len),
>>> +                  tx_buffer->next_to_watch,
>>> +                  (u64)tx_buffer->time_stamp);
>>> +}
>>> +
>>>  /*
>>>   * ixgbe_dump - Print registers, tx-rings and rx-rings
>>>   */
>>> @@ -603,7 +616,7 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>>>         struct ixgbe_hw *hw = &adapter->hw;
>>>         struct ixgbe_reg_info *reginfo;
>>>         int n = 0;
>>> -       struct ixgbe_ring *tx_ring;
>>> +       struct ixgbe_ring *ring;
>>>         struct ixgbe_tx_buffer *tx_buffer;
>>>         union ixgbe_adv_tx_desc *tx_desc;
>>>         struct my_u0 { u64 a; u64 b; } *u0;
>>> @@ -644,14 +657,13 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>>>                 "Queue [NTU] [NTC] [bi(ntc)->dma  ]",
>>>                 "leng", "ntw", "timestamp");
>>>         for (n = 0; n < adapter->num_tx_queues; n++) {
>>> -               tx_ring = adapter->tx_ring[n];
>>> -               tx_buffer = &tx_ring->tx_buffer_info[tx_ring->next_to_clean];
>>> -               pr_info(" %5d %5X %5X %016llX %08X %p %016llX\n",
>>> -                          n, tx_ring->next_to_use, tx_ring->next_to_clean,
>>> -                          (u64)dma_unmap_addr(tx_buffer, dma),
>>> -                          dma_unmap_len(tx_buffer, len),
>>> -                          tx_buffer->next_to_watch,
>>> -                          (u64)tx_buffer->time_stamp);
>>> +               ring = adapter->tx_ring[n];
>>> +               ixgbe_print_buffer(ring, n);
>>> +       }
>>> +
>>> +       for (n = 0; n < adapter->num_xdp_queues; n++) {
>>> +               ring = adapter->xdp_ring[n];
>>> +               ixgbe_print_buffer(ring, n);
>>>         }
>>>
>>>         /* Print TX Rings */
>>> @@ -696,28 +708,28 @@ static void ixgbe_dump(struct ixgbe_adapter *adapter)
>>>          */
>>>
>>>         for (n = 0; n < adapter->num_tx_queues; n++) {
>>> -               tx_ring = adapter->tx_ring[n];
>>> +               ring = adapter->tx_ring[n];
>>>                 pr_info("------------------------------------\n");
>>> -               pr_info("TX QUEUE INDEX = %d\n", tx_ring->queue_index);
>>> +               pr_info("TX QUEUE INDEX = %d\n", ring->queue_index);
>>>                 pr_info("------------------------------------\n");
>>>                 pr_info("%s%s    %s              %s        %s          %s\n",
>>>                         "T [desc]     [address 63:0  ] ",
>>>                         "[PlPOIdStDDt Ln] [bi->dma       ] ",
>>>                         "leng", "ntw", "timestamp", "bi->skb");
>>>
>>> -               for (i = 0; tx_ring->desc && (i < tx_ring->count); i++) {
>>> -                       tx_desc = IXGBE_TX_DESC(tx_ring, i);
>>> -                       tx_buffer = &tx_ring->tx_buffer_info[i];
>>> +               for (i = 0; ring->desc && (i < ring->count); i++) {
>>> +                       tx_desc = IXGBE_TX_DESC(ring, i);
>>> +                       tx_buffer = &ring->tx_buffer_info[i];
>>>                         u0 = (struct my_u0 *)tx_desc;
>>>                         if (dma_unmap_len(tx_buffer, len) > 0) {
>>>                                 const char *ring_desc;
>>>
>>> -                               if (i == tx_ring->next_to_use &&
>>> -                                   i == tx_ring->next_to_clean)
>>> +                               if (i == ring->next_to_use &&
>>> +                                   i == ring->next_to_clean)
>>>                                         ring_desc = " NTC/U";
>>> -                               else if (i == tx_ring->next_to_use)
>>> +                               else if (i == ring->next_to_use)
>>>                                         ring_desc = " NTU";
>>> -                               else if (i == tx_ring->next_to_clean)
>>> +                               else if (i == ring->next_to_clean)
>>>                                         ring_desc = " NTC";
>>>                                 else
>>>                                         ring_desc = "";
>>> @@ -987,6 +999,10 @@ static void ixgbe_update_xoff_rx_lfc(struct ixgbe_adapter *adapter)
>>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>>                 clear_bit(__IXGBE_HANG_CHECK_ARMED,
>>>                           &adapter->tx_ring[i]->state);
>>> +
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +               clear_bit(__IXGBE_HANG_CHECK_ARMED,
>>> +                         &adapter->xdp_ring[i]->state);
>>>  }
>>>
>>>  static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
>>> @@ -1031,6 +1047,14 @@ static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
>>>                 if (xoff[tc])
>>>                         clear_bit(__IXGBE_HANG_CHECK_ARMED, &tx_ring->state);
>>>         }
>>> +
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>>> +               struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
>>> +
>>> +               tc = xdp_ring->dcb_tc;
>>> +               if (xoff[tc])
>>> +                       clear_bit(__IXGBE_HANG_CHECK_ARMED, &xdp_ring->state);
>>> +       }
>>>  }
>>>
>>>  static u64 ixgbe_get_tx_completed(struct ixgbe_ring *ring)
>>> @@ -1137,6 +1161,11 @@ static int ixgbe_tx_maxrate(struct net_device *netdev,
>>>         return 0;
>>>  }
>>>
>>> +static bool ixgbe_txring_is_xdp(struct ixgbe_ring *tx_ring)
>>> +{
>>> +       return tx_ring->xdp_ring;
>>> +}
>>> +
>>>  /**
>>>   * ixgbe_clean_tx_irq - Reclaim resources after transmit completes
>>>   * @q_vector: structure containing interrupt and ring information
>>> @@ -1182,7 +1211,10 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>>                 total_packets += tx_buffer->gso_segs;
>>>
>>>                 /* free the skb */
>>> -               napi_consume_skb(tx_buffer->skb, napi_budget);
>>> +               if (ixgbe_txring_is_xdp(tx_ring))
>>> +                       page_frag_free(tx_buffer->data);
>>> +               else
>>> +                       napi_consume_skb(tx_buffer->skb, napi_budget);
>>>
>>>                 /* unmap skb header data */
>>>                 dma_unmap_single(tx_ring->dev,
>>> @@ -1243,7 +1275,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>>         if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
>>>                 /* schedule immediate reset if we believe we hung */
>>>                 struct ixgbe_hw *hw = &adapter->hw;
>>> -               e_err(drv, "Detected Tx Unit Hang\n"
>>> +               e_err(drv, "Detected Tx Unit Hang %s\n"
>>>                         "  Tx Queue             <%d>\n"
>>>                         "  TDH, TDT             <%x>, <%x>\n"
>>>                         "  next_to_use          <%x>\n"
>>> @@ -1251,13 +1283,16 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>>                         "tx_buffer_info[next_to_clean]\n"
>>>                         "  time_stamp           <%lx>\n"
>>>                         "  jiffies              <%lx>\n",
>>> +                       ixgbe_txring_is_xdp(tx_ring) ? "(XDP)" : "",
>>>                         tx_ring->queue_index,
>>>                         IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
>>>                         IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
>>>                         tx_ring->next_to_use, i,
>>>                         tx_ring->tx_buffer_info[i].time_stamp, jiffies);
>>>
>>> -               netif_stop_subqueue(tx_ring->netdev, tx_ring->queue_index);
>>> +               if (!ixgbe_txring_is_xdp(tx_ring))
>>> +                       netif_stop_subqueue(tx_ring->netdev,
>>> +                                           tx_ring->queue_index);
>>>
>>>                 e_info(probe,
>>>                        "tx hang %d detected on queue %d, resetting adapter\n",
>>> @@ -1270,6 +1305,9 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
>>>                 return true;
>>>         }
>>>
>>> +       if (ixgbe_txring_is_xdp(tx_ring))
>>> +               return !!budget;
>>> +
>>>         netdev_tx_completed_queue(txring_txq(tx_ring),
>>>                                   total_packets, total_bytes);
>>>
>>> @@ -2160,10 +2198,16 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>>>         return skb;
>>>  }
>>>
>>> -static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
>>> +static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
>>> +                              struct ixgbe_adapter *adapter,
>>> +                              struct ixgbe_ring *tx_ring);
>>> +
>>> +static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>>> +                        struct ixgbe_ring  *rx_ring,
>>>                          struct ixgbe_rx_buffer *rx_buffer,
>>>                          unsigned int size)
>>>  {
>>> +       struct ixgbe_ring *xdp_ring;
>>>         struct bpf_prog *xdp_prog;
>>>         struct xdp_buff xdp;
>>>         void *addr;
>>> @@ -2183,9 +2227,18 @@ static int ixgbe_run_xdp(struct ixgbe_ring  *rx_ring,
>>>         switch (act) {
>>>         case XDP_PASS:
>>>                 return 0;
>>> +       case XDP_TX:
>>> +               xdp_ring = adapter->xdp_ring[smp_processor_id()];
>>> +
>>> +               /* XDP_TX is not flow controlled */
>>> +               if (!xdp_ring) {
>>> +                       WARN_ON(1);
>>> +                       break;
>>> +               }
>>> +               ixgbe_xmit_xdp_ring(&xdp, adapter, xdp_ring);
>>> +               break;
>>
>> So just for the sake of future compatibility I would say you might
>> look at instead just passing a netdev and the XDP buffer.  Then your
>> transmit routine can make the decision on using smp_processor_id and
>> grab the xdp_ring it wants.  No point in having us make the decision
>> in the Rx path.  That way things stay closer to how the transmit path
>> already works.
>>
>> Also shouldn't you return some sort of notification as to whether you
>> completed the xmit successfully?  It almost seems like maybe we should
>> fall through to an XDP_ABORTED case if we didn't transmit the frame
>> since we might end up leaking the memory otherwise.
> 
> Yep. Fixing.
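> 
> Roughly like this; the exact return-code handling is still a sketch:
> 
> 	case XDP_TX:
> 		xdp_ring = adapter->xdp_ring[smp_processor_id()];
> 		if (likely(xdp_ring &&
> 			   !ixgbe_xmit_xdp_ring(&xdp, adapter, xdp_ring)))
> 			break;
> 		/* fallthrough -- treat a failed transmit as an abort */
> 	case XDP_ABORTED:
> 		trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
> 		/* fallthrough -- handle aborts by dropping packet */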
> 
>>
>>>         default:
>>>                 bpf_warn_invalid_xdp_action(act);
>>> -       case XDP_TX:
>>>         case XDP_ABORTED:
>>>                 trace_xdp_exception(rx_ring->netdev, xdp_prog, act);
>>>                 /* fallthrough -- handle aborts by dropping packet */
>>> @@ -2214,8 +2267,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>>                                const int budget)
>>>  {
>>>         unsigned int total_rx_bytes = 0, total_rx_packets = 0;
>>> -#ifdef IXGBE_FCOE
>>>         struct ixgbe_adapter *adapter = q_vector->adapter;
>>> +#ifdef IXGBE_FCOE
>>>         int ddp_bytes;
>>>         unsigned int mss = 0;
>>>  #endif /* IXGBE_FCOE */
>>> @@ -2248,7 +2301,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>>                 rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
>>>
>>>                 rcu_read_lock();
>>> -               consumed = ixgbe_run_xdp(rx_ring, rx_buffer, size);
>>> +               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
>>>                 rcu_read_unlock();
>>>
>>>                 if (consumed) {
>>
>> You probably don't need the adapter struct.  If you have the rx_ring
>> it should have a netdev structure and from that you can get to the
>> adapter struct via netdev_priv.
> 
> It's easy to pass it through the function though, and it avoids a
> netdev_priv(rx_ring->netdev) line.
> 
>>
>>> @@ -2914,6 +2967,15 @@ static irqreturn_t ixgbe_msix_other(int irq, void *data)
>>>                                                        &ring->state))
>>>                                         reinit_count++;
>>>                         }
>>> +
>>> +                       for (i = 0; i < adapter->num_xdp_queues; i++) {
>>> +                               struct ixgbe_ring *ring = adapter->xdp_ring[i];
>>> +
>>> +                               if (test_and_clear_bit(__IXGBE_TX_FDIR_INIT_DONE,
>>> +                                                      &ring->state))
>>> +                                       reinit_count++;
>>> +                       }
>>> +
>>>                         if (reinit_count) {
>>>                                 /* no more flow director interrupts until after init */
>>>                                 IXGBE_WRITE_REG(hw, IXGBE_EIMC, IXGBE_EIMC_FLOW_DIR);
>>
>> We don't run flow director on the XDP queues so there is no need to
>> add this bit.
> 
> Yep, cut'n'paste error.
> 
>>
>>> @@ -3429,6 +3491,8 @@ static void ixgbe_configure_tx(struct ixgbe_adapter *adapter)
>>>         /* Setup the HW Tx Head and Tail descriptor pointers */
>>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>>                 ixgbe_configure_tx_ring(adapter, adapter->tx_ring[i]);
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +               ixgbe_configure_tx_ring(adapter, adapter->xdp_ring[i]);
>>>  }
>>>
>>>  static void ixgbe_enable_rx_drop(struct ixgbe_adapter *adapter,
>>> @@ -5598,7 +5662,8 @@ static void ixgbe_clean_tx_ring(struct ixgbe_ring *tx_ring)
>>>         }
>>>
>>>         /* reset BQL for queue */
>>> -       netdev_tx_reset_queue(txring_txq(tx_ring));
>>> +       if (!ixgbe_txring_is_xdp(tx_ring))
>>> +               netdev_tx_reset_queue(txring_txq(tx_ring));
>>>
>>>         /* reset next_to_use and next_to_clean */
>>>         tx_ring->next_to_use = 0;
>>> @@ -5627,6 +5692,8 @@ static void ixgbe_clean_all_tx_rings(struct ixgbe_adapter *adapter)
>>>
>>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>>                 ixgbe_clean_tx_ring(adapter->tx_ring[i]);
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +               ixgbe_clean_tx_ring(adapter->xdp_ring[i]);
>>>  }
>>>
>>>  static void ixgbe_fdir_filter_exit(struct ixgbe_adapter *adapter)
>>> @@ -5721,6 +5788,11 @@ void ixgbe_down(struct ixgbe_adapter *adapter)
>>>                 u8 reg_idx = adapter->tx_ring[i]->reg_idx;
>>>                 IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
>>>         }
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>>> +               u8 reg_idx = adapter->xdp_ring[i]->reg_idx;
>>> +
>>> +               IXGBE_WRITE_REG(hw, IXGBE_TXDCTL(reg_idx), IXGBE_TXDCTL_SWFLSH);
>>> +       }
>>>
>>>         /* Disable the Tx DMA engine on 82599 and later MAC */
>>>         switch (hw->mac.type) {
>>> @@ -6008,6 +6080,7 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter,
>>>
>>>         /* set default ring sizes */
>>>         adapter->tx_ring_count = IXGBE_DEFAULT_TXD;
>>> +       adapter->xdp_ring_count = IXGBE_DEFAULT_TXD;
>>>         adapter->rx_ring_count = IXGBE_DEFAULT_RXD;
>>>
>>>         /* set default work limits */
>>
>> I would just use the tx_ring_count value for the XDP rings.  No point
>> in adding another control unless you want to add the ethtool interface
>> for it.
>>
> 
> I liked that it was explicit, but it's only ever used once so I'll remove it.
> 
>>> @@ -6089,7 +6162,7 @@ int ixgbe_setup_tx_resources(struct ixgbe_ring *tx_ring)
>>>   **/
>>>  static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
>>>  {
>>> -       int i, err = 0;
>>> +       int i, j = 0, err = 0;
>>>
>>>         for (i = 0; i < adapter->num_tx_queues; i++) {
>>>                 err = ixgbe_setup_tx_resources(adapter->tx_ring[i]);
>>> @@ -6099,12 +6172,23 @@ static int ixgbe_setup_all_tx_resources(struct ixgbe_adapter *adapter)
>>>                 e_err(probe, "Allocation for Tx Queue %u failed\n", i);
>>>                 goto err_setup_tx;
>>>         }
>>> +       for (j = 0; j < adapter->num_xdp_queues; j++) {
>>> +               err = ixgbe_setup_tx_resources(adapter->xdp_ring[j]);
>>> +               if (!err)
>>> +                       continue;
>>> +
>>> +               e_err(probe, "Allocation for Tx Queue %u failed\n", j);
>>> +               goto err_setup_tx;
>>> +       }
>>> +
>>>
>>>         return 0;
>>>  err_setup_tx:
>>>         /* rewind the index freeing the rings as we go */
>>>         while (i--)
>>>                 ixgbe_free_tx_resources(adapter->tx_ring[i]);
>>> +       while (j--)
>>> +               ixgbe_free_tx_resources(adapter->xdp_ring[j]);
>>>         return err;
>>>  }
>>>
>>
>> Just for the sake of symmetry I would prefer to see the xdp rings freed before
>> the Tx rings.
> 
> OK.
> 
>>
>>> @@ -6238,6 +6322,9 @@ static void ixgbe_free_all_tx_resources(struct ixgbe_adapter *adapter)
>>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>>                 if (adapter->tx_ring[i]->desc)
>>>                         ixgbe_free_tx_resources(adapter->tx_ring[i]);
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +               if (adapter->xdp_ring[i]->desc)
>>> +                       ixgbe_free_tx_resources(adapter->xdp_ring[i]);
>>>  }
>>>
>>>  /**
>>> @@ -6656,6 +6743,14 @@ void ixgbe_update_stats(struct ixgbe_adapter *adapter)
>>>                 bytes += tx_ring->stats.bytes;
>>>                 packets += tx_ring->stats.packets;
>>>         }
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>>> +               struct ixgbe_ring *xdp_ring = adapter->xdp_ring[i];
>>> +
>>> +               restart_queue += xdp_ring->tx_stats.restart_queue;
>>> +               tx_busy += xdp_ring->tx_stats.tx_busy;
>>> +               bytes += xdp_ring->stats.bytes;
>>> +               packets += xdp_ring->stats.packets;
>>> +       }
>>>         adapter->restart_queue = restart_queue;
>>>         adapter->tx_busy = tx_busy;
>>>         netdev->stats.tx_bytes = bytes;
>>
>> We may want to look at doing a slight refactor on this to convert some
>> of these stats grabs over to static functions.  We may want to also
>> group these using structs instead of just doing this as individual u64
>> stats.
> 
> Agreed, but that is a follow-on patch and not really XDP specific.
> 
>>
>>> @@ -6849,6 +6944,9 @@ static void ixgbe_fdir_reinit_subtask(struct ixgbe_adapter *adapter)
>>>                 for (i = 0; i < adapter->num_tx_queues; i++)
>>>                         set_bit(__IXGBE_TX_FDIR_INIT_DONE,
>>>                                 &(adapter->tx_ring[i]->state));
>>> +               for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +                       set_bit(__IXGBE_TX_FDIR_INIT_DONE,
>>> +                               &adapter->xdp_ring[i]->state);
>>>                 /* re-enable flow director interrupts */
>>>                 IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_FLOW_DIR);
>>>         } else {
>>
>> I wouldn't bother with the flow director initialization for the XDP rings.
>>
>> Although you may want to set __IXGBE_TX_XPS_INIT_DONE to avoid trying
>> to configure XPS for the XDP rings.
> 
> So I'll leave this block then.
> 
>>
>>> @@ -6882,6 +6980,8 @@ static void ixgbe_check_hang_subtask(struct ixgbe_adapter *adapter)
>>>         if (netif_carrier_ok(adapter->netdev)) {
>>>                 for (i = 0; i < adapter->num_tx_queues; i++)
>>>                         set_check_for_tx_hang(adapter->tx_ring[i]);
>>> +               for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +                       set_check_for_tx_hang(adapter->xdp_ring[i]);
>>>         }
>>>
>>>         if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
>>> @@ -7112,6 +7212,13 @@ static bool ixgbe_ring_tx_pending(struct ixgbe_adapter *adapter)
>>>                         return true;
>>>         }
>>>
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>>> +               struct ixgbe_ring *ring = adapter->xdp_ring[i];
>>> +
>>> +               if (ring->next_to_use != ring->next_to_clean)
>>> +                       return true;
>>> +       }
>>> +
>>>         return false;
>>>  }
>>>
>>> @@ -8072,6 +8179,99 @@ static u16 ixgbe_select_queue(struct net_device *dev, struct sk_buff *skb,
>>>  #endif
>>>  }
>>>
>>> +static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
>>> +                              struct ixgbe_adapter *adapter,
>>> +                              struct ixgbe_ring *ring)
>>> +{
>>> +       struct ixgbe_tx_buffer *tx_buffer;
>>> +       union ixgbe_adv_tx_desc *tx_desc;
>>> +       u32 len, tx_flags = 0;
>>> +       dma_addr_t dma;
>>> +       u16 count, i;
>>> +       u32 cmd_type;
>>> +
>>> +       len = xdp->data_end - xdp->data;
>>> +       count = TXD_USE_COUNT(len);
>>> +
>>> +       /* record the location of the first descriptor for this packet */
>>> +       tx_buffer = &ring->tx_buffer_info[ring->next_to_use];
>>> +       tx_buffer->skb = NULL;
>>> +       tx_buffer->bytecount = len;
>>> +       tx_buffer->gso_segs = 1;
>>> +       tx_buffer->protocol = 0;
>>> +
>>> +#ifdef CONFIG_PCI_IOV
>>> +       /* Use the l2switch_enable flag - would be false if the DMA
>>> +        * Tx switch had been disabled.
>>> +        */
>>> +       if (adapter->flags & IXGBE_FLAG_SRIOV_ENABLED)
>>> +               tx_flags |= IXGBE_TX_FLAGS_CC;
>>> +#endif
>>> +
>>
>> I'm pretty sure there is a bug here.  The TX_FLAGS_CC is supposed to
>> notify the ring that it is supposed to populate the CC bit in olinfo
>> status.  Also you need to populate a context descriptor when this is
>> set since the header length recorded in the descriptor is used to
>> determine the length of the header in the packet.  You can probably
>> short-cut this and just add a bit of init code that just initializes
>> the queue by dropping a context descriptor in the ring with an Ethernet
>> header length of 14 in it, and never bumping the tail.  Then if you
>> see the TX_FLAGS_CC you just default to context 0 which is already
>> recorded in the queue and should save you the trouble.
> 
> Right, this is a bug. For this patch I'll put the fix inline and then
> optimize it later with some other optimizations.
> 
>>
>>> +       tx_buffer->tx_flags = tx_flags;
>>> +       i = ring->next_to_use;
>>> +       tx_desc = IXGBE_TX_DESC(ring, i);
>>> +
>>> +       dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
>>> +       if (dma_mapping_error(ring->dev, dma))
>>> +               goto dma_error;
>>> +
>>> +       dma_unmap_len_set(tx_buffer, len, len);
>>> +       dma_unmap_addr_set(tx_buffer, dma, dma);
>>> +       tx_buffer->data = xdp->data;
>>> +       tx_desc->read.buffer_addr = cpu_to_le64(dma);
>>> +
>>> +       /* put descriptor type bits */
>>> +       cmd_type = IXGBE_ADVTXD_DTYP_DATA |
>>> +                  IXGBE_ADVTXD_DCMD_DEXT |
>>> +                  IXGBE_ADVTXD_DCMD_IFCS |
>>> +                  IXGBE_ADVTXD_DCMD_EOP;
>>
>> The EOP is redundant, we already included it in IXGBE_TXD_CMD which
>> you reference below.
>>
> 
> Thanks
> 
>>> +       cmd_type |= len | IXGBE_TXD_CMD;
>>> +       tx_desc->read.cmd_type_len = cpu_to_le32(cmd_type);
>>> +       tx_desc->read.olinfo_status =
>>> +               cpu_to_le32(len << IXGBE_ADVTXD_PAYLEN_SHIFT);
>>> +
>>> +       /* Set time_stamp to match the normal buffer flow. In the XDP
>>> +        * case it could arguably be skipped since at the moment it is
>>> +        * not used anywhere.
>>> +        */
>>> +       tx_buffer->time_stamp = jiffies;
>>> +
>>
>> We should probably just drop the code from the driver that records
>> jiffies.  Feel free to submit a patch to drop it and save yourself a
>> few cycles.
>>
> 
> Sure.
> 
>>> +       /* Force memory writes to complete before letting h/w know there
>>> +        * are new descriptors to fetch.  (Only applicable for weak-ordered
>>> +        * memory model archs, such as IA-64).
>>> +        *
>>> +        * We also need this memory barrier to make certain all of the
>>> +        * status bits have been updated before next_to_watch is written.
>>> +        */
>>> +       wmb();
>>> +
>>> +       /* set next_to_watch value indicating a packet is present */
>>> +       i++;
>>> +       if (i == ring->count)
>>> +               i = 0;
>>> +
>>> +       tx_buffer->next_to_watch = tx_desc;
>>> +       ring->next_to_use = i;
>>> +
>>> +       writel(i, ring->tail);
>>> +
>>> +       /* we need this if more than one processor can write to our tail
>>> +        * at a time, it synchronizes IO on IA64/Altix systems
>>> +        */
>>> +       mmiowb();
>>> +       return NETDEV_TX_OK;
>>> +dma_error:
>>> +       if (dma_unmap_len(tx_buffer, len))
>>> +               dma_unmap_single(ring->dev,
>>> +                                dma_unmap_addr(tx_buffer, dma),
>>> +                                dma_unmap_len(tx_buffer, len),
>>> +                                DMA_TO_DEVICE);
>>
>> For now if there is a DMA error there is nothing to unmap since we
>> only have the one buffer, so we might just return -ENOMEM directly
>> instead of jumping out.  Once we add scatter-gather support for Tx
>> we can look at adding this clean-up, but for now there is nothing to
>> update.
> 
> removing it.
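
i.e. the mapping error path in the next version just becomes (sketch):

	dma = dma_map_single(ring->dev, xdp->data, len, DMA_TO_DEVICE);
	if (dma_mapping_error(ring->dev, dma))
		return -ENOMEM;	/* single buffer, nothing mapped to unwind */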
> 
>>
>>> +       dma_unmap_len_set(tx_buffer, len, 0);
>>> +       ring->next_to_use = i;
>>> +       return -ENOMEM;
>>> +}
>>> +
>>>  netdev_tx_t ixgbe_xmit_frame_ring(struct sk_buff *skb,
>>>                           struct ixgbe_adapter *adapter,
>>>                           struct ixgbe_ring *tx_ring)
>>> @@ -8363,6 +8563,23 @@ static void ixgbe_netpoll(struct net_device *netdev)
>>>
>>>  #endif
>>>
>>> +static void ixgbe_get_ring_stats64(struct rtnl_link_stats64 *stats,
>>> +                                  struct ixgbe_ring *ring)
>>> +{
>>> +       u64 bytes, packets;
>>> +       unsigned int start;
>>> +
>>> +       if (ring) {
>>> +               do {
>>> +                       start = u64_stats_fetch_begin_irq(&ring->syncp);
>>> +                       packets = ring->stats.packets;
>>> +                       bytes   = ring->stats.bytes;
>>> +               } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
>>> +               stats->tx_packets += packets;
>>> +               stats->tx_bytes   += bytes;
>>> +       }
>>> +}
>>> +
>>>  static void ixgbe_get_stats64(struct net_device *netdev,
>>>                               struct rtnl_link_stats64 *stats)
>>>  {
>>> @@ -8388,18 +8605,13 @@ static void ixgbe_get_stats64(struct net_device *netdev,
>>>
>>>         for (i = 0; i < adapter->num_tx_queues; i++) {
>>>                 struct ixgbe_ring *ring = ACCESS_ONCE(adapter->tx_ring[i]);
>>> -               u64 bytes, packets;
>>> -               unsigned int start;
>>>
>>> -               if (ring) {
>>> -                       do {
>>> -                               start = u64_stats_fetch_begin_irq(&ring->syncp);
>>> -                               packets = ring->stats.packets;
>>> -                               bytes   = ring->stats.bytes;
>>> -                       } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
>>> -                       stats->tx_packets += packets;
>>> -                       stats->tx_bytes   += bytes;
>>> -               }
>>> +               ixgbe_get_ring_stats64(stats, ring);
>>> +       }
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++) {
>>> +               struct ixgbe_ring *ring = ACCESS_ONCE(adapter->xdp_ring[i]);
>>> +
>>> +               ixgbe_get_ring_stats64(stats, ring);
>>>         }
>>>         rcu_read_unlock();
>>>
>>> @@ -8533,6 +8745,7 @@ int ixgbe_setup_tc(struct net_device *dev, u8 tc)
>>>         }
>>>
>>>         ixgbe_validate_rtr(adapter, tc);
>>> +       adapter->num_xdp_queues = adapter->xdp_prog ? nr_cpu_ids : 0;
>>>
>>>  #endif /* CONFIG_IXGBE_DCB */
>>>         ixgbe_init_interrupt_scheme(adapter);
>>
>> I'm confused about this.  Don't you overwrite this when you call
>> ixgbe_init_interrupt_scheme?
>>
> 
> Stale code, nice catch. Originally I was doing the init directly in setup_tc.
> 
>>> @@ -9520,7 +9733,7 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>>>  {
>>>         int i, frame_size = dev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;
>>>         struct ixgbe_adapter *adapter = netdev_priv(dev);
>>> -       struct bpf_prog *old_adapter_prog;
>>> +       struct bpf_prog *old_prog;
>>>
>>>         /* verify ixgbe ring attributes are sufficient for XDP */
>>>         for (i = 0; i < adapter->num_rx_queues; i++) {
>>> @@ -9533,12 +9746,26 @@ static int ixgbe_xdp_setup(struct net_device *dev, struct bpf_prog *prog)
>>>                         return -EINVAL;
>>>         }
>>>
>>> -       old_adapter_prog = xchg(&adapter->xdp_prog, prog);
>>> +       if (nr_cpu_ids > MAX_XDP_QUEUES)
>>> +               return -ENOMEM;
>>> +
>>> +       old_prog = xchg(&adapter->xdp_prog, prog);
>>> +
>>> +       /* If transitioning XDP modes reconfigure rings */
>>> +       if (!!adapter->xdp_prog != !!old_prog) {
>>> +               int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
>>> +
>>> +               if (err) {
>>> +                       rcu_assign_pointer(adapter->xdp_prog, old_prog);
>>> +                       return -EINVAL;
>>> +               }
>>> +       }
>>> +
>>>         for (i = 0; i < adapter->num_rx_queues; i++)
>>>                 ixgbe_setup_xdp_resource(adapter, adapter->rx_ring[i]);
>>>
>>> -       if (old_adapter_prog)
>>> -               bpf_prog_put(old_adapter_prog);
>>> +       if (old_prog)
>>> +               bpf_prog_put(old_prog);
>>>
>>>         return 0;
>>>  }
>>> @@ -10044,6 +10271,9 @@ static int ixgbe_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>>>         for (i = 0; i < adapter->num_tx_queues; i++)
>>>                 u64_stats_init(&adapter->tx_ring[i]->syncp);
>>>
>>> +       for (i = 0; i < adapter->num_xdp_queues; i++)
>>> +               u64_stats_init(&adapter->xdp_ring[i]->syncp);
>>> +
>>>         /* WOL not supported for all devices */
>>>         adapter->wol = 0;
>>>         hw->eeprom.ops.read(hw, 0x2c, &adapter->eeprom_cap);
>>>
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head
  2017-02-27 16:32   ` Alexander Duyck
@ 2017-03-02  0:23     ` John Fastabend
  0 siblings, 0 replies; 14+ messages in thread
From: John Fastabend @ 2017-03-02  0:23 UTC (permalink / raw)
  To: intel-wired-lan

On 17-02-27 08:32 AM, Alexander Duyck wrote:
> On Sat, Feb 25, 2017 at 9:33 AM, John Fastabend
> <john.fastabend@gmail.com> wrote:
>> Add adjust_head support for XDP; however, at the moment we only add
>> IXGBE_SKB_PAD bytes of headroom to align with the driver paths.
>>
>> The infrastructure is such that a follow-on patch can extend
>> headroom up to 196B without changing the RX path.
> 
> So I am really not a fan of the way this is implemented.  I think we
> can probably just get away with passing size as a pointer, and
> adjusting page_offset in the rx_buffer_info structure.
> 
> Then the only extra bit of work needed is some code to sanitize the
> page_offset in ixgbe_build_skb and ixgbe_construct_skb in the 4K case,
> since we can't just use an xor anymore and will probably have to do a
> combination of xor, and, or to clear and reset the page offset back to
> what it is supposed to be.

I think the code is now fairly simple without manipulating the
offset directly.
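
For reference, my reading of the xor/and/or sanitization you describe
for the 4K case is roughly the following (sketch only; assumes truesize
is a power of two and that IXGBE_SKB_PAD is the canonical build_skb
offset):

#if (PAGE_SIZE < 8192)
	/* XDP may have moved page_offset, so mask back down to the
	 * half-page boundary, restore the build_skb headroom, then flip
	 * to the other half-page buffer as before.
	 */
	rx_buffer->page_offset &= ~(truesize - 1);
	rx_buffer->page_offset |= IXGBE_SKB_PAD;
	rx_buffer->page_offset ^= truesize;
#endif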

> 
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   45 +++++++++++++++++--------
>>  1 file changed, 31 insertions(+), 14 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> index 227caf8..040e469 100644
>> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
>> @@ -2031,6 +2031,7 @@ static bool ixgbe_can_reuse_rx_page(struct ixgbe_rx_buffer *rx_buffer)
>>  static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
>>                               struct ixgbe_rx_buffer *rx_buffer,
>>                               struct sk_buff *skb,
>> +                             unsigned int headroom,
>>                               unsigned int size)
>>  {
>>  #if (PAGE_SIZE < 8192)
>> @@ -2041,7 +2042,8 @@ static void ixgbe_add_rx_frag(struct ixgbe_ring *rx_ring,
>>                                 SKB_DATA_ALIGN(size);
>>  #endif
>>         skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
>> -                       rx_buffer->page_offset, size, truesize);
>> +                       rx_buffer->page_offset - (IXGBE_SKB_PAD - headroom),
>> +                       size, truesize);
>>  #if (PAGE_SIZE < 8192)
>>         rx_buffer->page_offset ^= truesize;
>>  #else
>> @@ -2114,6 +2116,7 @@ static void ixgbe_put_rx_buffer(struct ixgbe_ring *rx_ring,
>>  static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>>                                            struct ixgbe_rx_buffer *rx_buffer,
>>                                            union ixgbe_adv_rx_desc *rx_desc,
>> +                                          unsigned int headroom,
>>                                            unsigned int size)
>>  {
>>         void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
>> @@ -2122,6 +2125,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>>  #else
>>         unsigned int truesize = SKB_DATA_ALIGN(size);
>>  #endif
>> +       unsigned int off_page;
>>         struct sk_buff *skb;
>>
>>         /* prefetch first cache line of first page */
>> @@ -2135,12 +2139,14 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>>         if (unlikely(!skb))
>>                 return NULL;
>>
>> +       off_page = IXGBE_SKB_PAD - headroom;
>> +
>>         if (size > IXGBE_RX_HDR_SIZE) {
>>                 if (!ixgbe_test_staterr(rx_desc, IXGBE_RXD_STAT_EOP))
>>                         IXGBE_CB(skb)->dma = rx_buffer->dma;
>>
>>                 skb_add_rx_frag(skb, 0, rx_buffer->page,
>> -                               rx_buffer->page_offset,
>> +                               rx_buffer->page_offset - off_page,
>>                                 size, truesize);
>>  #if (PAGE_SIZE < 8192)
>>                 rx_buffer->page_offset ^= truesize;
>> @@ -2148,7 +2154,8 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>>                 rx_buffer->page_offset += truesize;
>>  #endif
>>         } else {
>> -               memcpy(__skb_put(skb, size), va, ALIGN(size, sizeof(long)));
>> +               memcpy(__skb_put(skb, size), va - off_page,
>> +                      ALIGN(size, sizeof(long)));
>>                 rx_buffer->pagecnt_bias++;
>>         }
>>
>> @@ -2158,6 +2165,7 @@ static struct sk_buff *ixgbe_construct_skb(struct ixgbe_ring *rx_ring,
>>  static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>>                                        struct ixgbe_rx_buffer *rx_buffer,
>>                                        union ixgbe_adv_rx_desc *rx_desc,
>> +                                      unsigned int headroom,
>>                                        unsigned int size)
>>  {
>>         void *va = page_address(rx_buffer->page) + rx_buffer->page_offset;
>> @@ -2181,7 +2189,7 @@ static struct sk_buff *ixgbe_build_skb(struct ixgbe_ring *rx_ring,
>>                 return NULL;
>>
>>         /* update pointers within the skb to store the data */
>> -       skb_reserve(skb, IXGBE_SKB_PAD);
>> +       skb_reserve(skb, headroom);
>>         __skb_put(skb, size);
>>
>>         /* record DMA address if this is the start of a chain of buffers */
>> @@ -2202,10 +2210,14 @@ static int ixgbe_xmit_xdp_ring(struct xdp_buff *xdp,
>>                                struct ixgbe_adapter *adapter,
>>                                struct ixgbe_ring *tx_ring);
>>
>> +#define IXGBE_XDP_PASS 0
>> +#define IXGBE_XDP_CONSUMED 1
>> +
> 
> Going back to my original suggestion for patch one, you would have 3
> possible return values.  You would return NULL if it is meant to pass,
> one value wrapped in PTR_ERR if it is meant to be an XMIT, and another
> value wrapped in PTR_ERR if it is meant to be dropped.  The only real
> difference between the 3 is that for NULL you do nothing, for XMIT you
> are going to need to call your xdp_xmit function and update the
> page_offset, and for drop you add one back to the pagecnt_bias.
> 
> I just realized the example I provided for patch one was missing that
> bit.  We need to add one back to the pagecnt_bias if we are just
> dropping the buffer.
> 

Hmm, looking at this now, I'm not sure you save much by doing this.
All it really buys is avoiding the intermediate variable 'int consumed'
in this case.

Let me send out the latest version and see what you think.
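
For reference, my understanding of the caller side with the ERR_PTR
convention would be roughly this (sketch only; the error-code encoding
is illustrative, and the skb chaining in the real loop is glossed
over):

	skb = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, &headroom, &size);
	if (IS_ERR(skb)) {
		if (PTR_ERR(skb) == -IXGBE_XDP_CONSUMED)
			rx_buffer->pagecnt_bias++; /* drop: give page back */
		/* else XDP_TX: the xmit path already took the buffer */
		total_rx_packets++;
		cleaned_count++;
		continue;
	}
	/* skb == NULL means XDP_PASS (or no program attached) */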

>>  static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>>                          struct ixgbe_ring  *rx_ring,
>>                          struct ixgbe_rx_buffer *rx_buffer,
>> -                        unsigned int size)
>> +                        unsigned int *headroom,
>> +                        unsigned int *size)
>>  {
>>         struct ixgbe_ring *xdp_ring;
>>         struct bpf_prog *xdp_prog;
>> @@ -2216,17 +2228,19 @@ static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>>         xdp_prog = READ_ONCE(rx_ring->xdp_prog);
>>
>>         if (!xdp_prog)
>> -               return 0;
>> +               return IXGBE_XDP_PASS;
>>
>>         addr = page_address(rx_buffer->page) + rx_buffer->page_offset;
>> -       xdp.data_hard_start = addr;
>> +       xdp.data_hard_start = addr - *headroom;
>>         xdp.data = addr;
>> -       xdp.data_end = addr + size;
>> +       xdp.data_end = addr + *size;
>>
>>         act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>         switch (act) {
>>         case XDP_PASS:
>> -               return 0;
>> +               *headroom = xdp.data - xdp.data_hard_start;
>> +               *size = xdp.data_end - xdp.data;
>> +               return IXGBE_XDP_PASS;
>>         case XDP_TX:
>>                 xdp_ring = adapter->xdp_ring[smp_processor_id()];
>>
>> @@ -2246,7 +2260,7 @@ static int ixgbe_run_xdp(struct ixgbe_adapter *adapter,
>>                 rx_buffer->pagecnt_bias++; /* give page back */
>>                 break;
>>         }
>> -       return size;
>> +       return IXGBE_XDP_CONSUMED;
>>  }
>>
>>  /**
>> @@ -2275,6 +2289,7 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>         u16 cleaned_count = ixgbe_desc_unused(rx_ring);
>>
>>         while (likely(total_rx_packets < budget)) {
>> +               unsigned int headroom = ixgbe_rx_offset(rx_ring);
>>                 union ixgbe_adv_rx_desc *rx_desc;
>>                 struct ixgbe_rx_buffer *rx_buffer;
>>                 struct sk_buff *skb;
>> @@ -2301,7 +2316,8 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>                 rx_buffer = ixgbe_get_rx_buffer(rx_ring, rx_desc, &skb, size);
>>
>>                 rcu_read_lock();
>> -               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer, size);
>> +               consumed = ixgbe_run_xdp(adapter, rx_ring, rx_buffer,
>> +                                        &headroom, &size);
>>                 rcu_read_unlock();
>>
>>                 if (consumed) {
>> @@ -2315,13 +2331,14 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
>>
>>                 /* retrieve a buffer from the ring */
>>                 if (skb)
>> -                       ixgbe_add_rx_frag(rx_ring, rx_buffer, skb, size);
>> +                       ixgbe_add_rx_frag(rx_ring, rx_buffer, skb,
>> +                                         headroom, size);
>>                 else if (ring_uses_build_skb(rx_ring))
>>                         skb = ixgbe_build_skb(rx_ring, rx_buffer,
>> -                                             rx_desc, size);
>> +                                             rx_desc, headroom, size);
>>                 else
>>                         skb = ixgbe_construct_skb(rx_ring, rx_buffer,
>> -                                                 rx_desc, size);
>> +                                                 rx_desc, headroom, size);
>>
>>                 /* exit if we failed to retrieve a buffer */
>>                 if (!skb) {
>>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2017-03-02  0:23 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-25 17:32 [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 1/3] ixgbe: add XDP support for pass and drop actions John Fastabend
2017-02-25 18:18   ` kbuild test robot
2017-02-27 16:22   ` Alexander Duyck
2017-02-25 17:32 ` [Intel-wired-lan] [net-next PATCH 2/3] ixgbe: add support for XDP_TX action John Fastabend
2017-02-25 22:01   ` Alexander Duyck
2017-03-01 23:15     ` John Fastabend
2017-03-02  0:07       ` John Fastabend
2017-02-25 17:33 ` [Intel-wired-lan] [net-next PATCH 3/3] ixgbe: xdp support for adjust head John Fastabend
2017-02-27 16:32   ` Alexander Duyck
2017-03-02  0:23     ` John Fastabend
2017-02-25 17:38 ` [Intel-wired-lan] [net-next PATCH 0/3] XDP for ixgbe John Fastabend
2017-02-27  1:35   ` Jeff Kirsher
2017-02-25 18:58 ` Alexander Duyck
