netdev.vger.kernel.org archive mirror
* [PATCH v3 intel-next 0/6] XDP_TX improvements for ice
@ 2021-08-05 23:00 Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs Maciej Fijalkowski
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

Hi,

It's been a while. Here's another revision of the XDP_TX improvements
for ice. This time I decided to split the generic ring struct, which
served both the Tx and Rx sides, into separate entities, because this
set introduces a few Tx-specific fields to the ring.

Also, compared to v2, the XDP ring is now propagated onto the Rx ring,
since accessing the vsi->xdp_rings array, especially in the fallback
path, is not convenient.

Finally, patch 5 introduces yet another cleaning logic, different from
the one in v2. For more info, please see the commit messages.

Thanks!
Maciej

v2 : https://lore.kernel.org/bpf/20210705164338.58313-1-maciej.fijalkowski@intel.com/
v1 : https://lore.kernel.org/bpf/20210601113236.42651-1-maciej.fijalkowski@intel.com/

Maciej Fijalkowski (6):
  ice: split ice_ring onto Tx/Rx separate structs
  ice: unify xdp_rings accesses
  ice: do not create xdp_frame on XDP_TX
  ice: propagate xdp_ring onto rx_ring
  ice: optimize XDP_TX workloads
  ice: introduce XDP_TX fallback path

 drivers/net/ethernet/intel/ice/ice.h          |  30 +++-
 drivers/net/ethernet/intel/ice/ice_base.c     |  27 ++--
 drivers/net/ethernet/intel/ice/ice_base.h     |   6 +-
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |   5 +-
 drivers/net/ethernet/intel/ice/ice_dcb_lib.h  |   6 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  17 ++-
 drivers/net/ethernet/intel/ice/ice_lib.c      |  32 ++--
 drivers/net/ethernet/intel/ice/ice_lib.h      |   4 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 101 +++++++++----
 drivers/net/ethernet/intel/ice/ice_trace.h    |   8 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 139 +++++++++++-------
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  94 +++++++-----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  86 +++++++++--
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |   8 +-
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |   2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c      |  52 ++++---
 drivers/net/ethernet/intel/ice/ice_xsk.h      |   8 +-
 17 files changed, 408 insertions(+), 217 deletions(-)

-- 
2.20.1



* [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs
  2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
@ 2021-08-05 23:00 ` Maciej Fijalkowski
  2021-08-06  1:08   ` kernel test robot
  2021-08-06 20:46   ` Creeley, Brett
  2021-08-05 23:00 ` [PATCH v3 intel-next 2/6] ice: unify xdp_rings accesses Maciej Fijalkowski
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

While it was convenient to have a generic ring structure that served
both the Tx and Rx sides, the next commits are going to introduce
several Tx-specific fields. To avoid hurting the Rx side, pull the Tx
ring out into a new ice_tx_ring struct and let ice_ring handle the Rx
rings only.

Make the ring pointer in the ring container within ice_q_vector a union
so that it is possible to iterate over the newly introduced
ice_tx_ring.
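
As an illustration of the container change, here is a minimal user-space
sketch. It assumes simplified stand-in structs; every demo_ name and the
macro bodies are hypothetical and are not the driver's definitions. Only
the union member names (ring, tx_ring) and the head insertion mirror
what the diff below shows.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real driver structs carry many more fields. */
struct demo_rx_ring {
	struct demo_rx_ring *next;	/* next Rx ring on the same vector */
	int q_index;
};

struct demo_tx_ring {
	struct demo_tx_ring *next;	/* next Tx ring on the same vector */
	int q_index;
};

/* One container type serves both directions; the union picks the head. */
struct demo_ring_container {
	union {
		struct demo_rx_ring *ring;
		struct demo_tx_ring *tx_ring;
	};
	int itr_idx;
};

/* Walk the singly linked list hanging off a container. */
#define demo_for_each_ring(pos, head) \
	for (pos = (head).ring; pos; pos = pos->next)
#define demo_for_each_tx_ring(pos, head) \
	for (pos = (head).tx_ring; pos; pos = pos->next)

/* Push a Tx ring at the head of the list, like the mapping code does. */
static void demo_add_tx_ring(struct demo_ring_container *rc,
			     struct demo_tx_ring *ring)
{
	assert(rc && ring);
	ring->next = rc->tx_ring;
	rc->tx_ring = ring;
}

/* Count the Tx rings attached to a container. */
static int demo_count_tx_rings(struct demo_ring_container *rc)
{
	struct demo_tx_ring *pos;
	int n = 0;

	demo_for_each_tx_ring(pos, *rc)
		n++;
	return n;
}
```

Because both members are pointers, the union does not change the size of
the container; it only lets each iterator macro see the pointer type it
expects.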

Remove @size, as it is only accessed from the control path and can
easily be recalculated from the ring's descriptor count.
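
The recalculation is the page-aligned product of the descriptor count
and the descriptor size. A minimal sketch of that arithmetic follows,
assuming a 4096-byte page and a 16-byte Tx descriptor; both constants
and all demo_ names are assumptions for illustration, whereas the driver
uses PAGE_SIZE, sizeof(struct ice_tx_desc) and the kernel's ALIGN().

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE		4096u	/* assumed page size */
#define DEMO_TX_DESC_SIZE	16u	/* assumed descriptor size in bytes */

/* Round x up to a multiple of a; a must be a power of two. */
static uint32_t demo_align(uint32_t x, uint32_t a)
{
	assert(a && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Bytes backing a descriptor ring of count entries, page aligned. */
static uint32_t demo_tx_ring_size(uint16_t count)
{
	return demo_align((uint32_t)count * DEMO_TX_DESC_SIZE,
			  DEMO_PAGE_SIZE);
}
```

Under these assumptions a 256-entry ring fills exactly one page, while a
257-entry ring rounds up to two pages.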

Remove @ring_active as it's not actively used anywhere.

Change the definitions of ice_update_ring_stats and
ice_fetch_u64_stats_per_ring so that they are ring-agnostic and can be
used for both Rx and Tx rings.
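
The ring-agnostic shape can be sketched as helpers that take the shared
stats struct instead of a particular ring type. This is a simplified
user-space illustration: the sketch_ names are hypothetical, and the
u64_stats_sync protection used by the driver is deliberately left out.
The fetch helper here takes a pointer so that it reads the live
counters; that is a choice made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Counters shared by both ring types. */
struct sketch_q_stats {
	uint64_t pkts;
	uint64_t bytes;
};

/* Stand-ins for the two ring types; only the stats member matters here. */
struct sketch_rx_ring { struct sketch_q_stats stats; };
struct sketch_tx_ring { struct sketch_q_stats stats; };

/* Ring-agnostic update: operates on the counters, not on a ring type. */
static void sketch_update_ring_stats(struct sketch_q_stats *stats,
				     uint64_t pkts, uint64_t bytes)
{
	assert(stats);
	stats->bytes += bytes;
	stats->pkts += pkts;
}

/* Thin per-direction wrappers; the driver also takes the sync here. */
static void sketch_update_tx_ring_stats(struct sketch_tx_ring *ring,
					uint64_t pkts, uint64_t bytes)
{
	sketch_update_ring_stats(&ring->stats, pkts, bytes);
}

static void sketch_update_rx_ring_stats(struct sketch_rx_ring *ring,
					uint64_t pkts, uint64_t bytes)
{
	sketch_update_ring_stats(&ring->stats, pkts, bytes);
}

/* Ring-agnostic read of the counters through a pointer. */
static void sketch_fetch_ring_stats(const struct sketch_q_stats *stats,
				    uint64_t *pkts, uint64_t *bytes)
{
	assert(stats && pkts && bytes);
	*pkts = stats->pkts;
	*bytes = stats->bytes;
}
```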

The Rx and Tx ring structs are now 256 and 192 bytes, respectively. In
the Rx ring, xdp_rxq_info occupies its own cacheline, which accounts
for most of the difference.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          | 27 ++++--
 drivers/net/ethernet/intel/ice/ice_base.c     | 27 +++---
 drivers/net/ethernet/intel/ice/ice_base.h     |  6 +-
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |  5 +-
 drivers/net/ethernet/intel/ice/ice_dcb_lib.h  |  6 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  | 17 ++--
 drivers/net/ethernet/intel/ice/ice_lib.c      | 28 +++---
 drivers/net/ethernet/intel/ice/ice_lib.h      |  4 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 47 +++++-----
 drivers/net/ethernet/intel/ice/ice_trace.h    |  8 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 87 ++++++++++--------
 drivers/net/ethernet/intel/ice/ice_txrx.h     | 90 ++++++++++++-------
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  6 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  8 +-
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 29 +++---
 drivers/net/ethernet/intel/ice/ice_xsk.h      |  8 +-
 17 files changed, 233 insertions(+), 172 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a450343fbb92..2e15e097bc0f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -266,7 +266,7 @@ struct ice_vsi {
 	struct ice_pf *back;		 /* back pointer to PF */
 	struct ice_port_info *port_info; /* back pointer to port_info */
 	struct ice_ring **rx_rings;	 /* Rx ring array */
-	struct ice_ring **tx_rings;	 /* Tx ring array */
+	struct ice_tx_ring **tx_rings;	 /* Tx ring array */
 	struct ice_q_vector **q_vectors; /* q_vector array */
 
 	irqreturn_t (*irq_handler)(int irq, void *data);
@@ -343,7 +343,7 @@ struct ice_vsi {
 	u16 qset_handle[ICE_MAX_TRAFFIC_CLASS];
 	struct ice_tc_cfg tc_cfg;
 	struct bpf_prog *xdp_prog;
-	struct ice_ring **xdp_rings;	 /* XDP ring array */
+	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
 	unsigned long *af_xdp_zc_qps;	 /* tracks AF_XDP ZC enabled qps */
 	u16 num_xdp_txq;		 /* Used XDP queues */
 	u8 xdp_mapping_mode;		 /* ICE_MAP_MODE_[CONTIG|SCATTER] */
@@ -555,14 +555,14 @@ static inline bool ice_is_xdp_ena_vsi(struct ice_vsi *vsi)
 	return !!vsi->xdp_prog;
 }
 
-static inline void ice_set_ring_xdp(struct ice_ring *ring)
+static inline void ice_set_ring_xdp(struct ice_tx_ring *ring)
 {
 	ring->flags |= ICE_TX_FLAGS_RING_XDP;
 }
 
 /**
  * ice_xsk_pool - get XSK buffer pool bound to a ring
- * @ring: ring to use
+ * @ring: Rx ring to use
  *
  * Returns a pointer to xdp_umem structure if there is a buffer pool present,
  * NULL otherwise.
@@ -572,8 +572,23 @@ static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
 	struct ice_vsi *vsi = ring->vsi;
 	u16 qid = ring->q_index;
 
-	if (ice_ring_is_xdp(ring))
-		qid -= vsi->num_xdp_txq;
+	if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps))
+		return NULL;
+
+	return xsk_get_pool_from_qid(vsi->netdev, qid);
+}
+
+/**
+ * ice_tx_xsk_pool - get XSK buffer pool bound to a ring
+ * @ring: Tx ring to use
+ *
+ * Returns a pointer to xsk_buff_pool structure if there is a buffer pool present,
+ * NULL otherwise. Tx equivalent of ice_xsk_pool.
+ */
+static inline struct xsk_buff_pool *ice_tx_xsk_pool(struct ice_tx_ring *ring)
+{
+	struct ice_vsi *vsi = ring->vsi;
+	u16 qid = ring->q_index - vsi->num_xdp_txq;
 
 	if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps))
 		return NULL;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c
index c36057efc7ae..838ee4b8d96f 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.c
+++ b/drivers/net/ethernet/intel/ice/ice_base.c
@@ -146,6 +146,7 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
 {
 	struct ice_q_vector *q_vector;
 	struct ice_pf *pf = vsi->back;
+	struct ice_tx_ring *tx_ring;
 	struct ice_ring *ring;
 	struct device *dev;
 
@@ -156,8 +157,8 @@ static void ice_free_q_vector(struct ice_vsi *vsi, int v_idx)
 	}
 	q_vector = vsi->q_vectors[v_idx];
 
-	ice_for_each_ring(ring, q_vector->tx)
-		ring->q_vector = NULL;
+	ice_for_each_tx_ring(tx_ring, q_vector->tx)
+		tx_ring->q_vector = NULL;
 	ice_for_each_ring(ring, q_vector->rx)
 		ring->q_vector = NULL;
 
@@ -206,7 +207,7 @@ static void ice_cfg_itr_gran(struct ice_hw *hw)
  * @ring: ring to get the absolute queue index
  * @tc: traffic class number
  */
-static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_ring *ring, u8 tc)
+static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_tx_ring *ring, u8 tc)
 {
 	WARN_ONCE(ice_ring_is_xdp(ring) && tc, "XDP ring can't belong to TC other than 0\n");
 
@@ -224,7 +225,7 @@ static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_ring *ring, u8 tc)
  * This enables/disables XPS for a given Tx descriptor ring
  * based on the TCs enabled for the VSI that ring belongs to.
  */
-static void ice_cfg_xps_tx_ring(struct ice_ring *ring)
+static void ice_cfg_xps_tx_ring(struct ice_tx_ring *ring)
 {
 	if (!ring->q_vector || !ring->netdev)
 		return;
@@ -246,7 +247,7 @@ static void ice_cfg_xps_tx_ring(struct ice_ring *ring)
  * Configure the Tx descriptor ring in TLAN context.
  */
 static void
-ice_setup_tx_ctx(struct ice_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf_q)
+ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf_q)
 {
 	struct ice_vsi *vsi = ring->vsi;
 	struct ice_hw *hw = &vsi->back->hw;
@@ -258,7 +259,7 @@ ice_setup_tx_ctx(struct ice_ring *ring, struct ice_tlan_ctx *tlan_ctx, u16 pf_q)
 	/* Transmit Queue Length */
 	tlan_ctx->qlen = ring->count;
 
-	ice_set_cgd_num(tlan_ctx, ring);
+	ice_set_cgd_num(tlan_ctx, ring->dcb_tc);
 
 	/* PF number */
 	tlan_ctx->pf_num = hw->pf_id;
@@ -660,16 +661,16 @@ void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi)
 		tx_rings_per_v = (u8)DIV_ROUND_UP(tx_rings_rem,
 						  q_vectors - v_id);
 		q_vector->num_ring_tx = tx_rings_per_v;
-		q_vector->tx.ring = NULL;
+		q_vector->tx.tx_ring = NULL;
 		q_vector->tx.itr_idx = ICE_TX_ITR;
 		q_base = vsi->num_txq - tx_rings_rem;
 
 		for (q_id = q_base; q_id < (q_base + tx_rings_per_v); q_id++) {
-			struct ice_ring *tx_ring = vsi->tx_rings[q_id];
+			struct ice_tx_ring *tx_ring = vsi->tx_rings[q_id];
 
 			tx_ring->q_vector = q_vector;
-			tx_ring->next = q_vector->tx.ring;
-			q_vector->tx.ring = tx_ring;
+			tx_ring->next = q_vector->tx.tx_ring;
+			q_vector->tx.tx_ring = tx_ring;
 		}
 		tx_rings_rem -= tx_rings_per_v;
 
@@ -711,7 +712,7 @@ void ice_vsi_free_q_vectors(struct ice_vsi *vsi)
  * @qg_buf: queue group buffer
  */
 int
-ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_ring *ring,
+ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
 		struct ice_aqc_add_tx_qgrp *qg_buf)
 {
 	u8 buf_len = struct_size(qg_buf, txqs, 1);
@@ -870,7 +871,7 @@ void ice_trigger_sw_intr(struct ice_hw *hw, struct ice_q_vector *q_vector)
  */
 int
 ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
-		     u16 rel_vmvf_num, struct ice_ring *ring,
+		     u16 rel_vmvf_num, struct ice_tx_ring *ring,
 		     struct ice_txq_meta *txq_meta)
 {
 	struct ice_pf *pf = vsi->back;
@@ -927,7 +928,7 @@ ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
  * are needed for stopping Tx queue
  */
 void
-ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_ring *ring,
+ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_tx_ring *ring,
 		  struct ice_txq_meta *txq_meta)
 {
 	u8 tc;
diff --git a/drivers/net/ethernet/intel/ice/ice_base.h b/drivers/net/ethernet/intel/ice/ice_base.h
index 20e1c29aa68a..2ce777eb53b0 100644
--- a/drivers/net/ethernet/intel/ice/ice_base.h
+++ b/drivers/net/ethernet/intel/ice/ice_base.h
@@ -15,7 +15,7 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi);
 void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi);
 void ice_vsi_free_q_vectors(struct ice_vsi *vsi);
 int
-ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_ring *ring,
+ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
 		struct ice_aqc_add_tx_qgrp *qg_buf);
 void ice_cfg_itr(struct ice_hw *hw, struct ice_q_vector *q_vector);
 void
@@ -25,9 +25,9 @@ ice_cfg_rxq_interrupt(struct ice_vsi *vsi, u16 rxq, u16 msix_idx, u16 itr_idx);
 void ice_trigger_sw_intr(struct ice_hw *hw, struct ice_q_vector *q_vector);
 int
 ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
-		     u16 rel_vmvf_num, struct ice_ring *ring,
+		     u16 rel_vmvf_num, struct ice_tx_ring *ring,
 		     struct ice_txq_meta *txq_meta);
 void
-ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_ring *ring,
+ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_tx_ring *ring,
 		  struct ice_txq_meta *txq_meta);
 #endif /* _ICE_BASE_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index 926cf748c5ec..2507223bfdc7 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -194,7 +194,8 @@ u8 ice_dcb_get_tc(struct ice_vsi *vsi, int queue_index)
  */
 void ice_vsi_cfg_dcb_rings(struct ice_vsi *vsi)
 {
-	struct ice_ring *tx_ring, *rx_ring;
+	struct ice_tx_ring *tx_ring;
+	struct ice_ring *rx_ring;
 	u16 qoffset, qcount;
 	int i, n;
 
@@ -814,7 +815,7 @@ void ice_update_dcb_stats(struct ice_pf *pf)
  * tag will already be configured with the correct ID and priority bits
  */
 void
-ice_tx_prepare_vlan_flags_dcb(struct ice_ring *tx_ring,
+ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
 			      struct ice_tx_buf *first)
 {
 	struct sk_buff *skb = first->skb;
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.h b/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
index 261b6e2ed7bc..a5bdf47cd34a 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
@@ -28,7 +28,7 @@ void ice_vsi_cfg_dcb_rings(struct ice_vsi *vsi);
 int ice_init_pf_dcb(struct ice_pf *pf, bool locked);
 void ice_update_dcb_stats(struct ice_pf *pf);
 void
-ice_tx_prepare_vlan_flags_dcb(struct ice_ring *tx_ring,
+ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
 			      struct ice_tx_buf *first);
 void
 ice_dcb_process_lldp_set_mib_change(struct ice_pf *pf,
@@ -49,9 +49,9 @@ static inline bool ice_find_q_in_range(u16 low, u16 high, unsigned int tx_q)
 }
 
 static inline void
-ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, struct ice_ring *ring)
+ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, u8 dcb_tc)
 {
-	tlan_ctx->cgd_num = ring->dcb_tc;
+	tlan_ctx->cgd_num = dcb_tc;
 }
 
 static inline bool ice_is_dcb_active(struct ice_pf *pf)
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index d95a5daca114..644ce9f3494d 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -584,7 +584,7 @@ static bool ice_lbtest_check_frame(u8 *frame)
  *
  * Function sends loopback packets on a test Tx ring.
  */
-static int ice_diag_send(struct ice_ring *tx_ring, u8 *data, u16 size)
+static int ice_diag_send(struct ice_tx_ring *tx_ring, u8 *data, u16 size)
 {
 	struct ice_tx_desc *tx_desc;
 	struct ice_tx_buf *tx_buf;
@@ -676,9 +676,10 @@ static u64 ice_loopback_test(struct net_device *netdev)
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_vsi *orig_vsi = np->vsi, *test_vsi;
 	struct ice_pf *pf = orig_vsi->back;
-	struct ice_ring *tx_ring, *rx_ring;
 	u8 broadcast[ETH_ALEN], ret = 0;
 	int num_frames, valid_frames;
+	struct ice_tx_ring *tx_ring;
+	struct ice_ring *rx_ring;
 	struct device *dev;
 	u8 *tx_frame;
 	int i;
@@ -1318,6 +1319,7 @@ ice_get_ethtool_stats(struct net_device *netdev,
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
+	struct ice_tx_ring *tx_ring;
 	struct ice_ring *ring;
 	unsigned int j;
 	int i = 0;
@@ -1336,10 +1338,10 @@ ice_get_ethtool_stats(struct net_device *netdev,
 	rcu_read_lock();
 
 	ice_for_each_alloc_txq(vsi, j) {
-		ring = READ_ONCE(vsi->tx_rings[j]);
-		if (ring) {
-			data[i++] = ring->stats.pkts;
-			data[i++] = ring->stats.bytes;
+		tx_ring = READ_ONCE(vsi->tx_rings[j]);
+		if (tx_ring) {
+			data[i++] = tx_ring->stats.pkts;
+			data[i++] = tx_ring->stats.bytes;
 		} else {
 			data[i++] = 0;
 			data[i++] = 0;
@@ -2667,9 +2669,10 @@ ice_get_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 static int
 ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 {
-	struct ice_ring *tx_rings = NULL, *rx_rings = NULL;
+	struct ice_tx_ring *tx_rings = NULL;
+	struct ice_ring *rx_rings = NULL;
 	struct ice_netdev_priv *np = netdev_priv(netdev);
-	struct ice_ring *xdp_rings = NULL;
+	struct ice_tx_ring *xdp_rings = NULL;
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
 	int i, timeout = 50, err = 0;
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index dde9802c6c72..ac0d7a52406b 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -379,12 +379,12 @@ static irqreturn_t ice_msix_clean_ctrl_vsi(int __always_unused irq, void *data)
 {
 	struct ice_q_vector *q_vector = (struct ice_q_vector *)data;
 
-	if (!q_vector->tx.ring)
+	if (!q_vector->tx.tx_ring)
 		return IRQ_HANDLED;
 
 #define FDIR_RX_DESC_CLEAN_BUDGET 64
 	ice_clean_rx_irq(q_vector->rx.ring, FDIR_RX_DESC_CLEAN_BUDGET);
-	ice_clean_ctrl_tx_irq(q_vector->tx.ring);
+	ice_clean_ctrl_tx_irq(q_vector->tx.tx_ring);
 
 	return IRQ_HANDLED;
 }
@@ -1286,7 +1286,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
 	dev = ice_pf_to_dev(pf);
 	/* Allocate Tx rings */
 	for (i = 0; i < vsi->alloc_txq; i++) {
-		struct ice_ring *ring;
+		struct ice_tx_ring *ring;
 
 		/* allocate with kzalloc(), free with kfree_rcu() */
 		ring = kzalloc(sizeof(*ring), GFP_KERNEL);
@@ -1296,7 +1296,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
 
 		ring->q_index = i;
 		ring->reg_idx = vsi->txq_map[i];
-		ring->ring_active = false;
 		ring->vsi = vsi;
 		ring->tx_tstamps = &pf->ptp.port.tx;
 		ring->dev = dev;
@@ -1315,7 +1314,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi *vsi)
 
 		ring->q_index = i;
 		ring->reg_idx = vsi->rxq_map[i];
-		ring->ring_active = false;
 		ring->vsi = vsi;
 		ring->netdev = vsi->netdev;
 		ring->dev = dev;
@@ -1710,7 +1708,7 @@ int ice_vsi_cfg_single_rxq(struct ice_vsi *vsi, u16 q_idx)
 	return ice_vsi_cfg_rxq(vsi->rx_rings[q_idx]);
 }
 
-int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_ring **tx_rings, u16 q_idx)
+int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_tx_ring **tx_rings, u16 q_idx)
 {
 	struct ice_aqc_add_tx_qgrp *qg_buf;
 	int err;
@@ -1766,7 +1764,7 @@ int ice_vsi_cfg_rxqs(struct ice_vsi *vsi)
  * Configure the Tx VSI for operation.
  */
 static int
-ice_vsi_cfg_txqs(struct ice_vsi *vsi, struct ice_ring **rings, u16 count)
+ice_vsi_cfg_txqs(struct ice_vsi *vsi, struct ice_tx_ring **rings, u16 count)
 {
 	struct ice_aqc_add_tx_qgrp *qg_buf;
 	u16 q_idx = 0;
@@ -1818,7 +1816,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
 		return ret;
 
 	for (i = 0; i < vsi->num_xdp_txq; i++)
-		vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi->xdp_rings[i]);
+		vsi->xdp_rings[i]->xsk_pool = ice_tx_xsk_pool(vsi->xdp_rings[i]);
 
 	return ret;
 }
@@ -2057,7 +2055,7 @@ int ice_vsi_stop_all_rx_rings(struct ice_vsi *vsi)
  */
 static int
 ice_vsi_stop_tx_rings(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
-		      u16 rel_vmvf_num, struct ice_ring **rings, u16 count)
+		      u16 rel_vmvf_num, struct ice_tx_ring **rings, u16 count)
 {
 	u16 q_idx;
 
@@ -3357,10 +3355,10 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc)
  *
  * This function assumes that caller has acquired a u64_stats_sync lock.
  */
-static void ice_update_ring_stats(struct ice_ring *ring, u64 pkts, u64 bytes)
+static void ice_update_ring_stats(struct ice_q_stats *stats, u64 pkts, u64 bytes)
 {
-	ring->stats.bytes += bytes;
-	ring->stats.pkts += pkts;
+	stats->bytes += bytes;
+	stats->pkts += pkts;
 }
 
 /**
@@ -3369,10 +3367,10 @@ static void ice_update_ring_stats(struct ice_ring *ring, u64 pkts, u64 bytes)
  * @pkts: number of processed packets
  * @bytes: number of processed bytes
  */
-void ice_update_tx_ring_stats(struct ice_ring *tx_ring, u64 pkts, u64 bytes)
+void ice_update_tx_ring_stats(struct ice_tx_ring *tx_ring, u64 pkts, u64 bytes)
 {
 	u64_stats_update_begin(&tx_ring->syncp);
-	ice_update_ring_stats(tx_ring, pkts, bytes);
+	ice_update_ring_stats(&tx_ring->stats, pkts, bytes);
 	u64_stats_update_end(&tx_ring->syncp);
 }
 
@@ -3385,7 +3383,7 @@ void ice_update_tx_ring_stats(struct ice_ring *tx_ring, u64 pkts, u64 bytes)
 void ice_update_rx_ring_stats(struct ice_ring *rx_ring, u64 pkts, u64 bytes)
 {
 	u64_stats_update_begin(&rx_ring->syncp);
-	ice_update_ring_stats(rx_ring, pkts, bytes);
+	ice_update_ring_stats(&rx_ring->stats, pkts, bytes);
 	u64_stats_update_end(&rx_ring->syncp);
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index d5a28bf0fc2c..2a69666db194 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -14,7 +14,7 @@ void ice_update_eth_stats(struct ice_vsi *vsi);
 
 int ice_vsi_cfg_single_rxq(struct ice_vsi *vsi, u16 q_idx);
 
-int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_ring **tx_rings, u16 q_idx);
+int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_tx_ring **tx_rings, u16 q_idx);
 
 int ice_vsi_cfg_rxqs(struct ice_vsi *vsi);
 
@@ -93,7 +93,7 @@ void ice_vsi_free_tx_rings(struct ice_vsi *vsi);
 
 void ice_vsi_manage_rss_lut(struct ice_vsi *vsi, bool ena);
 
-void ice_update_tx_ring_stats(struct ice_ring *ring, u64 pkts, u64 bytes);
+void ice_update_tx_ring_stats(struct ice_tx_ring *ring, u64 pkts, u64 bytes);
 
 void ice_update_rx_ring_stats(struct ice_ring *ring, u64 pkts, u64 bytes);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index ef8d1815af56..cbcb4ad60852 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -61,7 +61,7 @@ bool netif_is_ice(struct net_device *dev)
  * ice_get_tx_pending - returns number of Tx descriptors not processed
  * @ring: the ring of descriptors
  */
-static u16 ice_get_tx_pending(struct ice_ring *ring)
+static u16 ice_get_tx_pending(struct ice_tx_ring *ring)
 {
 	u16 head, tail;
 
@@ -101,7 +101,7 @@ static void ice_check_for_hang_subtask(struct ice_pf *pf)
 	hw = &vsi->back->hw;
 
 	for (i = 0; i < vsi->num_txq; i++) {
-		struct ice_ring *tx_ring = vsi->tx_rings[i];
+		struct ice_tx_ring *tx_ring = vsi->tx_rings[i];
 
 		if (tx_ring && tx_ring->desc) {
 			/* If packet counter has not changed the queue is
@@ -2363,7 +2363,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 
 	for (i = 0; i < vsi->num_xdp_txq; i++) {
 		u16 xdp_q_idx = vsi->alloc_txq + i;
-		struct ice_ring *xdp_ring;
+		struct ice_tx_ring *xdp_ring;
 
 		xdp_ring = kzalloc(sizeof(*xdp_ring), GFP_KERNEL);
 
@@ -2372,7 +2372,6 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 
 		xdp_ring->q_index = xdp_q_idx;
 		xdp_ring->reg_idx = vsi->txq_map[xdp_q_idx];
-		xdp_ring->ring_active = false;
 		xdp_ring->vsi = vsi;
 		xdp_ring->netdev = NULL;
 		xdp_ring->dev = dev;
@@ -2381,7 +2380,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 		if (ice_setup_tx_ring(xdp_ring))
 			goto free_xdp_rings;
 		ice_set_ring_xdp(xdp_ring);
-		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
+		xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
 	}
 
 	return 0;
@@ -2460,11 +2459,11 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog)
 		q_base = vsi->num_xdp_txq - xdp_rings_rem;
 
 		for (q_id = q_base; q_id < (q_base + xdp_rings_per_v); q_id++) {
-			struct ice_ring *xdp_ring = vsi->xdp_rings[q_id];
+			struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_id];
 
 			xdp_ring->q_vector = q_vector;
-			xdp_ring->next = q_vector->tx.ring;
-			q_vector->tx.ring = xdp_ring;
+			xdp_ring->next = q_vector->tx.tx_ring;
+			q_vector->tx.tx_ring = xdp_ring;
 		}
 		xdp_rings_rem -= xdp_rings_per_v;
 	}
@@ -2534,14 +2533,14 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi)
 
 	ice_for_each_q_vector(vsi, v_idx) {
 		struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
-		struct ice_ring *ring;
+		struct ice_tx_ring *ring;
 
-		ice_for_each_ring(ring, q_vector->tx)
+		ice_for_each_tx_ring(ring, q_vector->tx)
 			if (!ring->tx_buf || !ice_ring_is_xdp(ring))
 				break;
 
 		/* restore the value of last node prior to XDP setup */
-		q_vector->tx.ring = ring;
+		q_vector->tx.tx_ring = ring;
 	}
 
 free_qmap:
@@ -5615,19 +5614,18 @@ int ice_up(struct ice_vsi *vsi)
  * that needs to be performed to read u64 values in 32 bit machine.
  */
 static void
-ice_fetch_u64_stats_per_ring(struct ice_ring *ring, u64 *pkts, u64 *bytes)
+ice_fetch_u64_stats_per_ring(struct u64_stats_sync *syncp, struct ice_q_stats stats,
+			     u64 *pkts, u64 *bytes)
 {
 	unsigned int start;
 	*pkts = 0;
 	*bytes = 0;
 
-	if (!ring)
-		return;
 	do {
-		start = u64_stats_fetch_begin_irq(&ring->syncp);
-		*pkts = ring->stats.pkts;
-		*bytes = ring->stats.bytes;
-	} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
+		start = u64_stats_fetch_begin_irq(syncp);
+		*pkts = stats.pkts;
+		*bytes = stats.bytes;
+	} while (u64_stats_fetch_retry_irq(syncp, start));
 }
 
 /**
@@ -5637,18 +5635,19 @@ ice_fetch_u64_stats_per_ring(struct ice_ring *ring, u64 *pkts, u64 *bytes)
  * @count: number of rings
  */
 static void
-ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, struct ice_ring **rings,
+ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, struct ice_tx_ring **rings,
 			     u16 count)
 {
 	struct rtnl_link_stats64 *vsi_stats = &vsi->net_stats;
 	u16 i;
 
 	for (i = 0; i < count; i++) {
-		struct ice_ring *ring;
+		struct ice_tx_ring *ring;
 		u64 pkts, bytes;
 
 		ring = READ_ONCE(rings[i]);
-		ice_fetch_u64_stats_per_ring(ring, &pkts, &bytes);
+		if (ring)
+			ice_fetch_u64_stats_per_ring(&ring->syncp, ring->stats, &pkts, &bytes);
 		vsi_stats->tx_packets += pkts;
 		vsi_stats->tx_bytes += bytes;
 		vsi->tx_restart += ring->tx_stats.restart_q;
@@ -5689,7 +5688,7 @@ static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 	ice_for_each_rxq(vsi, i) {
 		struct ice_ring *ring = READ_ONCE(vsi->rx_rings[i]);
 
-		ice_fetch_u64_stats_per_ring(ring, &pkts, &bytes);
+		ice_fetch_u64_stats_per_ring(&ring->syncp, ring->stats, &pkts, &bytes);
 		vsi_stats->rx_packets += pkts;
 		vsi_stats->rx_bytes += bytes;
 		vsi->rx_buf_failed += ring->rx_stats.alloc_buf_failed;
@@ -6036,7 +6035,7 @@ int ice_vsi_setup_tx_rings(struct ice_vsi *vsi)
 	}
 
 	ice_for_each_txq(vsi, i) {
-		struct ice_ring *ring = vsi->tx_rings[i];
+		struct ice_tx_ring *ring = vsi->tx_rings[i];
 
 		if (!ring)
 			return -EINVAL;
@@ -6962,7 +6961,7 @@ ice_bridge_setlink(struct net_device *dev, struct nlmsghdr *nlh,
 static void ice_tx_timeout(struct net_device *netdev, unsigned int txqueue)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
-	struct ice_ring *tx_ring = NULL;
+	struct ice_tx_ring *tx_ring = NULL;
 	struct ice_vsi *vsi = np->vsi;
 	struct ice_pf *pf = vsi->back;
 	u32 i;
diff --git a/drivers/net/ethernet/intel/ice/ice_trace.h b/drivers/net/ethernet/intel/ice/ice_trace.h
index 9bc0b8fdfc77..f230ff435f26 100644
--- a/drivers/net/ethernet/intel/ice/ice_trace.h
+++ b/drivers/net/ethernet/intel/ice/ice_trace.h
@@ -115,7 +115,7 @@ DEFINE_EVENT(ice_tx_dim_template, ice_tx_dim_work,
 
 /* Events related to a vsi & ring */
 DECLARE_EVENT_CLASS(ice_tx_template,
-		    TP_PROTO(struct ice_ring *ring, struct ice_tx_desc *desc,
+		    TP_PROTO(struct ice_tx_ring *ring, struct ice_tx_desc *desc,
 			     struct ice_tx_buf *buf),
 
 		    TP_ARGS(ring, desc, buf),
@@ -135,7 +135,7 @@ DECLARE_EVENT_CLASS(ice_tx_template,
 
 #define DEFINE_TX_TEMPLATE_OP_EVENT(name) \
 DEFINE_EVENT(ice_tx_template, name, \
-	     TP_PROTO(struct ice_ring *ring, \
+	     TP_PROTO(struct ice_tx_ring *ring, \
 		      struct ice_tx_desc *desc, \
 		      struct ice_tx_buf *buf), \
 	     TP_ARGS(ring, desc, buf))
@@ -192,7 +192,7 @@ DEFINE_EVENT(ice_rx_indicate_template, ice_clean_rx_irq_indicate,
 );
 
 DECLARE_EVENT_CLASS(ice_xmit_template,
-		    TP_PROTO(struct ice_ring *ring, struct sk_buff *skb),
+		    TP_PROTO(struct ice_tx_ring *ring, struct sk_buff *skb),
 
 		    TP_ARGS(ring, skb),
 
@@ -210,7 +210,7 @@ DECLARE_EVENT_CLASS(ice_xmit_template,
 
 #define DEFINE_XMIT_TEMPLATE_OP_EVENT(name) \
 DEFINE_EVENT(ice_xmit_template, name, \
-	     TP_PROTO(struct ice_ring *ring, struct sk_buff *skb), \
+	     TP_PROTO(struct ice_tx_ring *ring, struct sk_buff *skb), \
 	     TP_ARGS(ring, skb))
 
 DEFINE_XMIT_TEMPLATE_OP_EVENT(ice_xmit_frame_ring);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 6ee8e0032d52..fca5aca1ffae 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -32,7 +32,7 @@ ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc *fdir_desc,
 	struct ice_tx_buf *tx_buf, *first;
 	struct ice_fltr_desc *f_desc;
 	struct ice_tx_desc *tx_desc;
-	struct ice_ring *tx_ring;
+	struct ice_tx_ring *tx_ring;
 	struct device *dev;
 	dma_addr_t dma;
 	u32 td_cmd;
@@ -106,7 +106,7 @@ ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc *fdir_desc,
  * @tx_buf: the buffer to free
  */
 static void
-ice_unmap_and_free_tx_buf(struct ice_ring *ring, struct ice_tx_buf *tx_buf)
+ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct ice_tx_buf *tx_buf)
 {
 	if (tx_buf->skb) {
 		if (tx_buf->tx_flags & ICE_TX_FLAGS_DUMMY_PKT)
@@ -133,7 +133,7 @@ ice_unmap_and_free_tx_buf(struct ice_ring *ring, struct ice_tx_buf *tx_buf)
 	/* tx_buf must be completely set up in the transmit path */
 }
 
-static struct netdev_queue *txring_txq(const struct ice_ring *ring)
+static struct netdev_queue *txring_txq(const struct ice_tx_ring *ring)
 {
 	return netdev_get_tx_queue(ring->netdev, ring->q_index);
 }
@@ -142,8 +142,9 @@ static struct netdev_queue *txring_txq(const struct ice_ring *ring)
  * ice_clean_tx_ring - Free any empty Tx buffers
  * @tx_ring: ring to be cleaned
  */
-void ice_clean_tx_ring(struct ice_ring *tx_ring)
+void ice_clean_tx_ring(struct ice_tx_ring *tx_ring)
 {
+	u32 size;
 	u16 i;
 
 	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
@@ -162,8 +163,10 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring)
 tx_skip_free:
 	memset(tx_ring->tx_buf, 0, sizeof(*tx_ring->tx_buf) * tx_ring->count);
 
+	size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
+		     PAGE_SIZE);
 	/* Zero out the descriptor ring */
-	memset(tx_ring->desc, 0, tx_ring->size);
+	memset(tx_ring->desc, 0, size);
 
 	tx_ring->next_to_use = 0;
 	tx_ring->next_to_clean = 0;
@@ -181,14 +184,18 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring)
  *
  * Free all transmit software resources
  */
-void ice_free_tx_ring(struct ice_ring *tx_ring)
+void ice_free_tx_ring(struct ice_tx_ring *tx_ring)
 {
+	u32 size;
+
 	ice_clean_tx_ring(tx_ring);
 	devm_kfree(tx_ring->dev, tx_ring->tx_buf);
 	tx_ring->tx_buf = NULL;
 
 	if (tx_ring->desc) {
-		dmam_free_coherent(tx_ring->dev, tx_ring->size,
+		size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
+			     PAGE_SIZE);
+		dmam_free_coherent(tx_ring->dev, size,
 				   tx_ring->desc, tx_ring->dma);
 		tx_ring->desc = NULL;
 	}
@@ -201,7 +208,7 @@ void ice_free_tx_ring(struct ice_ring *tx_ring)
  *
  * Returns true if there's any budget left (e.g. the clean is finished)
  */
-static bool ice_clean_tx_irq(struct ice_ring *tx_ring, int napi_budget)
+static bool ice_clean_tx_irq(struct ice_tx_ring *tx_ring, int napi_budget)
 {
 	unsigned int total_bytes = 0, total_pkts = 0;
 	unsigned int budget = ICE_DFLT_IRQ_WORK;
@@ -329,9 +336,10 @@ static bool ice_clean_tx_irq(struct ice_ring *tx_ring, int napi_budget)
  *
  * Return 0 on success, negative on error
  */
-int ice_setup_tx_ring(struct ice_ring *tx_ring)
+int ice_setup_tx_ring(struct ice_tx_ring *tx_ring)
 {
 	struct device *dev = tx_ring->dev;
+	u32 size;
 
 	if (!dev)
 		return -ENOMEM;
@@ -345,13 +353,13 @@ int ice_setup_tx_ring(struct ice_ring *tx_ring)
 		return -ENOMEM;
 
 	/* round up to nearest page */
-	tx_ring->size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
-			      PAGE_SIZE);
-	tx_ring->desc = dmam_alloc_coherent(dev, tx_ring->size, &tx_ring->dma,
+	size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
+		     PAGE_SIZE);
+	tx_ring->desc = dmam_alloc_coherent(dev, size, &tx_ring->dma,
 					    GFP_KERNEL);
 	if (!tx_ring->desc) {
 		dev_err(dev, "Unable to allocate memory for the Tx descriptor ring, size=%d\n",
-			tx_ring->size);
+			size);
 		goto err;
 	}
 
@@ -373,6 +381,7 @@ int ice_setup_tx_ring(struct ice_ring *tx_ring)
 void ice_clean_rx_ring(struct ice_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
+	u32 size;
 	u16 i;
 
 	/* ring already cleared, nothing to do */
@@ -417,7 +426,9 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
 	memset(rx_ring->rx_buf, 0, sizeof(*rx_ring->rx_buf) * rx_ring->count);
 
 	/* Zero out the descriptor ring */
-	memset(rx_ring->desc, 0, rx_ring->size);
+	size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
+		     PAGE_SIZE);
+	memset(rx_ring->desc, 0, size);
 
 	rx_ring->next_to_alloc = 0;
 	rx_ring->next_to_clean = 0;
@@ -432,6 +443,8 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
  */
 void ice_free_rx_ring(struct ice_ring *rx_ring)
 {
+	u32 size;
+
 	ice_clean_rx_ring(rx_ring);
 	if (rx_ring->vsi->type == ICE_VSI_PF)
 		if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
@@ -441,7 +454,9 @@ void ice_free_rx_ring(struct ice_ring *rx_ring)
 	rx_ring->rx_buf = NULL;
 
 	if (rx_ring->desc) {
-		dmam_free_coherent(rx_ring->dev, rx_ring->size,
+		size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
+			     PAGE_SIZE);
+		dmam_free_coherent(rx_ring->dev, size,
 				   rx_ring->desc, rx_ring->dma);
 		rx_ring->desc = NULL;
 	}
@@ -456,6 +471,7 @@ void ice_free_rx_ring(struct ice_ring *rx_ring)
 int ice_setup_rx_ring(struct ice_ring *rx_ring)
 {
 	struct device *dev = rx_ring->dev;
+	u32 size;
 
 	if (!dev)
 		return -ENOMEM;
@@ -469,13 +485,13 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
 		return -ENOMEM;
 
 	/* round up to nearest page */
-	rx_ring->size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
-			      PAGE_SIZE);
-	rx_ring->desc = dmam_alloc_coherent(dev, rx_ring->size, &rx_ring->dma,
+	size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
+		     PAGE_SIZE);
+	rx_ring->desc = dmam_alloc_coherent(dev, size, &rx_ring->dma,
 					    GFP_KERNEL);
 	if (!rx_ring->desc) {
 		dev_err(dev, "Unable to allocate memory for the Rx descriptor ring, size=%d\n",
-			rx_ring->size);
+			size);
 		goto err;
 	}
 
@@ -526,7 +542,7 @@ static int
 ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
 	    struct bpf_prog *xdp_prog)
 {
-	struct ice_ring *xdp_ring;
+	struct ice_tx_ring *xdp_ring;
 	int err, result;
 	u32 act;
 
@@ -576,7 +592,7 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 	struct ice_netdev_priv *np = netdev_priv(dev);
 	unsigned int queue_index = smp_processor_id();
 	struct ice_vsi *vsi = np->vsi;
-	struct ice_ring *xdp_ring;
+	struct ice_tx_ring *xdp_ring;
 	int nxmit = 0, i;
 
 	if (test_bit(ICE_VSI_DOWN, vsi->state))
@@ -1247,9 +1263,9 @@ static void ice_net_dim(struct ice_q_vector *q_vector)
 	if (ITR_IS_DYNAMIC(tx)) {
 		struct dim_sample dim_sample = {};
 		u64 packets = 0, bytes = 0;
-		struct ice_ring *ring;
+		struct ice_tx_ring *ring;
 
-		ice_for_each_ring(ring, q_vector->tx) {
+		ice_for_each_tx_ring(ring, q_vector->tx) {
 			packets += ring->stats.pkts;
 			bytes += ring->stats.bytes;
 		}
@@ -1387,6 +1403,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct ice_q_vector *q_vector =
 				container_of(napi, struct ice_q_vector, napi);
+	struct ice_tx_ring *tx_ring;
 	bool clean_complete = true;
 	struct ice_ring *ring;
 	int budget_per_ring;
@@ -1395,10 +1412,10 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	/* Since the actual Tx work is minimal, we can give the Tx a larger
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
-	ice_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_pool ?
-			  ice_clean_tx_irq_zc(ring, budget) :
-			  ice_clean_tx_irq(ring, budget);
+	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
+		bool wd = tx_ring->xsk_pool ?
+			  ice_clean_tx_irq_zc(tx_ring, budget) :
+			  ice_clean_tx_irq(tx_ring, budget);
 
 		if (!wd)
 			clean_complete = false;
@@ -1462,7 +1479,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
  *
  * Returns -EBUSY if a stop is needed, else 0
  */
-static int __ice_maybe_stop_tx(struct ice_ring *tx_ring, unsigned int size)
+static int __ice_maybe_stop_tx(struct ice_tx_ring *tx_ring, unsigned int size)
 {
 	netif_stop_subqueue(tx_ring->netdev, tx_ring->q_index);
 	/* Memory barrier before checking head and tail */
@@ -1485,7 +1502,7 @@ static int __ice_maybe_stop_tx(struct ice_ring *tx_ring, unsigned int size)
  *
  * Returns 0 if stop is not needed
  */
-static int ice_maybe_stop_tx(struct ice_ring *tx_ring, unsigned int size)
+static int ice_maybe_stop_tx(struct ice_tx_ring *tx_ring, unsigned int size)
 {
 	if (likely(ICE_DESC_UNUSED(tx_ring) >= size))
 		return 0;
@@ -1504,7 +1521,7 @@ static int ice_maybe_stop_tx(struct ice_ring *tx_ring, unsigned int size)
  * it and the length into the transmit descriptor.
  */
 static void
-ice_tx_map(struct ice_ring *tx_ring, struct ice_tx_buf *first,
+ice_tx_map(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first,
 	   struct ice_tx_offload_params *off)
 {
 	u64 td_offset, td_tag, td_cmd;
@@ -1840,7 +1857,7 @@ int ice_tx_csum(struct ice_tx_buf *first, struct ice_tx_offload_params *off)
  * related to VLAN tagging for the HW, such as VLAN, DCB, etc.
  */
 static void
-ice_tx_prepare_vlan_flags(struct ice_ring *tx_ring, struct ice_tx_buf *first)
+ice_tx_prepare_vlan_flags(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first)
 {
 	struct sk_buff *skb = first->skb;
 
@@ -2146,7 +2163,7 @@ static bool ice_chk_linearize(struct sk_buff *skb, unsigned int count)
  * @off: Tx offload parameters
  */
 static void
-ice_tstamp(struct ice_ring *tx_ring, struct sk_buff *skb,
+ice_tstamp(struct ice_tx_ring *tx_ring, struct sk_buff *skb,
 	   struct ice_tx_buf *first, struct ice_tx_offload_params *off)
 {
 	s8 idx;
@@ -2181,7 +2198,7 @@ ice_tstamp(struct ice_ring *tx_ring, struct sk_buff *skb,
  * Returns NETDEV_TX_OK if sent, else an error code
  */
 static netdev_tx_t
-ice_xmit_frame_ring(struct sk_buff *skb, struct ice_ring *tx_ring)
+ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring *tx_ring)
 {
 	struct ice_tx_offload_params offload = { 0 };
 	struct ice_vsi *vsi = tx_ring->vsi;
@@ -2282,7 +2299,7 @@ netdev_tx_t ice_start_xmit(struct sk_buff *skb, struct net_device *netdev)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_vsi *vsi = np->vsi;
-	struct ice_ring *tx_ring;
+	struct ice_tx_ring *tx_ring;
 
 	tx_ring = vsi->tx_rings[skb->queue_mapping];
 
@@ -2299,7 +2316,7 @@ netdev_tx_t ice_start_xmit(struct sk_buff *skb, struct net_device *netdev)
  * ice_clean_ctrl_tx_irq - interrupt handler for flow director Tx queue
  * @tx_ring: tx_ring to clean
  */
-void ice_clean_ctrl_tx_irq(struct ice_ring *tx_ring)
+void ice_clean_ctrl_tx_irq(struct ice_tx_ring *tx_ring)
 {
 	struct ice_vsi *vsi = tx_ring->vsi;
 	s16 i = tx_ring->next_to_clean;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 1e46e80f3d6f..d4ab3558933e 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -154,7 +154,7 @@ struct ice_tx_buf {
 
 struct ice_tx_offload_params {
 	u64 cd_qw1;
-	struct ice_ring *tx_ring;
+	struct ice_tx_ring *tx_ring;
 	u32 td_cmd;
 	u32 td_offset;
 	u32 td_l2tag1;
@@ -267,16 +267,11 @@ struct ice_ring {
 	struct ice_vsi *vsi;		/* Backreference to associated VSI */
 	struct ice_q_vector *q_vector;	/* Backreference to associated vector */
 	u8 __iomem *tail;
-	union {
-		struct ice_tx_buf *tx_buf;
-		struct ice_rx_buf *rx_buf;
-	};
+	struct ice_rx_buf *rx_buf;
 	/* CL2 - 2nd cacheline starts here */
+	struct xdp_rxq_info xdp_rxq;
+	/* CL3 - 3rd cacheline starts here */
 	u16 q_index;			/* Queue number of ring */
-	u16 q_handle;			/* Queue handle per TC */
-
-	u8 ring_active:1;		/* is ring online or not */
-
 	u16 count;			/* Number of descriptors */
 	u16 reg_idx;			/* HW register index of the ring */
 
@@ -284,38 +279,61 @@ struct ice_ring {
 	u16 next_to_use;
 	u16 next_to_clean;
 	u16 next_to_alloc;
+	u16 rx_offset;
+	u16 rx_buf_len;
 
 	/* stats structs */
+	struct ice_rxq_stats rx_stats;
 	struct ice_q_stats	stats;
 	struct u64_stats_sync syncp;
-	union {
-		struct ice_txq_stats tx_stats;
-		struct ice_rxq_stats rx_stats;
-	};
 
 	struct rcu_head rcu;		/* to avoid race on free */
-	DECLARE_BITMAP(xps_state, ICE_TX_NBITS);	/* XPS Config State */
+	/* CL4 - 4th cacheline starts here */
 	struct bpf_prog *xdp_prog;
 	struct xsk_buff_pool *xsk_pool;
-	u16 rx_offset;
-	/* CL3 - 3rd cacheline starts here */
-	struct xdp_rxq_info xdp_rxq;
 	struct sk_buff *skb;
-	/* CLX - the below items are only accessed infrequently and should be
-	 * in their own cache line if possible
-	 */
-#define ICE_TX_FLAGS_RING_XDP		BIT(0)
+	dma_addr_t dma;			/* physical address of ring */
 #define ICE_RX_FLAGS_RING_BUILD_SKB	BIT(1)
+	u64 cached_phctime;
+	u8 dcb_tc;			/* Traffic class of ring */
+	u8 ptp_rx;
 	u8 flags;
+} ____cacheline_internodealigned_in_smp;
+
+struct ice_tx_ring {
+	/* CL1 - 1st cacheline starts here */
+	struct ice_tx_ring *next;	/* pointer to next ring in q_vector */
+	void *desc;			/* Descriptor ring memory */
+	struct device *dev;		/* Used for DMA mapping */
+	u8 __iomem *tail;
+	struct ice_tx_buf *tx_buf;
+	struct ice_q_vector *q_vector;	/* Backreference to associated vector */
+	struct net_device *netdev;	/* netdev ring maps to */
+	struct ice_vsi *vsi;		/* Backreference to associated VSI */
+	/* CL2 - 2nd cacheline starts here */
 	dma_addr_t dma;			/* physical address of ring */
-	unsigned int size;		/* length of descriptor ring in bytes */
+	u16 next_to_use;
+	u16 next_to_clean;
+	u16 count;			/* Number of descriptors */
+	u16 q_index;			/* Queue number of ring */
+	struct xsk_buff_pool *xsk_pool;
+
+	/* stats structs */
+	struct ice_q_stats	stats;
+	struct u64_stats_sync syncp;
+	struct ice_txq_stats tx_stats;
+
+	/* CL3 - 3rd cacheline starts here */
+	struct rcu_head rcu;		/* to avoid race on free */
+	DECLARE_BITMAP(xps_state, ICE_TX_NBITS);	/* XPS Config State */
+	struct ice_ptp_tx *tx_tstamps;
 	u32 txq_teid;			/* Added Tx queue TEID */
-	u16 rx_buf_len;
+	u16 q_handle;			/* Queue handle per TC */
+	u16 reg_idx;			/* HW register index of the ring */
+#define ICE_TX_FLAGS_RING_XDP		BIT(0)
+	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
-	struct ice_ptp_tx *tx_tstamps;
-	u64 cached_phctime;
-	u8 ptp_rx:1;
-	u8 ptp_tx:1;
+	u8 ptp_tx;
 } ____cacheline_internodealigned_in_smp;
 
 static inline bool ice_ring_uses_build_skb(struct ice_ring *ring)
@@ -333,14 +351,17 @@ static inline void ice_clear_ring_build_skb_ena(struct ice_ring *ring)
 	ring->flags &= ~ICE_RX_FLAGS_RING_BUILD_SKB;
 }
 
-static inline bool ice_ring_is_xdp(struct ice_ring *ring)
+static inline bool ice_ring_is_xdp(struct ice_tx_ring *ring)
 {
 	return !!(ring->flags & ICE_TX_FLAGS_RING_XDP);
 }
 
 struct ice_ring_container {
 	/* head of linked-list of rings */
-	struct ice_ring *ring;
+	union {
+		struct ice_ring *ring;
+		struct ice_tx_ring *tx_ring;
+	};
 	struct dim dim;		/* data for net_dim algorithm */
 	u16 itr_idx;		/* index in the interrupt vector */
 	/* this matches the maximum number of ITR bits, but in usec
@@ -363,6 +384,9 @@ struct ice_coalesce_stored {
 #define ice_for_each_ring(pos, head) \
 	for (pos = (head).ring; pos; pos = pos->next)
 
+#define ice_for_each_tx_ring(pos, head) \
+	for (pos = (head).tx_ring; pos; pos = pos->next)
+
 static inline unsigned int ice_rx_pg_order(struct ice_ring *ring)
 {
 #if (PAGE_SIZE < 8192)
@@ -378,16 +402,16 @@ union ice_32b_rx_flex_desc;
 
 bool ice_alloc_rx_bufs(struct ice_ring *rxr, u16 cleaned_count);
 netdev_tx_t ice_start_xmit(struct sk_buff *skb, struct net_device *netdev);
-void ice_clean_tx_ring(struct ice_ring *tx_ring);
+void ice_clean_tx_ring(struct ice_tx_ring *tx_ring);
 void ice_clean_rx_ring(struct ice_ring *rx_ring);
-int ice_setup_tx_ring(struct ice_ring *tx_ring);
+int ice_setup_tx_ring(struct ice_tx_ring *tx_ring);
 int ice_setup_rx_ring(struct ice_ring *rx_ring);
-void ice_free_tx_ring(struct ice_ring *tx_ring);
+void ice_free_tx_ring(struct ice_tx_ring *tx_ring);
 void ice_free_rx_ring(struct ice_ring *rx_ring);
 int ice_napi_poll(struct napi_struct *napi, int budget);
 int
 ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc *fdir_desc,
 		   u8 *raw_packet);
 int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget);
-void ice_clean_ctrl_tx_irq(struct ice_ring *tx_ring);
+void ice_clean_ctrl_tx_irq(struct ice_tx_ring *tx_ring);
 #endif /* _ICE_TXRX_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 171397dcf00a..74519c603872 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -217,7 +217,7 @@ ice_receive_skb(struct ice_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag)
  * @size: packet data size
  * @xdp_ring: XDP ring for transmission
  */
-int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring *xdp_ring)
+int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
 {
 	u16 i = xdp_ring->next_to_use;
 	struct ice_tx_desc *tx_desc;
@@ -269,7 +269,7 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring *xdp_ring)
  *
  * Returns negative on failure, 0 on success.
  */
-int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring *xdp_ring)
+int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
 {
 	struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
 
@@ -294,7 +294,7 @@ void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res)
 		xdp_do_flush_map();
 
 	if (xdp_res & ICE_XDP_TX) {
-		struct ice_ring *xdp_ring =
+		struct ice_tx_ring *xdp_ring =
 			rx_ring->vsi->xdp_rings[rx_ring->q_index];
 
 		ice_xdp_ring_update_tail(xdp_ring);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index 05ac30752902..6989070ae2e2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -37,7 +37,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag)
  *
  * This function updates the XDP Tx ring tail register.
  */
-static inline void ice_xdp_ring_update_tail(struct ice_ring *xdp_ring)
+static inline void ice_xdp_ring_update_tail(struct ice_tx_ring *xdp_ring)
 {
 	/* Force memory writes to complete before letting h/w
 	 * know there are new descriptors to fetch.
@@ -46,9 +46,9 @@ static inline void ice_xdp_ring_update_tail(struct ice_ring *xdp_ring)
 	writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 }
 
-void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res);
-int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring *xdp_ring);
-int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring *xdp_ring);
+void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res);
+int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring);
+int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring);
 void ice_release_rx_desc(struct ice_ring *rx_ring, u16 val);
 void
 ice_process_skb_fields(struct ice_ring *rx_ring,
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 2826570dab51..0ee694960e51 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -3326,7 +3326,7 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg)
 		q_map = vqs->tx_queues;
 
 		for_each_set_bit(vf_q_id, &q_map, ICE_MAX_RSS_QS_PER_VF) {
-			struct ice_ring *ring = vsi->tx_rings[vf_q_id];
+			struct ice_tx_ring *ring = vsi->tx_rings[vf_q_id];
 			struct ice_txq_meta txq_meta = { 0 };
 
 			if (!ice_vc_isvalid_q_id(vf, vqs->vsi_id, vf_q_id)) {
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 5a9f61deeb38..bcf0f8e2ba6e 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -104,12 +104,13 @@ ice_qvec_cfg_msix(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 	u16 reg_idx = q_vector->reg_idx;
 	struct ice_pf *pf = vsi->back;
 	struct ice_hw *hw = &pf->hw;
+	struct ice_tx_ring *tx_ring;
 	struct ice_ring *ring;
 
 	ice_cfg_itr(hw, q_vector);
 
-	ice_for_each_ring(ring, q_vector->tx)
-		ice_cfg_txq_interrupt(vsi, ring->reg_idx, reg_idx,
+	ice_for_each_tx_ring(tx_ring, q_vector->tx)
+		ice_cfg_txq_interrupt(vsi, tx_ring->reg_idx, reg_idx,
 				      q_vector->tx.itr_idx);
 
 	ice_for_each_ring(ring, q_vector->rx)
@@ -144,8 +145,9 @@ static void ice_qvec_ena_irq(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 {
 	struct ice_txq_meta txq_meta = { };
-	struct ice_ring *tx_ring, *rx_ring;
 	struct ice_q_vector *q_vector;
+	struct ice_tx_ring *tx_ring;
+	struct ice_ring *rx_ring;
 	int timeout = 50;
 	int err;
 
@@ -171,7 +173,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 	if (err)
 		return err;
 	if (ice_is_xdp_ena_vsi(vsi)) {
-		struct ice_ring *xdp_ring = vsi->xdp_rings[q_idx];
+		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
 
 		memset(&txq_meta, 0, sizeof(txq_meta));
 		ice_fill_txq_meta(vsi, xdp_ring, &txq_meta);
@@ -201,8 +203,9 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
 static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 {
 	struct ice_aqc_add_tx_qgrp *qg_buf;
-	struct ice_ring *tx_ring, *rx_ring;
 	struct ice_q_vector *q_vector;
+	struct ice_tx_ring *tx_ring;
+	struct ice_ring *rx_ring;
 	u16 size;
 	int err;
 
@@ -225,7 +228,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 		goto free_buf;
 
 	if (ice_is_xdp_ena_vsi(vsi)) {
-		struct ice_ring *xdp_ring = vsi->xdp_rings[q_idx];
+		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
 
 		memset(qg_buf, 0, size);
 		qg_buf->num_txqs = 1;
@@ -233,7 +236,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
 		if (err)
 			goto free_buf;
 		ice_set_ring_xdp(xdp_ring);
-		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
+		xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
 	}
 
 	err = ice_vsi_cfg_rxq(rx_ring);
@@ -462,8 +465,8 @@ static int
 ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
 {
 	int err, result = ICE_XDP_PASS;
+	struct ice_tx_ring *xdp_ring;
 	struct bpf_prog *xdp_prog;
-	struct ice_ring *xdp_ring;
 	u32 act;
 
 	/* ZC patch is enabled only when XDP program is set,
@@ -618,7 +621,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
  *
  * Returns true if cleanup/transmission is done.
  */
-static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
+static bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, int budget)
 {
 	struct ice_tx_desc *tx_desc = NULL;
 	bool work_done = true;
@@ -669,7 +672,7 @@ static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
  * @tx_buf: Tx buffer to clean
  */
 static void
-ice_clean_xdp_tx_buf(struct ice_ring *xdp_ring, struct ice_tx_buf *tx_buf)
+ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf *tx_buf)
 {
 	xdp_return_frame((struct xdp_frame *)tx_buf->raw_buf);
 	dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
@@ -684,7 +687,7 @@ ice_clean_xdp_tx_buf(struct ice_ring *xdp_ring, struct ice_tx_buf *tx_buf)
  *
  * Returns true if cleanup/tranmission is done.
  */
-bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget)
+bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
 {
 	int total_packets = 0, total_bytes = 0;
 	s16 ntc = xdp_ring->next_to_clean;
@@ -757,7 +760,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32 queue_id,
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_q_vector *q_vector;
 	struct ice_vsi *vsi = np->vsi;
-	struct ice_ring *ring;
+	struct ice_tx_ring *ring;
 
 	if (test_bit(ICE_DOWN, vsi->state))
 		return -ENETDOWN;
@@ -826,7 +829,7 @@ void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring)
  * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer pool queues
  * @xdp_ring: XDP_Tx ring
  */
-void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
+void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring)
 {
 	u16 ntc = xdp_ring->next_to_clean, ntu = xdp_ring->next_to_use;
 	u32 xsk_frames = 0;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h b/drivers/net/ethernet/intel/ice/ice_xsk.h
index ea208808623a..2cf26372aefd 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.h
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
@@ -12,12 +12,12 @@ struct ice_vsi;
 int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool *pool,
 		       u16 qid);
 int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget);
-bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget);
+bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
 int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32 flags);
 bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count);
 bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
 void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring);
-void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring);
+void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
 #else
 static inline int
 ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
@@ -35,7 +35,7 @@ ice_clean_rx_irq_zc(struct ice_ring __always_unused *rx_ring,
 }
 
 static inline bool
-ice_clean_tx_irq_zc(struct ice_ring __always_unused *xdp_ring,
+ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
 		    int __always_unused budget)
 {
 	return false;
@@ -61,6 +61,6 @@ ice_xsk_wakeup(struct net_device __always_unused *netdev,
 }
 
 static inline void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring) { }
-static inline void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring) { }
+static inline void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring) { }
 #endif /* CONFIG_XDP_SOCKETS */
 #endif /* !_ICE_XSK_H_ */
-- 
2.20.1



* [PATCH v3 intel-next 2/6] ice: unify xdp_rings accesses
  2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs Maciej Fijalkowski
@ 2021-08-05 23:00 ` Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 3/6] ice: do not create xdp_frame on XDP_TX Maciej Fijalkowski
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

There has been a long-standing issue of improper xdp_rings indexing for
the XDP_TX and XDP_REDIRECT actions. Because rx_ring->q_index is
currently mixed with smp_processor_id(), Tx descriptors can be produced
onto an XDP Tx ring whose tail is never bumped, for example when a
particular queue id is pinned to a non-matching IRQ line.

Address this problem by ignoring the user ring count setting and always
initializing the xdp_rings array to num_possible_cpus() entries. Then,
always use smp_processor_id() as the index into the xdp_rings array.
This provides serialization, as at any given time only a single softirq
can run on a particular CPU.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
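The indexing problem described above can be sketched with a small
userspace model. This is only an illustration under stated assumptions:
every toy_* name and NR_CPUS_MODEL are hypothetical and do not exist in
the ice driver; the model keeps just a producer index and a tail value
per ring.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4 /* hypothetical stand-in for num_possible_cpus() */

/* Toy model of an XDP Tx ring: only the fields the argument needs. */
struct toy_xdp_ring {
	uint16_t next_to_use; /* software producer index */
	uint16_t hw_tail;     /* last value written to the tail register */
};

/* Queue one descriptor on the ring selected by 'idx'. */
static void toy_produce(struct toy_xdp_ring *rings, unsigned int idx)
{
	assert(idx < NR_CPUS_MODEL);
	rings[idx].next_to_use++;
}

/* Publish the producer index to "hardware" for the ring selected by 'idx'. */
static void toy_bump_tail(struct toy_xdp_ring *rings, unsigned int idx)
{
	assert(idx < NR_CPUS_MODEL);
	rings[idx].hw_tail = rings[idx].next_to_use;
}

/* Count descriptors that were queued but never handed to hardware. */
static unsigned int toy_stuck(const struct toy_xdp_ring *rings)
{
	unsigned int i, stuck = 0;

	for (i = 0; i < NR_CPUS_MODEL; i++)
		stuck += (uint16_t)(rings[i].next_to_use - rings[i].hw_tail);
	return stuck;
}

/* Mixed scheme: produce by CPU id, bump the tail by Rx queue index. */
static unsigned int toy_mixed_indexing(unsigned int cpu, unsigned int q_index)
{
	struct toy_xdp_ring rings[NR_CPUS_MODEL] = { 0 };

	toy_produce(rings, cpu);
	toy_bump_tail(rings, q_index);
	return toy_stuck(rings);
}

/* Unified scheme: both steps use the CPU id. */
static unsigned int toy_unified_indexing(unsigned int cpu)
{
	struct toy_xdp_ring rings[NR_CPUS_MODEL] = { 0 };

	toy_produce(rings, cpu);
	toy_bump_tail(rings, cpu);
	return toy_stuck(rings);
}
```

In this model, toy_mixed_indexing(3, 1) leaves one descriptor stuck on
ring 3, while toy_unified_indexing(3) leaves none.
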
 drivers/net/ethernet/intel/ice/ice_lib.c      | 2 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 2 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index ac0d7a52406b..d44a657384e6 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3152,7 +3152,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 
 		ice_vsi_map_rings_to_vectors(vsi);
 		if (ice_is_xdp_ena_vsi(vsi)) {
-			vsi->num_xdp_txq = vsi->alloc_rxq;
+			vsi->num_xdp_txq = num_possible_cpus();
 			ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog);
 			if (ret)
 				goto err_vectors;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index cbcb4ad60852..8a1603301726 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2625,7 +2625,7 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	if (!ice_is_xdp_ena_vsi(vsi) && prog) {
-		vsi->num_xdp_txq = vsi->alloc_rxq;
+		vsi->num_xdp_txq = num_possible_cpus();
 		xdp_ring_err = ice_prepare_xdp_rings(vsi, prog);
 		if (xdp_ring_err)
 			NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed");
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 74519c603872..152703e202e2 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -295,7 +295,7 @@ void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res)
 
 	if (xdp_res & ICE_XDP_TX) {
 		struct ice_tx_ring *xdp_ring =
-			rx_ring->vsi->xdp_rings[rx_ring->q_index];
+			rx_ring->vsi->xdp_rings[smp_processor_id()];
 
 		ice_xdp_ring_update_tail(xdp_ring);
 	}
-- 
2.20.1



* [PATCH v3 intel-next 3/6] ice: do not create xdp_frame on XDP_TX
  2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 2/6] ice: unify xdp_rings accesses Maciej Fijalkowski
@ 2021-08-05 23:00 ` Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 4/6] ice: propagate xdp_ring onto rx_ring Maciej Fijalkowski
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

xdp_frame is not needed for the XDP_TX data path in the ice driver.
On this data path, sent descriptors are never cleaned outside of the
driver, so the information about the underlying memory model that
xdp_frame carries goes unused. Therefore, the conversion can simply be
dropped, which relieves the CPU a bit.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
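As a rough illustration of the change, the sketch below fills a Tx slot
straight from the buffer's data pointers, with the length computed as
data_end - data and no intermediate frame object. It is a userspace
model, not driver code: struct toy_xdp_buff, struct toy_tx_slot and
both toy_* helpers are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the two xdp_buff fields used on the XDP_TX path. */
struct toy_xdp_buff {
	unsigned char *data;     /* start of packet data */
	unsigned char *data_end; /* one past the last byte of packet data */
};

/* Toy Tx slot: what the transmit routine needs to know about a frame. */
struct toy_tx_slot {
	const void *addr;
	uint16_t len;
};

/* Packet length derived directly from the buffer pointers. */
static uint16_t toy_xdp_len(const struct toy_xdp_buff *xdp)
{
	assert(xdp->data_end >= xdp->data);
	return (uint16_t)(xdp->data_end - xdp->data);
}

/* Fill a Tx slot from the buffer without building a frame object first. */
static void toy_xmit(struct toy_tx_slot *slot, const struct toy_xdp_buff *xdp)
{
	slot->addr = xdp->data;
	slot->len = toy_xdp_len(xdp);
}
```

For a buffer whose data starts 14 bytes into a packet and ends at byte
74, toy_xmit() records a length of 60.
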
 drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index fca5aca1ffae..3bb851481f3f 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -552,7 +552,7 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
 		return ICE_XDP_PASS;
 	case XDP_TX:
 		xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()];
-		result = ice_xmit_xdp_buff(xdp, xdp_ring);
+		result = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
 		if (result == ICE_XDP_CONSUMED)
 			goto out_failure;
 		return result;
-- 
2.20.1



* [PATCH v3 intel-next 4/6] ice: propagate xdp_ring onto rx_ring
  2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
                   ` (2 preceding siblings ...)
  2021-08-05 23:00 ` [PATCH v3 intel-next 3/6] ice: do not create xdp_frame on XDP_TX Maciej Fijalkowski
@ 2021-08-05 23:00 ` Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 5/6] ice: optimize XDP_TX workloads Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 6/6] ice: introduce XDP_TX fallback path Maciej Fijalkowski
  5 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

With the rings split, it is now convenient to introduce a pointer to
the XDP ring within the Rx ring. For XDP_TX workloads this means that
the xdp_rings array access, previously executed for each processed
frame, is skipped.

Also, read the XDP prog once per NAPI and, if a prog is present, set up
the local xdp_ring pointer. Reading the prog a single time was discussed
in [1], with some concern raised by Toke around dispatcher handling and
the need to go through an RCU grace period in the ndo_bpf driver
callback, but ice currently tears down NAPI instances regardless of
the prog presence on the VSI.

Although the pointer to the XDP ring introduced to the Rx ring makes
things a lot slimmer/simpler, I still feel that a single prog read per
NAPI lifetime is beneficial.

A later patch that introduces the fallback path will also benefit from
this, as the xdp_ring pointer will be set during XDP rings setup.

[1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
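The "read the prog once per NAPI" idea can be sketched as a userspace
model. Everything named toy_* is hypothetical and only illustrates the
control flow: the program pointer is loaded once before the loop, the
Tx ring pointer comes from the Rx ring, and the per-frame path reuses
both. The prog_loads counter exists only to make the single load
observable.

```c
#include <assert.h>
#include <stddef.h>

/* Toy Tx ring: just counts frames queued for transmission. */
struct toy_tx_ring {
	unsigned int queued;
};

/* Toy Rx ring carrying a program pointer and its paired Tx ring. */
struct toy_rx_ring {
	const void *xdp_prog;         /* NULL when no program is attached */
	struct toy_tx_ring *xdp_ring; /* set once at ring setup time */
	unsigned int prog_loads;      /* instrumentation for this sketch only */
};

/* Single volatile load of the program pointer, counted for the demo. */
static const void *toy_read_prog(struct toy_rx_ring *rx)
{
	rx->prog_loads++;
	return *(const void *volatile *)&rx->xdp_prog;
}

/* Process up to 'budget' frames; returns the number processed. */
static unsigned int toy_poll(struct toy_rx_ring *rx, unsigned int budget)
{
	/* Read the program once per poll, not once per frame. */
	const void *prog = toy_read_prog(rx);
	struct toy_tx_ring *xdp_ring = prog ? rx->xdp_ring : NULL;
	unsigned int done;

	for (done = 0; done < budget; done++) {
		if (!prog)
			continue; /* the regular skb path would run here */
		/* Pretend every frame results in XDP_TX. */
		xdp_ring->queued++;
	}
	return done;
}
```

With a program attached and a budget of 64, toy_poll() loads the
program pointer once and queues 64 frames on the paired Tx ring.
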
 drivers/net/ethernet/intel/ice/ice_main.c     |  3 +++
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 23 +++++++++--------
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  8 ++----
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  2 +-
 drivers/net/ethernet/intel/ice/ice_xsk.c      | 25 +++++++++++--------
 6 files changed, 34 insertions(+), 28 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8a1603301726..9ae0e1e9867c 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2383,6 +2383,9 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 		xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
 	}
 
+	ice_for_each_rxq(vsi, i)
+		vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i];
+
 	return 0;
 
 free_xdp_rings:
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 3bb851481f3f..add773113e79 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -535,15 +535,15 @@ ice_rx_frame_truesize(struct ice_ring *rx_ring, unsigned int __maybe_unused size
  * @rx_ring: Rx ring
  * @xdp: xdp_buff used as input to the XDP program
  * @xdp_prog: XDP program to run
+ * @xdp_ring: ring to be used for XDP_TX action
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
 static int
 ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
-	    struct bpf_prog *xdp_prog)
+	    struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
 {
-	struct ice_tx_ring *xdp_ring;
-	int err, result;
+	int err;
 	u32 act;
 
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
@@ -551,11 +551,10 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
 	case XDP_PASS:
 		return ICE_XDP_PASS;
 	case XDP_TX:
-		xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()];
-		result = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
-		if (result == ICE_XDP_CONSUMED)
+		err = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
+		if (err == ICE_XDP_CONSUMED)
 			goto out_failure;
-		return result;
+		return err;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
 		if (err)
@@ -1081,6 +1080,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 	unsigned int total_rx_bytes = 0, total_rx_pkts = 0, frame_sz = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
 	unsigned int offset = rx_ring->rx_offset;
+	struct ice_tx_ring *xdp_ring = NULL;
 	unsigned int xdp_res, xdp_xmit = 0;
 	struct sk_buff *skb = rx_ring->skb;
 	struct bpf_prog *xdp_prog = NULL;
@@ -1093,6 +1093,10 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 #endif
 	xdp_init_buff(&xdp, frame_sz, &rx_ring->xdp_rxq);
 
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+	if (xdp_prog)
+		xdp_ring = rx_ring->xdp_ring;
+
 	/* start the loop to process Rx packets bounded by 'budget' */
 	while (likely(total_rx_pkts < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
@@ -1156,11 +1160,10 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
 #endif
 
-		xdp_prog = READ_ONCE(rx_ring->xdp_prog);
 		if (!xdp_prog)
 			goto construct_skb;
 
-		xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog);
+		xdp_res = ice_run_xdp(rx_ring, &xdp, xdp_prog, xdp_ring);
 		if (!xdp_res)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
@@ -1237,7 +1240,7 @@ int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 	failure = ice_alloc_rx_bufs(rx_ring, cleaned_count);
 
 	if (xdp_prog)
-		ice_finalize_xdp_rx(rx_ring, xdp_xmit);
+		ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
 	rx_ring->skb = skb;
 
 	ice_update_rx_ring_stats(rx_ring, total_rx_pkts, total_rx_bytes);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index d4ab3558933e..0e24052429e5 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -290,6 +290,7 @@ struct ice_ring {
 	struct rcu_head rcu;		/* to avoid race on free */
 	/* CL4 - 3rd cacheline starts here */
 	struct bpf_prog *xdp_prog;
+	struct ice_tx_ring *xdp_ring;
 	struct xsk_buff_pool *xsk_pool;
 	struct sk_buff *skb;
 	dma_addr_t dma;			/* physical address of ring */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 152703e202e2..68163dd3054c 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -288,15 +288,11 @@ int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring)
  * should be called when a batch of packets has been processed in the
  * napi loop.
  */
-void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res)
+void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res)
 {
 	if (xdp_res & ICE_XDP_REDIR)
 		xdp_do_flush_map();
 
-	if (xdp_res & ICE_XDP_TX) {
-		struct ice_tx_ring *xdp_ring =
-			rx_ring->vsi->xdp_rings[smp_processor_id()];
-
+	if (xdp_res & ICE_XDP_TX)
 		ice_xdp_ring_update_tail(xdp_ring);
-	}
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index 6989070ae2e2..36295c2baac7 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -46,7 +46,7 @@ static inline void ice_xdp_ring_update_tail(struct ice_tx_ring *xdp_ring)
 	writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
 }
 
-void ice_finalize_xdp_rx(struct ice_ring *xdp_ring, unsigned int xdp_res);
+void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res);
 int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring *xdp_ring);
 int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring);
 void ice_release_rx_desc(struct ice_ring *rx_ring, u16 val);
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index bcf0f8e2ba6e..8d8b39cce2b8 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -458,22 +458,18 @@ ice_construct_skb_zc(struct ice_ring *rx_ring, struct ice_rx_buf *rx_buf)
  * ice_run_xdp_zc - Executes an XDP program in zero-copy path
  * @rx_ring: Rx ring
  * @xdp: xdp_buff used as input to the XDP program
+ * @xdp_prog: XDP program to run
+ * @xdp_ring: ring to be used for XDP_TX action
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
 static int
-ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
+ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp,
+	       struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring)
 {
 	int err, result = ICE_XDP_PASS;
-	struct ice_tx_ring *xdp_ring;
-	struct bpf_prog *xdp_prog;
 	u32 act;
 
-	/* ZC patch is enabled only when XDP program is set,
-	 * so here it can not be NULL
-	 */
-	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
-
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	if (likely(act == XDP_REDIRECT)) {
@@ -487,7 +483,6 @@ ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
 	case XDP_PASS:
 		break;
 	case XDP_TX:
-		xdp_ring = rx_ring->vsi->xdp_rings[rx_ring->q_index];
 		result = ice_xmit_xdp_buff(xdp, xdp_ring);
 		if (result == ICE_XDP_CONSUMED)
 			goto out_failure;
@@ -518,9 +513,17 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+	struct ice_tx_ring *xdp_ring;
 	unsigned int xdp_xmit = 0;
+	struct bpf_prog *xdp_prog;
 	bool failure = false;
 
+	/* ZC patch is enabled only when XDP program is set,
+	 * so here it can not be NULL
+	 */
+	xdp_prog = READ_ONCE(rx_ring->xdp_prog);
+	xdp_ring = rx_ring->xdp_ring;
+
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
 		unsigned int size, xdp_res = 0;
@@ -551,7 +554,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 		rx_buf->xdp->data_end = rx_buf->xdp->data + size;
 		xsk_buff_dma_sync_for_cpu(rx_buf->xdp, rx_ring->xsk_pool);
 
-		xdp_res = ice_run_xdp_zc(rx_ring, rx_buf->xdp);
+		xdp_res = ice_run_xdp_zc(rx_ring, rx_buf->xdp, xdp_prog, xdp_ring);
 		if (xdp_res) {
 			if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR))
 				xdp_xmit |= xdp_res;
@@ -599,7 +602,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 	if (cleaned_count >= ICE_RX_BUF_WRITE)
 		failure = !ice_alloc_rx_bufs_zc(rx_ring, cleaned_count);
 
-	ice_finalize_xdp_rx(rx_ring, xdp_xmit);
+	ice_finalize_xdp_rx(xdp_ring, xdp_xmit);
 	ice_update_rx_ring_stats(rx_ring, total_rx_packets, total_rx_bytes);
 
 	if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
-- 
2.20.1



* [PATCH v3 intel-next 5/6] ice: optimize XDP_TX workloads
  2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
                   ` (3 preceding siblings ...)
  2021-08-05 23:00 ` [PATCH v3 intel-next 4/6] ice: propagate xdp_ring onto rx_ring Maciej Fijalkowski
@ 2021-08-05 23:00 ` Maciej Fijalkowski
  2021-08-05 23:00 ` [PATCH v3 intel-next 6/6] ice: introduce XDP_TX fallback path Maciej Fijalkowski
  5 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

Optimize Tx descriptor cleaning for XDP. The current approach doesn't
really scale and chokes when multiple flows are handled.

Introduce two ring fields, @next_dd and @next_rs, that keep track of
the descriptor that should be looked at when the need for cleaning
arises and the descriptor that should have the RS bit set, respectively.

Note that at this point the threshold is a constant (32), but it is
something that we could make configurable.

The first thing is to get away from setting the RS bit on each
descriptor. Let's do this only once NTU is higher than the current
@next_rs value. In such a case, grab tx_desc[next_rs], set the RS bit
in that descriptor and advance @next_rs by 32.

The second thing is to clean the Tx ring only when there are fewer than
32 free entries. For that case, look up tx_desc[next_dd] for a DD bit.
This bit is written back by HW to let the driver know that xmit was
successful. It will happen only for those descriptors that had the RS
bit set. Clean only 32 descriptors and advance @next_dd by 32.

The actual cleaning routine is moved from ice_napi_poll() down to
ice_xmit_xdp_ring(). It is safe to do so as the XDP ring will not get
any SKBs that would rely on interrupts for the cleaning. A nice side
effect is that for the rare case of the Tx fallback path (that the next
patch is going to introduce) we don't have to trigger the SW irq to
clean the ring.

With those two concepts, the ring is kept almost full, but it is
guaranteed that the driver will be able to produce Tx descriptors.
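
The interplay of @next_rs and @next_dd can be sketched in userspace C as
below. This is only a toy model under stated assumptions: the ring size
(64), the sim_* names and the sim_hw_complete() helper that plays the
role of HW writing back DD are all made up for illustration, and the
free-entry formula is assumed to follow the usual "one slot kept unused"
convention.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SIM_RING_COUNT	64	/* hypothetical size, multiple of the threshold */
#define SIM_TX_THRESH	32	/* mirrors the constant threshold of 32 */

struct sim_desc {
	bool rs;		/* driver asks for a completion report */
	bool dd;		/* "HW" reports the descriptor as done */
};

struct sim_ring {
	struct sim_desc desc[SIM_RING_COUNT];
	uint16_t next_to_use, next_to_clean, next_rs, next_dd;
};

void sim_ring_init(struct sim_ring *r)
{
	memset(r, 0, sizeof(*r));
	r->next_rs = SIM_TX_THRESH - 1;
	r->next_dd = SIM_TX_THRESH - 1;
}

/* Free entries; one slot always stays unused. */
uint16_t sim_unused(const struct sim_ring *r)
{
	return (uint16_t)((r->next_to_clean > r->next_to_use ?
			   0 : SIM_RING_COUNT) +
			  r->next_to_clean - r->next_to_use - 1);
}

/* Pretend HW finished everything: DD is written only where RS was set. */
void sim_hw_complete(struct sim_ring *r)
{
	for (int i = 0; i < SIM_RING_COUNT; i++) {
		if (r->desc[i].rs) {
			r->desc[i].rs = false;
			r->desc[i].dd = true;
		}
	}
}

/* Reclaim exactly one batch, and only if desc[next_dd] is done. */
static void sim_clean(struct sim_ring *r)
{
	if (!r->desc[r->next_dd].dd)
		return;

	r->desc[r->next_dd].dd = false;
	r->next_to_clean = (r->next_to_clean + SIM_TX_THRESH) % SIM_RING_COUNT;
	r->next_dd += SIM_TX_THRESH;
	if (r->next_dd >= SIM_RING_COUNT)
		r->next_dd = SIM_TX_THRESH - 1;
}

/* Queue one frame; returns false when the ring is full ("tx_busy"). */
bool sim_xmit(struct sim_ring *r)
{
	uint16_t i = r->next_to_use;

	if (sim_unused(r) < SIM_TX_THRESH)
		sim_clean(r);
	if (!sim_unused(r))
		return false;

	r->desc[i].rs = false;	/* fresh descriptor, no RS by default */
	r->desc[i].dd = false;

	i++;
	if (i == SIM_RING_COUNT) {
		i = 0;
		r->desc[r->next_rs].rs = true;
		r->next_rs = SIM_TX_THRESH - 1;
	}
	r->next_to_use = i;

	if (i > r->next_rs) {
		r->desc[r->next_rs].rs = true;
		r->next_rs += SIM_TX_THRESH;
	}

	return true;
}

/* Fill a fresh ring, optionally let "HW" complete, then fill again. */
int sim_demo(bool hw_completes)
{
	struct sim_ring r;
	int sent = 0;

	sim_ring_init(&r);
	while (sim_xmit(&r))
		sent++;
	if (hw_completes) {
		sim_hw_complete(&r);
		while (sim_xmit(&r))
			sent++;
	}

	return sent;
}
```

In this model only one descriptor per batch of 32 carries RS, and a
single DD check frees a whole batch, so the per-frame cost of cleaning
stays small even though frames are queued one at a time.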

This approach seems to work out well even though the Tx descriptors are
produced one by one. The test was conducted with the ice HW bombarded
with packets from a HW generator configured to generate 30 flows.

Xdp2 sample yields the following results:
<snip>
proto 17:   79973066 pkt/s
proto 17:   80018911 pkt/s
proto 17:   80004654 pkt/s
proto 17:   79992395 pkt/s
proto 17:   79975162 pkt/s
proto 17:   79955054 pkt/s
proto 17:   79869168 pkt/s
proto 17:   79823947 pkt/s
proto 17:   79636971 pkt/s
</snip>

As that sample reports the Rx'ed frames, let's look at the sar output.
It shows that what we Rx we actually Tx, with no noticeable drops.
Average:        IFACE     rxpck/s     txpck/s     rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:       ens4f1 79842324.00 79842310.40 4678261.17 4678260.38      0.00      0.00      0.00     38.32

with tx_busy staying calm.

When compared to the state before:
Average:        IFACE     rxpck/s     txpck/s     rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s   %ifutil
Average:       ens4f1 90919711.60 42233822.60 5327326.85 2474638.04      0.00      0.00      0.00     43.64

it can be observed that txpck/s is almost doubled, meaning that the
performance is improved by around 90%. All of this is due to drops in
the driver: previously the tx_busy stat was bumped at a rate of 7 Mpps.
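
For reference, the "around 90%" figure can be re-derived from the two
sar rows quoted above. The helpers below are hypothetical and only redo
the arithmetic: the relative txpck/s change comes out at roughly 89%,
and before the change only about 46% of the received packets were also
transmitted.

```c
/* Relative change from 'before' to 'after', in percent. */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Fraction of received packets that were also transmitted. */
double tx_rx_ratio(double rxpck, double txpck)
{
	return txpck / rxpck;
}
```

For example, pct_change(42233822.60, 79842310.40) gives the txpck/s
improvement and tx_rx_ratio(90919711.60, 42233822.60) gives the
forwarded fraction before the change.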

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c     |  2 +
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 21 +++---
 drivers/net/ethernet/intel/ice/ice_txrx.h     | 10 ++-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 73 ++++++++++++++++---
 4 files changed, 82 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 9ae0e1e9867c..9f7388698b82 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2374,6 +2374,8 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 		xdp_ring->reg_idx = vsi->txq_map[xdp_q_idx];
 		xdp_ring->vsi = vsi;
 		xdp_ring->netdev = NULL;
+		xdp_ring->next_dd = ICE_TX_THRESH - 1;
+		xdp_ring->next_rs = ICE_TX_THRESH - 1;
 		xdp_ring->dev = dev;
 		xdp_ring->count = vsi->num_tx_desc;
 		WRITE_ONCE(vsi->xdp_rings[i], xdp_ring);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index add773113e79..7d8e4af65ca3 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -245,11 +245,8 @@ static bool ice_clean_tx_irq(struct ice_tx_ring *tx_ring, int napi_budget)
 		total_bytes += tx_buf->bytecount;
 		total_pkts += tx_buf->gso_segs;
 
-		if (ice_ring_is_xdp(tx_ring))
-			page_frag_free(tx_buf->raw_buf);
-		else
-			/* free the skb */
-			napi_consume_skb(tx_buf->skb, napi_budget);
+		/* free the skb */
+		napi_consume_skb(tx_buf->skb, napi_budget);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -305,9 +302,6 @@ static bool ice_clean_tx_irq(struct ice_tx_ring *tx_ring, int napi_budget)
 
 	ice_update_tx_ring_stats(tx_ring, total_pkts, total_bytes);
 
-	if (ice_ring_is_xdp(tx_ring))
-		return !!budget;
-
 	netdev_tx_completed_queue(txring_txq(tx_ring), total_pkts,
 				  total_bytes);
 
@@ -1416,9 +1410,14 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
-		bool wd = tx_ring->xsk_pool ?
-			  ice_clean_tx_irq_zc(tx_ring, budget) :
-			  ice_clean_tx_irq(tx_ring, budget);
+		bool wd;
+
+		if (tx_ring->xsk_pool)
+			wd = ice_clean_tx_irq_zc(tx_ring, budget);
+		else if (ice_ring_is_xdp(tx_ring))
+			wd = true;
+		else
+			wd = ice_clean_tx_irq(tx_ring, budget);
 
 		if (!wd)
 			clean_complete = false;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 0e24052429e5..8c30d92af4c9 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -13,6 +13,7 @@
 #define ICE_MAX_CHAINED_RX_BUFS	5
 #define ICE_MAX_BUF_TXD		8
 #define ICE_MIN_TX_LEN		17
+#define ICE_TX_THRESH		32
 
 /* The size limit for a transmit buffer in a descriptor is (16K - 1).
  * In order to align with the read requests we will align the value to
@@ -313,12 +314,15 @@ struct ice_tx_ring {
 	struct ice_vsi *vsi;		/* Backreference to associated VSI */
 	/* CL2 - 2nd cacheline starts here */
 	dma_addr_t dma;			/* physical address of ring */
+	struct xsk_buff_pool *xsk_pool;
 	u16 next_to_use;
 	u16 next_to_clean;
+	u16 next_rs;
+	u16 next_dd;
+	u16 q_handle;			/* Queue handle per TC */
+	u16 reg_idx;			/* HW register index of the ring */
 	u16 count;			/* Number of descriptors */
 	u16 q_index;			/* Queue number of ring */
-	struct xsk_buff_pool *xsk_pool;
-
 	/* stats structs */
 	struct ice_q_stats	stats;
 	struct u64_stats_sync syncp;
@@ -329,8 +333,6 @@ struct ice_tx_ring {
 	DECLARE_BITMAP(xps_state, ICE_TX_NBITS);	/* XPS Config State */
 	struct ice_ptp_tx *tx_tstamps;
 	u32 txq_teid;			/* Added Tx queue TEID */
-	u16 q_handle;			/* Queue handle per TC */
-	u16 reg_idx;			/* HW register index of the ring */
 #define ICE_TX_FLAGS_RING_XDP		BIT(0)
 	u8 flags;
 	u8 dcb_tc;			/* Traffic class of ring */
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 68163dd3054c..f82e2789ad93 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2019, Intel Corporation. */
 
 #include "ice_txrx_lib.h"
+#include "ice_lib.h"
 
 /**
  * ice_release_rx_desc - Store the new tail and head values
@@ -211,6 +212,52 @@ ice_receive_skb(struct ice_ring *rx_ring, struct sk_buff *skb, u16 vlan_tag)
 	napi_gro_receive(&rx_ring->q_vector->napi, skb);
 }
 
+/**
+ * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
+ * @xdp_ring: XDP ring to clean
+ */
+static void ice_clean_xdp_irq(struct ice_tx_ring *xdp_ring)
+{
+	unsigned int total_bytes = 0, total_pkts = 0;
+	u16 ntc = xdp_ring->next_to_clean;
+	struct ice_tx_desc *next_dd_desc;
+	u16 next_dd = xdp_ring->next_dd;
+	struct ice_tx_buf *tx_buf;
+	int i;
+
+	next_dd_desc = ICE_TX_DESC(xdp_ring, next_dd);
+	if (!(next_dd_desc->cmd_type_offset_bsz &
+	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)))
+		return;
+
+	for (i = 0; i < ICE_TX_THRESH; i++) {
+		tx_buf = &xdp_ring->tx_buf[ntc];
+
+		total_bytes += tx_buf->bytecount;
+		/* normally tx_buf->gso_segs was taken but at this point
+		 * it's always 1 for us
+		 */
+		total_pkts++;
+
+		page_frag_free(tx_buf->raw_buf);
+		dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
+				 dma_unmap_len(tx_buf, len), DMA_TO_DEVICE);
+		dma_unmap_len_set(tx_buf, len, 0);
+		tx_buf->raw_buf = NULL;
+
+		ntc++;
+		if (ntc >= xdp_ring->count)
+			ntc = 0;
+	}
+
+	next_dd_desc->cmd_type_offset_bsz = 0;
+	xdp_ring->next_dd = xdp_ring->next_dd + ICE_TX_THRESH;
+	if (xdp_ring->next_dd > xdp_ring->count)
+		xdp_ring->next_dd = ICE_TX_THRESH - 1;
+	xdp_ring->next_to_clean = ntc;
+	ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
+}
+
 /**
  * ice_xmit_xdp_ring - submit single packet to XDP ring for transmission
  * @data: packet data pointer
@@ -224,6 +271,9 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
 	struct ice_tx_buf *tx_buf;
 	dma_addr_t dma;
 
+	if (ICE_DESC_UNUSED(xdp_ring) < ICE_TX_THRESH)
+		ice_clean_xdp_irq(xdp_ring);
+
 	if (!unlikely(ICE_DESC_UNUSED(xdp_ring))) {
 		xdp_ring->tx_stats.tx_busy++;
 		return ICE_XDP_CONSUMED;
@@ -244,21 +294,26 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring *xdp_ring)
 
 	tx_desc = ICE_TX_DESC(xdp_ring, i);
 	tx_desc->buf_addr = cpu_to_le64(dma);
-	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TXD_LAST_DESC_CMD, 0,
+	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, 0,
 						      size, 0);
 
-	/* Make certain all of the status bits have been updated
-	 * before next_to_watch is written.
-	 */
-	smp_wmb();
-
 	i++;
-	if (i == xdp_ring->count)
+	if (i == xdp_ring->count) {
 		i = 0;
-
-	tx_buf->next_to_watch = tx_desc;
+		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
+		tx_desc->cmd_type_offset_bsz |=
+			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+		xdp_ring->next_rs = ICE_TX_THRESH - 1;
+	}
 	xdp_ring->next_to_use = i;
 
+	if (i > xdp_ring->next_rs) {
+		tx_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs);
+		tx_desc->cmd_type_offset_bsz |=
+			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+		xdp_ring->next_rs += ICE_TX_THRESH;
+	}
+
 	return ICE_XDP_TX;
 }
 
-- 
2.20.1



* [PATCH v3 intel-next 6/6] ice: introduce XDP_TX fallback path
  2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
                   ` (4 preceding siblings ...)
  2021-08-05 23:00 ` [PATCH v3 intel-next 5/6] ice: optimize XDP_TX workloads Maciej Fijalkowski
@ 2021-08-05 23:00 ` Maciej Fijalkowski
  5 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-05 23:00 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin, joamaki,
	toke, Maciej Fijalkowski

Under rare circumstances the requirement of having an XDP Tx queue per
CPU cannot be fulfilled and some of the Tx resources have to be shared
between CPUs. This yields a need for placing accesses to xdp_ring
inside a critical section protected by a spinlock. These accesses
happen to be in the hot path, so let's introduce a static branch that
will be triggered from the control plane when the driver cannot provide
a Tx queue dedicated to XDP on each CPU.
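
The pattern of "take the lock only when rings are shared" can be
sketched in userspace C as follows. This is a loose analogy rather than
kernel code: a plain bool (sim_xdp_locking) stands in for the static
key, a C11 atomic_flag spin loop stands in for the spinlock, and every
sim_* name is hypothetical. A real static branch is patched into the
instruction stream, so unlike this bool it costs no load in the
disabled case.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Plain-bool stand-in for the static key; flipped by the control path. */
static bool sim_xdp_locking;

struct sim_tx_ring {
	atomic_flag tx_lock;		/* stand-in for the ring spinlock */
	unsigned int next_to_use;
	unsigned int count;
};

static void sim_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the current holder releases the lock */
}

static void sim_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/* Hot path: the lock is only touched when rings are shared. */
void sim_xdp_tx(struct sim_tx_ring *ring)
{
	if (sim_xdp_locking)
		sim_lock(&ring->tx_lock);

	ring->next_to_use = (ring->next_to_use + 1) % ring->count;

	if (sim_xdp_locking)
		sim_unlock(&ring->tx_lock);
}

/* Queue n frames; returns next_to_use, or ~0u if the lock leaked. */
unsigned int sim_demo_tx(bool shared, unsigned int n)
{
	struct sim_tx_ring ring = { .tx_lock = ATOMIC_FLAG_INIT, .count = 8 };

	sim_xdp_locking = shared;
	for (unsigned int i = 0; i < n; i++)
		sim_xdp_tx(&ring);

	if (atomic_flag_test_and_set(&ring.tx_lock))
		return ~0u;

	return ring.next_to_use;
}
```

Both modes produce the same ring state; the only difference is whether
the critical section is entered through the lock.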

Currently, the design that has been picked is to allow any number of XDP
Tx queues that is at least half of the number of CPUs that the platform
has. For a lower number the driver will bail out with a response to the
user that there were not enough Tx resources to configure XDP. The
sharing of rings is signalled via static branch enablement, which in
turn indicates that the lock for xdp_ring accesses needs to be taken in
the hot path.
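
The sizing rule and the resulting Rx-queue-to-XDP-ring mapping can be
modelled in a few lines of userspace C. The sim_* names are hypothetical,
and the 'avail' argument stands for whatever number of free Tx queues
the driver finds; the sketch only encodes the rule described above (at
least half the CPU count, capped at the CPU count, modulo mapping when
shared).

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct sim_xdp_res {
	uint16_t num_xdp_txq;	/* XDP Tx queues that will be used */
	bool shared;		/* true when rings are shared between CPUs */
};

/* Returns 0 on success, -ENOMEM when fewer than cpus / 2 queues are free. */
int sim_determine_xdp_res(uint16_t avail, uint16_t cpus,
			  struct sim_xdp_res *res)
{
	if (avail < cpus / 2)
		return -ENOMEM;

	res->num_xdp_txq = avail < cpus ? avail : cpus;
	res->shared = res->num_xdp_txq < cpus;

	return 0;
}

/* XDP ring index used by Rx queue 'rxq', or a negative errno. */
int sim_ring_for_rxq(uint16_t avail, uint16_t cpus, uint16_t rxq)
{
	struct sim_xdp_res res;
	int err = sim_determine_xdp_res(avail, cpus, &res);

	if (err)
		return err;

	return res.shared ? rxq % res.num_xdp_txq : rxq;
}
```

For instance, with 8 CPUs and only 4 free queues, Rx queue 5 shares XDP
ring 1 with Rx queue 1, while with 3 free queues the setup is refused.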

The approach based on a static branch has no impact on the performance
of the non-fallback path. One thing that needs to be mentioned is that
the static branch acts as a global driver switch, meaning that if one
PF runs out of Tx resources, then the other PFs that the ice driver is
servicing will suffer. However, given that the HW that the ice driver
handles has 1024 Tx queues per PF, this is currently an unlikely
scenario.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h          |  3 ++
 drivers/net/ethernet/intel/ice/ice_lib.c      |  4 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 53 ++++++++++++++++---
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 16 +++++-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  7 ++-
 6 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 2e15e097bc0f..4c7ff0e8c20f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -158,6 +158,8 @@
 
 #define ice_pf_to_dev(pf) (&((pf)->pdev->dev))
 
+DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
+
 struct ice_txq_meta {
 	u32 q_teid;	/* Tx-scheduler element identifier */
 	u16 q_id;	/* Entry in VSI's txq_map bitmap */
@@ -662,6 +664,7 @@ int ice_up(struct ice_vsi *vsi);
 int ice_down(struct ice_vsi *vsi);
 int ice_vsi_cfg(struct ice_vsi *vsi);
 struct ice_vsi *ice_lb_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi);
+int ice_vsi_determine_xdp_res(struct ice_vsi *vsi);
 int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog);
 int ice_destroy_xdp_rings(struct ice_vsi *vsi);
 int
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index d44a657384e6..09890a69b154 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3152,7 +3152,9 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 
 		ice_vsi_map_rings_to_vectors(vsi);
 		if (ice_is_xdp_ena_vsi(vsi)) {
-			vsi->num_xdp_txq = num_possible_cpus();
+			ret = ice_vsi_determine_xdp_res(vsi);
+			if (ret)
+				goto err_vectors;
 			ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog);
 			if (ret)
 				goto err_vectors;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 9f7388698b82..7ab207cda62b 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -42,6 +42,8 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
 #endif /* !CONFIG_DYNAMIC_DEBUG */
 
 static DEFINE_IDA(ice_aux_ida);
+DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
+EXPORT_SYMBOL(ice_xdp_locking_key);
 
 static struct workqueue_struct *ice_wq;
 static const struct net_device_ops ice_netdev_safe_mode_ops;
@@ -2383,10 +2385,15 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 			goto free_xdp_rings;
 		ice_set_ring_xdp(xdp_ring);
 		xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
+		spin_lock_init(&xdp_ring->tx_lock);
 	}
 
-	ice_for_each_rxq(vsi, i)
-		vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i];
+	ice_for_each_rxq(vsi, i) {
+		if (static_key_enabled(&ice_xdp_locking_key))
+			vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i % vsi->num_xdp_txq];
+		else
+			vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i];
+	}
 
 	return 0;
 
@@ -2451,6 +2458,10 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog)
 	if (__ice_vsi_get_qs(&xdp_qs_cfg))
 		goto err_map_xdp;
 
+	if (static_key_enabled(&ice_xdp_locking_key))
+		netdev_warn(vsi->netdev,
+			    "Could not allocate one XDP Tx ring per CPU, XDP_TX/XDP_REDIRECT actions will be slower\n");
+
 	if (ice_xdp_alloc_setup_rings(vsi))
 		goto clear_xdp_rings;
 
@@ -2567,6 +2578,9 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi)
 	devm_kfree(ice_pf_to_dev(pf), vsi->xdp_rings);
 	vsi->xdp_rings = NULL;
 
+	if (static_key_enabled(&ice_xdp_locking_key))
+		static_branch_dec(&ice_xdp_locking_key);
+
 	if (ice_is_reset_in_progress(pf->state) || !vsi->q_vectors[0])
 		return 0;
 
@@ -2601,6 +2615,29 @@ static void ice_vsi_rx_napi_schedule(struct ice_vsi *vsi)
 	}
 }
 
+/**
+ * ice_vsi_determine_xdp_res - figure out how many Tx qs can XDP have
+ * @vsi: VSI to determine the count of XDP Tx qs
+ *
+ * returns 0 if Tx qs count is higher than at least half of CPU count,
+ * -ENOMEM otherwise
+ */
+int ice_vsi_determine_xdp_res(struct ice_vsi *vsi)
+{
+	u16 avail = ice_get_avail_txq_count(vsi->back);
+	u16 cpus = num_possible_cpus();
+
+	if (avail < cpus / 2)
+		return -ENOMEM;
+
+	vsi->num_xdp_txq = min_t(u16, avail, cpus);
+
+	if (vsi->num_xdp_txq < cpus)
+		static_branch_inc(&ice_xdp_locking_key);
+
+	return 0;
+}
+
 /**
  * ice_xdp_setup_prog - Add or remove XDP eBPF program
  * @vsi: VSI to setup XDP for
@@ -2630,10 +2667,14 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	if (!ice_is_xdp_ena_vsi(vsi) && prog) {
-		vsi->num_xdp_txq = num_possible_cpus();
-		xdp_ring_err = ice_prepare_xdp_rings(vsi, prog);
-		if (xdp_ring_err)
-			NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed");
+		xdp_ring_err = ice_vsi_determine_xdp_res(vsi);
+		if (xdp_ring_err) {
+			NL_SET_ERR_MSG_MOD(extack, "Not enough Tx resources for XDP");
+		} else {
+			xdp_ring_err = ice_prepare_xdp_rings(vsi, prog);
+			if (xdp_ring_err)
+				NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed");
+		}
 	} else if (ice_is_xdp_ena_vsi(vsi) && !prog) {
 		xdp_ring_err = ice_destroy_xdp_rings(vsi);
 		if (xdp_ring_err)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 7d8e4af65ca3..7714fc7bab2b 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -545,7 +545,11 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
 	case XDP_PASS:
 		return ICE_XDP_PASS;
 	case XDP_TX:
+		if (static_branch_unlikely(&ice_xdp_locking_key))
+			spin_lock(&xdp_ring->tx_lock);
 		err = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
+		if (static_branch_unlikely(&ice_xdp_locking_key))
+			spin_unlock(&xdp_ring->tx_lock);
 		if (err == ICE_XDP_CONSUMED)
 			goto out_failure;
 		return err;
@@ -597,7 +601,14 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
 		return -EINVAL;
 
-	xdp_ring = vsi->xdp_rings[queue_index];
+	if (static_branch_unlikely(&ice_xdp_locking_key)) {
+		queue_index %= vsi->num_xdp_txq;
+		xdp_ring = vsi->xdp_rings[queue_index];
+		spin_lock(&xdp_ring->tx_lock);
+	} else {
+		xdp_ring = vsi->xdp_rings[queue_index];
+	}
+
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
 		int err;
@@ -611,6 +622,9 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 	if (unlikely(flags & XDP_XMIT_FLUSH))
 		ice_xdp_ring_update_tail(xdp_ring);
 
+	if (static_branch_unlikely(&ice_xdp_locking_key))
+		spin_unlock(&xdp_ring->tx_lock);
+
 	return nxmit;
 }
 
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 8c30d92af4c9..7916d2adebeb 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -332,6 +332,7 @@ struct ice_tx_ring {
 	struct rcu_head rcu;		/* to avoid race on free */
 	DECLARE_BITMAP(xps_state, ICE_TX_NBITS);	/* XPS Config State */
 	struct ice_ptp_tx *tx_tstamps;
+	spinlock_t tx_lock;
 	u32 txq_teid;			/* Added Tx queue TEID */
 #define ICE_TX_FLAGS_RING_XDP		BIT(0)
 	u8 flags;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index f82e2789ad93..d18ea4612ba4 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -348,6 +348,11 @@ void ice_finalize_xdp_rx(struct ice_tx_ring *xdp_ring, unsigned int xdp_res)
 	if (xdp_res & ICE_XDP_REDIR)
 		xdp_do_flush_map();
 
-	if (xdp_res & ICE_XDP_TX)
+	if (xdp_res & ICE_XDP_TX) {
+		if (static_branch_unlikely(&ice_xdp_locking_key))
+			spin_lock(&xdp_ring->tx_lock);
 		ice_xdp_ring_update_tail(xdp_ring);
+		if (static_branch_unlikely(&ice_xdp_locking_key))
+			spin_unlock(&xdp_ring->tx_lock);
+	}
 }
-- 
2.20.1



* Re: [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs
  2021-08-05 23:00 ` [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs Maciej Fijalkowski
@ 2021-08-06  1:08   ` kernel test robot
  2021-08-06 20:46   ` Creeley, Brett
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2021-08-06  1:08 UTC (permalink / raw)
  To: Maciej Fijalkowski, intel-wired-lan
  Cc: kbuild-all, netdev, bpf, davem, anthony.l.nguyen, kuba, bjorn,
	magnus.karlsson, jesse.brandeburg, alexandr.lobakin

[-- Attachment #1: Type: text/plain, Size: 6648 bytes --]

Hi Maciej,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on ipvs/master]
[also build test ERROR on v5.14-rc4 next-20210805]
[cannot apply to tnguy-next-queue/dev-queue sparc-next/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Maciej-Fijalkowski/XDP_TX-improvements-for-ice/20210806-071546
base:   https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs.git master
config: powerpc64-randconfig-s032-20210804 (attached as .config)
compiler: powerpc-linux-gcc (GCC) 10.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-348-gf0e6938b-dirty
        # https://github.com/0day-ci/linux/commit/349763b451b9e0cd2d65208bb0664e581b8afffb
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Maciej-Fijalkowski/XDP_TX-improvements-for-ice/20210806-071546
        git checkout 349763b451b9e0cd2d65208bb0664e581b8afffb
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=powerpc64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/net/ethernet/intel/ice/ice_base.c: In function 'ice_setup_tx_ctx':
>> drivers/net/ethernet/intel/ice/ice_base.c:262:32: warning: passing argument 2 of 'ice_set_cgd_num' makes pointer from integer without a cast [-Wint-conversion]
     262 |  ice_set_cgd_num(tlan_ctx, ring->dcb_tc);
         |                            ~~~~^~~~~~~~
         |                                |
         |                                u8 {aka unsigned char}
   In file included from drivers/net/ethernet/intel/ice/ice_base.c:7:
   drivers/net/ethernet/intel/ice/ice_dcb_lib.h:122:84: note: expected 'struct ice_ring *' but argument is of type 'u8' {aka 'unsigned char'}
     122 | static inline void ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, struct ice_ring *ring) { }
         |                                                                   ~~~~~~~~~~~~~~~~~^~~~
--
   drivers/net/ethernet/intel/ice/ice_txrx.c: In function 'ice_tx_prepare_vlan_flags':
>> drivers/net/ethernet/intel/ice/ice_txrx.c:1876:32: error: passing argument 1 of 'ice_tx_prepare_vlan_flags_dcb' from incompatible pointer type [-Werror=incompatible-pointer-types]
    1876 |  ice_tx_prepare_vlan_flags_dcb(tx_ring, first);
         |                                ^~~~~~~
         |                                |
         |                                struct ice_tx_ring *
   In file included from drivers/net/ethernet/intel/ice/ice_txrx.c:14:
   drivers/net/ethernet/intel/ice/ice_dcb_lib.h:98:64: note: expected 'struct ice_ring *' but argument is of type 'struct ice_tx_ring *'
      98 | ice_tx_prepare_vlan_flags_dcb(struct ice_ring __always_unused *tx_ring,
         |                               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
   cc1: some warnings being treated as errors


sparse warnings: (new ones prefixed by >>)
>> drivers/net/ethernet/intel/ice/ice_base.c:262:39: sparse: sparse: incorrect type in argument 2 (different base types) @@     expected struct ice_ring *ring @@     got unsigned char [usertype] dcb_tc @@
   drivers/net/ethernet/intel/ice/ice_base.c:262:39: sparse:     expected struct ice_ring *ring
   drivers/net/ethernet/intel/ice/ice_base.c:262:39: sparse:     got unsigned char [usertype] dcb_tc
>> drivers/net/ethernet/intel/ice/ice_base.c:262:35: sparse: sparse: non size-preserving integer to pointer cast

vim +/ice_tx_prepare_vlan_flags_dcb +1876 drivers/net/ethernet/intel/ice/ice_txrx.c

d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1850  
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1851  /**
f9867df6d96593 Anirudh Venkataramanan 2019-02-19  1852   * ice_tx_prepare_vlan_flags - prepare generic Tx VLAN tagging flags for HW
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1853   * @tx_ring: ring to send buffer on
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1854   * @first: pointer to struct ice_tx_buf
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1855   *
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1856   * Checks the skb and set up correspondingly several generic transmit flags
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1857   * related to VLAN tagging for the HW, such as VLAN, DCB, etc.
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1858   */
2bb19d6e077190 Brett Creeley          2020-05-15  1859  static void
349763b451b9e0 Maciej Fijalkowski     2021-08-06  1860  ice_tx_prepare_vlan_flags(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first)
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1861  {
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1862  	struct sk_buff *skb = first->skb;
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1863  
2bb19d6e077190 Brett Creeley          2020-05-15  1864  	/* nothing left to do, software offloaded VLAN */
2bb19d6e077190 Brett Creeley          2020-05-15  1865  	if (!skb_vlan_tag_present(skb) && eth_type_vlan(skb->protocol))
2bb19d6e077190 Brett Creeley          2020-05-15  1866  		return;
2bb19d6e077190 Brett Creeley          2020-05-15  1867  
2bb19d6e077190 Brett Creeley          2020-05-15  1868  	/* currently, we always assume 802.1Q for VLAN insertion as VLAN
2bb19d6e077190 Brett Creeley          2020-05-15  1869  	 * insertion for 802.1AD is not supported
2bb19d6e077190 Brett Creeley          2020-05-15  1870  	 */
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1871  	if (skb_vlan_tag_present(skb)) {
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1872  		first->tx_flags |= skb_vlan_tag_get(skb) << ICE_TX_FLAGS_VLAN_S;
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1873  		first->tx_flags |= ICE_TX_FLAGS_HW_VLAN;
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1874  	}
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1875  
2bb19d6e077190 Brett Creeley          2020-05-15 @1876  	ice_tx_prepare_vlan_flags_dcb(tx_ring, first);
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1877  }
d76a60ba7afb89 Anirudh Venkataramanan 2018-03-20  1878  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 41149 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs
  2021-08-05 23:00 ` [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs Maciej Fijalkowski
  2021-08-06  1:08   ` kernel test robot
@ 2021-08-06 20:46   ` Creeley, Brett
  2021-08-10 13:10     ` Maciej Fijalkowski
  1 sibling, 1 reply; 10+ messages in thread
From: Creeley, Brett @ 2021-08-06 20:46 UTC (permalink / raw)
  To: Fijalkowski, Maciej, intel-wired-lan
  Cc: toke, Karlsson, Magnus, davem, Lobakin, Alexandr, bjorn,
	Brandeburg, Jesse, kuba, bpf, netdev, Nguyen, Anthony L, joamaki

On Fri, 2021-08-06 at 01:00 +0200, Maciej Fijalkowski wrote:
> While it was convenient to have a generic ring structure that served
> both Tx and Rx sides, next commits are going to introduce several
> Tx-specific fields, so in order to avoid hurting the Rx side, let's
> pull out the Tx ring onto new ice_tx_ring struct and let the ice_ring
> handle the Rx rings only.

I like this change. It makes a lot of sense because the Rx/Tx rings
have diverged so much.

I don't see any changes in the coalesce code. I'm pretty sure there
should be some changes in ice_set_rc_coalesce() at the very least
based on these changes.

> 
> Make the union out of the ring container within ice_q_vector so that
> it
> is possible to iterate over newly introduced ice_tx_ring.
> 
> Remove the @size as it's only accessed from control path and it can
> be
> calculated pretty easily.
> 
> Remove @ring_active as it's not actively used anywhere.
> 
> Change definitions of ice_update_ring_stats and
> ice_fetch_u64_stats_per_ring so that they are ring agnostic and can
> be
> used for both Rx and Tx rings.
> 
> Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively.
> In
> Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
> difference now.
> 
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h          | 27 ++++--
>  drivers/net/ethernet/intel/ice/ice_base.c     | 27 +++---
>  drivers/net/ethernet/intel/ice/ice_base.h     |  6 +-
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |  5 +-
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.h  |  6 +-
>  drivers/net/ethernet/intel/ice/ice_ethtool.c  | 17 ++--
>  drivers/net/ethernet/intel/ice/ice_lib.c      | 28 +++---
>  drivers/net/ethernet/intel/ice/ice_lib.h      |  4 +-
>  drivers/net/ethernet/intel/ice/ice_main.c     | 47 +++++-----
>  drivers/net/ethernet/intel/ice/ice_trace.h    |  8 +-
>  drivers/net/ethernet/intel/ice/ice_txrx.c     | 87 ++++++++++-------
> -
>  drivers/net/ethernet/intel/ice/ice_txrx.h     | 90 ++++++++++++-----
> --
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  6 +-
>  drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  8 +-
>  .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  2 +-
>  drivers/net/ethernet/intel/ice/ice_xsk.c      | 29 +++---
>  drivers/net/ethernet/intel/ice/ice_xsk.h      |  8 +-
>  17 files changed, 233 insertions(+), 172 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice.h
> b/drivers/net/ethernet/intel/ice/ice.h
> index a450343fbb92..2e15e097bc0f 100644
> --- a/drivers/net/ethernet/intel/ice/ice.h
> +++ b/drivers/net/ethernet/intel/ice/ice.h
> @@ -266,7 +266,7 @@ struct ice_vsi {
>  	struct ice_pf *back;		 /* back pointer to PF */
>  	struct ice_port_info *port_info; /* back pointer to port_info
> */
>  	struct ice_ring **rx_rings;	 /* Rx ring array */

If you are doing this, we should be explicit for Rx rings too and
rename ice_ring to ice_rx_ring.

Obviously this would generate some more work here, but I think
it's necessary with this change.

> -	struct ice_ring **tx_rings;	 /* Tx ring array */
> +	struct ice_tx_ring **tx_rings;	 /* Tx ring array */
>  	struct ice_q_vector **q_vectors; /* q_vector array */
>  
>  	irqreturn_t (*irq_handler)(int irq, void *data);
> @@ -343,7 +343,7 @@ struct ice_vsi {
>  	u16 qset_handle[ICE_MAX_TRAFFIC_CLASS];
>  	struct ice_tc_cfg tc_cfg;
>  	struct bpf_prog *xdp_prog;
> -	struct ice_ring **xdp_rings;	 /* XDP ring array */
> +	struct ice_tx_ring **xdp_rings;	 /* XDP ring array */
>  	unsigned long *af_xdp_zc_qps;	 /* tracks AF_XDP ZC enabled
> qps */
>  	u16 num_xdp_txq;		 /* Used XDP queues */
>  	u8 xdp_mapping_mode;		 /*
> ICE_MAP_MODE_[CONTIG|SCATTER] */
> @@ -555,14 +555,14 @@ static inline bool ice_is_xdp_ena_vsi(struct
> ice_vsi *vsi)
>  	return !!vsi->xdp_prog;
>  }
>  
> -static inline void ice_set_ring_xdp(struct ice_ring *ring)
> +static inline void ice_set_ring_xdp(struct ice_tx_ring *ring)
>  {
>  	ring->flags |= ICE_TX_FLAGS_RING_XDP;
>  }
>  
>  /**
>   * ice_xsk_pool - get XSK buffer pool bound to a ring
> - * @ring: ring to use
> + * @ring: Rx ring to use
>   *
>   * Returns a pointer to xdp_umem structure if there is a buffer pool
> present,
>   * NULL otherwise.
> @@ -572,8 +572,23 @@ static inline struct xsk_buff_pool
> *ice_xsk_pool(struct ice_ring *ring)
>  	struct ice_vsi *vsi = ring->vsi;
>  	u16 qid = ring->q_index;
>  
> -	if (ice_ring_is_xdp(ring))
> -		qid -= vsi->num_xdp_txq;
> +	if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi-
> >af_xdp_zc_qps))
> +		return NULL;
> +
> +	return xsk_get_pool_from_qid(vsi->netdev, qid);

Is this a bug fix? It seems like before we
> +}
> +
> +/**
> + * ice_tx_xsk_pool - get XSK buffer pool bound to a ring
> + * @ring: Tx ring to use
> + *
> + * Returns a pointer to xdp_umem structure if there is a buffer pool
> present,
> + * NULL otherwise. Tx equivalent of ice_xsk_pool.
> + */
> +static inline struct xsk_buff_pool *ice_tx_xsk_pool(struct
> ice_tx_ring *ring)
> +{
> +	struct ice_vsi *vsi = ring->vsi;
> +	u16 qid = ring->q_index - vsi->num_xdp_txq;

RCT (reverse Christmas tree) ordering: declare qid without an
initializer and assign it after the declarations. Probably not strictly
necessary though, since the current form makes sense given that you
have to deref the vsi first.
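
For illustration only, a minimal sketch of the ordering meant above. The
demo_* structs are hypothetical stand-ins that carry just the two fields
used here; they are not the driver's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the driver structs (assumption). */
struct demo_vsi {
	uint16_t num_xdp_txq;
};

struct demo_tx_ring {
	struct demo_vsi *vsi;
	uint16_t q_index;
};

/* Declarations run longest to shortest; qid depends on vsi, so it is
 * assigned after the declaration block instead of being initialized
 * in place.
 */
static uint16_t demo_tx_qid(const struct demo_tx_ring *ring)
{
	struct demo_vsi *vsi = ring->vsi;
	uint16_t qid;

	qid = ring->q_index - vsi->num_xdp_txq;

	return qid;
}
```

The cost is one extra line; the behaviour is unchanged.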

>  
>  	if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi-
> >af_xdp_zc_qps))
>  		return NULL;
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.c
> b/drivers/net/ethernet/intel/ice/ice_base.c
> index c36057efc7ae..838ee4b8d96f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.c
> +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> @@ -146,6 +146,7 @@ static void ice_free_q_vector(struct ice_vsi
> *vsi, int v_idx)
>  {
>  	struct ice_q_vector *q_vector;
>  	struct ice_pf *pf = vsi->back;
> +	struct ice_tx_ring *tx_ring;
>  	struct ice_ring *ring;
struct ice_rx_ring *rx_ring; would be much clearer here
>  	struct device *dev;
>  
> @@ -156,8 +157,8 @@ static void ice_free_q_vector(struct ice_vsi
> *vsi, int v_idx)
>  	}
>  	q_vector = vsi->q_vectors[v_idx];
>  
> -	ice_for_each_ring(ring, q_vector->tx)
> -		ring->q_vector = NULL;
> +	ice_for_each_tx_ring(tx_ring, q_vector->tx)
> +		tx_ring->q_vector = NULL;

It seems like if we used a "void *ring" in the ice_ring_container
it would simplify some of this and we wouldn't need a
different "for_each" loop.

The only downside is we would have to cast to the correct ring
type based on context when we want to dereference it.
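
A rough sketch of the two container shapes being compared, under the
assumption that each ring type carries its own typed next pointer. All
demo_* names and the union member names are made up for illustration
and do not come from the patch.

```c
#include <stddef.h>

/* Hypothetical ring types (assumption): each has a typed next link. */
struct demo_rx_ring { struct demo_rx_ring *next; int id; };
struct demo_tx_ring { struct demo_tx_ring *next; int id; };

/* Shape A: a union of typed pointers, one member per ring type. */
struct demo_container_union {
	union {
		struct demo_rx_ring *rx_ring;
		struct demo_tx_ring *tx_ring;
	};
};

/* Shape B: a single untyped pointer, as suggested in the review. */
struct demo_container_void {
	void *ring;
};

/* With shape B the caller picks the type from context; the void pointer
 * converts implicitly on assignment, and the list walk uses the typed
 * next pointer from then on.
 */
static int demo_count_tx(const struct demo_container_void *rc)
{
	struct demo_tx_ring *tx_ring;
	int n = 0;

	for (tx_ring = rc->ring; tx_ring; tx_ring = tx_ring->next)
		n++;

	return n;
}
```

With shape B a plain NULL check on the container pointer reads the same
for Rx and Tx, but the compiler can no longer catch a Tx container being
walked as an Rx list, which is the trade-off mentioned above.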

>  	ice_for_each_ring(ring, q_vector->rx)
>  		ring->q_vector = NULL;
Then it would be more explicit:

ice_for_each_rx_ring(rx_ring, q_vector->rx)
	rx_ring->q_vector = NULL;
>  
> @@ -206,7 +207,7 @@ static void ice_cfg_itr_gran(struct ice_hw *hw)
>   * @ring: ring to get the absolute queue index
>   * @tc: traffic class number
>   */
> -static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_ring
> *ring, u8 tc)
> +static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_tx_ring
> *ring, u8 tc)

should this be ice_calc_txq_handle()? Seems like it should have always
been called that, but your change made it more obvious.

>  {
>  	WARN_ONCE(ice_ring_is_xdp(ring) && tc, "XDP ring can't belong
> to TC other than 0\n");
>  
> @@ -224,7 +225,7 @@ static u16 ice_calc_q_handle(struct ice_vsi *vsi,
> struct ice_ring *ring, u8 tc)
>   * This enables/disables XPS for a given Tx descriptor ring
>   * based on the TCs enabled for the VSI that ring belongs to.
>   */
> -static void ice_cfg_xps_tx_ring(struct ice_ring *ring)
> +static void ice_cfg_xps_tx_ring(struct ice_tx_ring *ring)
>  {
>  	if (!ring->q_vector || !ring->netdev)
>  		return;
> @@ -246,7 +247,7 @@ static void ice_cfg_xps_tx_ring(struct ice_ring
> *ring)
>   * Configure the Tx descriptor ring in TLAN context.
>   */
>  static void
> -ice_setup_tx_ctx(struct ice_ring *ring, struct ice_tlan_ctx
> *tlan_ctx, u16 pf_q)
> +ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx
> *tlan_ctx, u16 pf_q)
>  {
>  	struct ice_vsi *vsi = ring->vsi;
>  	struct ice_hw *hw = &vsi->back->hw;
> @@ -258,7 +259,7 @@ ice_setup_tx_ctx(struct ice_ring *ring, struct
> ice_tlan_ctx *tlan_ctx, u16 pf_q)
>  	/* Transmit Queue Length */
>  	tlan_ctx->qlen = ring->count;
>  
> -	ice_set_cgd_num(tlan_ctx, ring);
> +	ice_set_cgd_num(tlan_ctx, ring->dcb_tc);
>  
>  	/* PF number */
>  	tlan_ctx->pf_num = hw->pf_id;
> @@ -660,16 +661,16 @@ void ice_vsi_map_rings_to_vectors(struct
> ice_vsi *vsi)
>  		tx_rings_per_v = (u8)DIV_ROUND_UP(tx_rings_rem,
>  						  q_vectors - v_id);
>  		q_vector->num_ring_tx = tx_rings_per_v;
> -		q_vector->tx.ring = NULL;
> +		q_vector->tx.tx_ring = NULL;
>  		q_vector->tx.itr_idx = ICE_TX_ITR;
>  		q_base = vsi->num_txq - tx_rings_rem;
>  
>  		for (q_id = q_base; q_id < (q_base + tx_rings_per_v);
> q_id++) {
> -			struct ice_ring *tx_ring = vsi->tx_rings[q_id];
> +			struct ice_tx_ring *tx_ring = vsi-
> >tx_rings[q_id];
>  
>  			tx_ring->q_vector = q_vector;
> -			tx_ring->next = q_vector->tx.ring;
> -			q_vector->tx.ring = tx_ring;
> +			tx_ring->next = q_vector->tx.tx_ring;
> +			q_vector->tx.tx_ring = tx_ring;
>  		}
>  		tx_rings_rem -= tx_rings_per_v;
>  
> @@ -711,7 +712,7 @@ void ice_vsi_free_q_vectors(struct ice_vsi *vsi)
>   * @qg_buf: queue group buffer
>   */
>  int
> -ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_ring *ring,
> +ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
>  		struct ice_aqc_add_tx_qgrp *qg_buf)
>  {
>  	u8 buf_len = struct_size(qg_buf, txqs, 1);
> @@ -870,7 +871,7 @@ void ice_trigger_sw_intr(struct ice_hw *hw,
> struct ice_q_vector *q_vector)
>   */
>  int
>  ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src
> rst_src,
> -		     u16 rel_vmvf_num, struct ice_ring *ring,
> +		     u16 rel_vmvf_num, struct ice_tx_ring *ring,
>  		     struct ice_txq_meta *txq_meta)
>  {
>  	struct ice_pf *pf = vsi->back;
> @@ -927,7 +928,7 @@ ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum
> ice_disq_rst_src rst_src,
>   * are needed for stopping Tx queue
>   */
>  void
> -ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_ring *ring,
> +ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_tx_ring *ring,
>  		  struct ice_txq_meta *txq_meta)
>  {
>  	u8 tc;
> diff --git a/drivers/net/ethernet/intel/ice/ice_base.h
> b/drivers/net/ethernet/intel/ice/ice_base.h
> index 20e1c29aa68a..2ce777eb53b0 100644
> --- a/drivers/net/ethernet/intel/ice/ice_base.h
> +++ b/drivers/net/ethernet/intel/ice/ice_base.h
> @@ -15,7 +15,7 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi);
>  void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi);
>  void ice_vsi_free_q_vectors(struct ice_vsi *vsi);
>  int
> -ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_ring *ring,
> +ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
>  		struct ice_aqc_add_tx_qgrp *qg_buf);
>  void ice_cfg_itr(struct ice_hw *hw, struct ice_q_vector *q_vector);
>  void
> @@ -25,9 +25,9 @@ ice_cfg_rxq_interrupt(struct ice_vsi *vsi, u16 rxq,
> u16 msix_idx, u16 itr_idx);
>  void ice_trigger_sw_intr(struct ice_hw *hw, struct ice_q_vector
> *q_vector);
>  int
>  ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src
> rst_src,
> -		     u16 rel_vmvf_num, struct ice_ring *ring,
> +		     u16 rel_vmvf_num, struct ice_tx_ring *ring,
>  		     struct ice_txq_meta *txq_meta);
>  void
> -ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_ring *ring,
> +ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_tx_ring *ring,
>  		  struct ice_txq_meta *txq_meta);
>  #endif /* _ICE_BASE_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> index 926cf748c5ec..2507223bfdc7 100644
> --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> @@ -194,7 +194,8 @@ u8 ice_dcb_get_tc(struct ice_vsi *vsi, int
> queue_index)
>   */
>  void ice_vsi_cfg_dcb_rings(struct ice_vsi *vsi)
>  {
> -	struct ice_ring *tx_ring, *rx_ring;
> +	struct ice_tx_ring *tx_ring;
> +	struct ice_ring *rx_ring;
>  	u16 qoffset, qcount;
>  	int i, n;
>  
> @@ -814,7 +815,7 @@ void ice_update_dcb_stats(struct ice_pf *pf)
>   * tag will already be configured with the correct ID and priority
> bits
>   */
>  void
> -ice_tx_prepare_vlan_flags_dcb(struct ice_ring *tx_ring,
> +ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
>  			      struct ice_tx_buf *first)
>  {
>  	struct sk_buff *skb = first->skb;
> diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> b/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> index 261b6e2ed7bc..a5bdf47cd34a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> @@ -28,7 +28,7 @@ void ice_vsi_cfg_dcb_rings(struct ice_vsi *vsi);
>  int ice_init_pf_dcb(struct ice_pf *pf, bool locked);
>  void ice_update_dcb_stats(struct ice_pf *pf);
>  void
> -ice_tx_prepare_vlan_flags_dcb(struct ice_ring *tx_ring,
> +ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
>  			      struct ice_tx_buf *first);
>  void
>  ice_dcb_process_lldp_set_mib_change(struct ice_pf *pf,
> @@ -49,9 +49,9 @@ static inline bool ice_find_q_in_range(u16 low, u16
> high, unsigned int tx_q)
>  }
>  
>  static inline void
> -ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, struct ice_ring
> *ring)
> +ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, u8 dcb_tc)
>  {
> -	tlan_ctx->cgd_num = ring->dcb_tc;
> +	tlan_ctx->cgd_num = dcb_tc;

Seems like this change isn't 100% necessary as part of this patch,
but I guess you would have had to update it to use ice_tx_ring,
so this does make sense to just pass the dcb_tc.
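
Related to this, the kernel test robot report earlier in the thread shows
the empty stub of ice_set_cgd_num() (built when DCB support is compiled
out) still taking a ring pointer, and the ice_tx_prepare_vlan_flags_dcb()
stub still taking struct ice_ring *. A minimal sketch of keeping both
variants in step follows; DEMO_DCB and the demo_* names are hypothetical
and only mirror the pattern.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Tx queue context (assumption). */
struct demo_tlan_ctx {
	uint8_t cgd_num;
};

#ifdef DEMO_DCB
/* Real variant: record the traffic class in the context. */
static inline void
demo_set_cgd_num(struct demo_tlan_ctx *tlan_ctx, uint8_t dcb_tc)
{
	tlan_ctx->cgd_num = dcb_tc;
}
#else
/* Stub variant: does nothing, but must keep the same parameter list
 * so that callers compile with the feature disabled as well.
 */
static inline void
demo_set_cgd_num(struct demo_tlan_ctx *tlan_ctx, uint8_t dcb_tc)
{
	(void)tlan_ctx;
	(void)dcb_tc;
}
#endif
```

Changing a prototype in such a header means touching both branches of
the #ifdef, otherwise only one configuration builds.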

>  }
>  
>  static inline bool ice_is_dcb_active(struct ice_pf *pf)
> diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> index d95a5daca114..644ce9f3494d 100644
> --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> @@ -584,7 +584,7 @@ static bool ice_lbtest_check_frame(u8 *frame)
>   *
>   * Function sends loopback packets on a test Tx ring.
>   */
> -static int ice_diag_send(struct ice_ring *tx_ring, u8 *data, u16
> size)
> +static int ice_diag_send(struct ice_tx_ring *tx_ring, u8 *data, u16
> size)
>  {
>  	struct ice_tx_desc *tx_desc;
>  	struct ice_tx_buf *tx_buf;
> @@ -676,9 +676,10 @@ static u64 ice_loopback_test(struct net_device
> *netdev)
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
>  	struct ice_vsi *orig_vsi = np->vsi, *test_vsi;
>  	struct ice_pf *pf = orig_vsi->back;
> -	struct ice_ring *tx_ring, *rx_ring;
>  	u8 broadcast[ETH_ALEN], ret = 0;
>  	int num_frames, valid_frames;
> +	struct ice_tx_ring *tx_ring;
> +	struct ice_ring *rx_ring;
>  	struct device *dev;
>  	u8 *tx_frame;
>  	int i;
> @@ -1318,6 +1319,7 @@ ice_get_ethtool_stats(struct net_device
> *netdev,
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
>  	struct ice_vsi *vsi = np->vsi;
>  	struct ice_pf *pf = vsi->back;
> +	struct ice_tx_ring *tx_ring;
>  	struct ice_ring *ring;
>  	unsigned int j;
>  	int i = 0;
> @@ -1336,10 +1338,10 @@ ice_get_ethtool_stats(struct net_device
> *netdev,
>  	rcu_read_lock();
>  
>  	ice_for_each_alloc_txq(vsi, j) {
> -		ring = READ_ONCE(vsi->tx_rings[j]);
> +		tx_ring = READ_ONCE(vsi->tx_rings[j]);
>  		if (ring) {

This should be "if (tx_ring)"
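
A simplified sketch of the intended loop, to show why the check matters:
"ring" is no longer assigned in this loop after the rename, so testing it
says nothing about the Tx ring just read. The demo_* types are
hypothetical stand-ins, and the READ_ONCE()/RCU handling of the real
function is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver structs (assumption). */
struct demo_q_stats {
	uint64_t pkts;
	uint64_t bytes;
};

struct demo_tx_ring {
	struct demo_q_stats stats;
};

/* Copy per-ring Tx counters into data[]; a missing ring reports zeros.
 * The pointer that is tested is the same one that is dereferenced.
 */
static size_t demo_fill_tx_stats(struct demo_tx_ring **rings, size_t count,
				 uint64_t *data)
{
	size_t i = 0, j;

	for (j = 0; j < count; j++) {
		struct demo_tx_ring *tx_ring = rings[j];

		if (tx_ring) {
			data[i++] = tx_ring->stats.pkts;
			data[i++] = tx_ring->stats.bytes;
		} else {
			data[i++] = 0;
			data[i++] = 0;
		}
	}

	return i;
}
```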

> -			data[i++] = ring->stats.pkts;
> -			data[i++] = ring->stats.bytes;
> +			data[i++] = tx_ring->stats.pkts;
> +			data[i++] = tx_ring->stats.bytes;
>  		} else {
>  			data[i++] = 0;
>  			data[i++] = 0;
> @@ -2667,9 +2669,10 @@ ice_get_ringparam(struct net_device *netdev,
> struct ethtool_ringparam *ring)
>  static int
>  ice_set_ringparam(struct net_device *netdev, struct
> ethtool_ringparam *ring)
>  {
> -	struct ice_ring *tx_rings = NULL, *rx_rings = NULL;
> +	struct ice_tx_ring *tx_rings = NULL;
> +	struct ice_ring *rx_rings = NULL;
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
> -	struct ice_ring *xdp_rings = NULL;
> +	struct ice_tx_ring *xdp_rings = NULL;

RCT got a little messed up here.

>  	struct ice_vsi *vsi = np->vsi;
>  	struct ice_pf *pf = vsi->back;
>  	int i, timeout = 50, err = 0;
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c
> b/drivers/net/ethernet/intel/ice/ice_lib.c
> index dde9802c6c72..ac0d7a52406b 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -379,12 +379,12 @@ static irqreturn_t ice_msix_clean_ctrl_vsi(int
> __always_unused irq, void *data)
>  {
>  	struct ice_q_vector *q_vector = (struct ice_q_vector *)data;
>  
> -	if (!q_vector->tx.ring)

I don't think this function would have changed if we used a "void
*ring" in the ice_ring_container.

> +	if (!q_vector->tx.tx_ring)
>  		return IRQ_HANDLED;
>  
>  #define FDIR_RX_DESC_CLEAN_BUDGET 64
>  	ice_clean_rx_irq(q_vector->rx.ring, FDIR_RX_DESC_CLEAN_BUDGET);
> -	ice_clean_ctrl_tx_irq(q_vector->tx.ring);
> +	ice_clean_ctrl_tx_irq(q_vector->tx.tx_ring);
>  
>  	return IRQ_HANDLED;
>  }
> @@ -1286,7 +1286,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi
> *vsi)
>  	dev = ice_pf_to_dev(pf);
>  	/* Allocate Tx rings */
>  	for (i = 0; i < vsi->alloc_txq; i++) {
> -		struct ice_ring *ring;
> +		struct ice_tx_ring *ring;
>  
>  		/* allocate with kzalloc(), free with kfree_rcu() */
>  		ring = kzalloc(sizeof(*ring), GFP_KERNEL);
> @@ -1296,7 +1296,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi
> *vsi)
>  
>  		ring->q_index = i;
>  		ring->reg_idx = vsi->txq_map[i];
> -		ring->ring_active = false;
>  		ring->vsi = vsi;
>  		ring->tx_tstamps = &pf->ptp.port.tx;
>  		ring->dev = dev;
> @@ -1315,7 +1314,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi
> *vsi)
>  
>  		ring->q_index = i;
>  		ring->reg_idx = vsi->rxq_map[i];
> -		ring->ring_active = false;
>  		ring->vsi = vsi;
>  		ring->netdev = vsi->netdev;
>  		ring->dev = dev;
> @@ -1710,7 +1708,7 @@ int ice_vsi_cfg_single_rxq(struct ice_vsi *vsi,
> u16 q_idx)
>  	return ice_vsi_cfg_rxq(vsi->rx_rings[q_idx]);
>  }
>  
> -int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_ring
> **tx_rings, u16 q_idx)
> +int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_tx_ring
> **tx_rings, u16 q_idx)
>  {
>  	struct ice_aqc_add_tx_qgrp *qg_buf;
>  	int err;
> @@ -1766,7 +1764,7 @@ int ice_vsi_cfg_rxqs(struct ice_vsi *vsi)
>   * Configure the Tx VSI for operation.
>   */
>  static int
> -ice_vsi_cfg_txqs(struct ice_vsi *vsi, struct ice_ring **rings, u16
> count)
> +ice_vsi_cfg_txqs(struct ice_vsi *vsi, struct ice_tx_ring **rings,
> u16 count)
>  {
>  	struct ice_aqc_add_tx_qgrp *qg_buf;
>  	u16 q_idx = 0;
> @@ -1818,7 +1816,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
>  		return ret;
>  
>  	for (i = 0; i < vsi->num_xdp_txq; i++)
> -		vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi-
> >xdp_rings[i]);
> +		vsi->xdp_rings[i]->xsk_pool = ice_tx_xsk_pool(vsi-
> >xdp_rings[i]);
>  
>  	return ret;
>  }
> @@ -2057,7 +2055,7 @@ int ice_vsi_stop_all_rx_rings(struct ice_vsi
> *vsi)
>   */
>  static int
>  ice_vsi_stop_tx_rings(struct ice_vsi *vsi, enum ice_disq_rst_src
> rst_src,
> -		      u16 rel_vmvf_num, struct ice_ring **rings, u16
> count)
> +		      u16 rel_vmvf_num, struct ice_tx_ring **rings, u16
> count)
>  {
>  	u16 q_idx;
>  
> @@ -3357,10 +3355,10 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8
> ena_tc)
>   *
>   * This function assumes that caller has acquired a u64_stats_sync
> lock.
>   */
> -static void ice_update_ring_stats(struct ice_ring *ring, u64 pkts,
> u64 bytes)
> +static void ice_update_ring_stats(struct ice_q_stats *stats, u64
> pkts, u64 bytes)
>  {
> -	ring->stats.bytes += bytes;
> -	ring->stats.pkts += pkts;
> +	stats->bytes += bytes;
> +	stats->pkts += pkts;

This is a nice little clean up.
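
To spell out why: once the helper takes the stats struct rather than the
ring, it no longer cares which ring type embeds it. A small sketch with
hypothetical demo_* types follows; the u64_stats_sync begin/end calls
that the real callers wrap around the update are omitted.

```c
#include <stdint.h>

/* Hypothetical stand-ins (assumption): both ring types embed the same
 * stats struct but otherwise differ.
 */
struct demo_q_stats {
	uint64_t pkts;
	uint64_t bytes;
};

struct demo_rx_ring {
	struct demo_q_stats stats;
	void *rx_only;
};

struct demo_tx_ring {
	struct demo_q_stats stats;
	void *tx_only;
};

/* Ring-agnostic helper: operates on the embedded stats only. */
static void demo_update_ring_stats(struct demo_q_stats *stats,
				   uint64_t pkts, uint64_t bytes)
{
	stats->bytes += bytes;
	stats->pkts += pkts;
}
```

Callers pass &rx_ring->stats or &tx_ring->stats, as the Rx and Tx
wrappers in the patch do.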

>  }
>  
>  /**
> @@ -3369,10 +3367,10 @@ static void ice_update_ring_stats(struct
> ice_ring *ring, u64 pkts, u64 bytes)
>   * @pkts: number of processed packets
>   * @bytes: number of processed bytes
>   */
> -void ice_update_tx_ring_stats(struct ice_ring *tx_ring, u64 pkts,
> u64 bytes)
> +void ice_update_tx_ring_stats(struct ice_tx_ring *tx_ring, u64 pkts,
> u64 bytes)
>  {
>  	u64_stats_update_begin(&tx_ring->syncp);
> -	ice_update_ring_stats(tx_ring, pkts, bytes);
> +	ice_update_ring_stats(&tx_ring->stats, pkts, bytes);
>  	u64_stats_update_end(&tx_ring->syncp);
>  }
>  
> @@ -3385,7 +3383,7 @@ void ice_update_tx_ring_stats(struct ice_ring
> *tx_ring, u64 pkts, u64 bytes)
>  void ice_update_rx_ring_stats(struct ice_ring *rx_ring, u64 pkts,
> u64 bytes)
>  {
>  	u64_stats_update_begin(&rx_ring->syncp);
> -	ice_update_ring_stats(rx_ring, pkts, bytes);
> +	ice_update_ring_stats(&rx_ring->stats, pkts, bytes);
>  	u64_stats_update_end(&rx_ring->syncp);
>  }
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h
> b/drivers/net/ethernet/intel/ice/ice_lib.h
> index d5a28bf0fc2c..2a69666db194 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.h
> @@ -14,7 +14,7 @@ void ice_update_eth_stats(struct ice_vsi *vsi);
>  
>  int ice_vsi_cfg_single_rxq(struct ice_vsi *vsi, u16 q_idx);
>  
> -int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_ring
> **tx_rings, u16 q_idx);
> +int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_tx_ring
> **tx_rings, u16 q_idx);
>  
>  int ice_vsi_cfg_rxqs(struct ice_vsi *vsi);
>  
> @@ -93,7 +93,7 @@ void ice_vsi_free_tx_rings(struct ice_vsi *vsi);
>  
>  void ice_vsi_manage_rss_lut(struct ice_vsi *vsi, bool ena);
>  
> -void ice_update_tx_ring_stats(struct ice_ring *ring, u64 pkts, u64
> bytes);
> +void ice_update_tx_ring_stats(struct ice_tx_ring *ring, u64 pkts,
> u64 bytes);
>  
>  void ice_update_rx_ring_stats(struct ice_ring *ring, u64 pkts, u64
> bytes);
>  
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
> b/drivers/net/ethernet/intel/ice/ice_main.c
> index ef8d1815af56..cbcb4ad60852 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -61,7 +61,7 @@ bool netif_is_ice(struct net_device *dev)
>   * ice_get_tx_pending - returns number of Tx descriptors not
> processed
>   * @ring: the ring of descriptors
>   */
> -static u16 ice_get_tx_pending(struct ice_ring *ring)
> +static u16 ice_get_tx_pending(struct ice_tx_ring *ring)
>  {
>  	u16 head, tail;
>  
> @@ -101,7 +101,7 @@ static void ice_check_for_hang_subtask(struct
> ice_pf *pf)
>  	hw = &vsi->back->hw;
>  
>  	for (i = 0; i < vsi->num_txq; i++) {

Interesting that this isn't using ice_for_each_txq()

> -		struct ice_ring *tx_ring = vsi->tx_rings[i];
> +		struct ice_tx_ring *tx_ring = vsi->tx_rings[i];
>  
>  		if (tx_ring && tx_ring->desc) {
>  			/* If packet counter has not changed the queue
> is
> @@ -2363,7 +2363,7 @@ static int ice_xdp_alloc_setup_rings(struct
> ice_vsi *vsi)
>  
>  	for (i = 0; i < vsi->num_xdp_txq; i++) {
>  		u16 xdp_q_idx = vsi->alloc_txq + i;
> -		struct ice_ring *xdp_ring;
> +		struct ice_tx_ring *xdp_ring;
>  
>  		xdp_ring = kzalloc(sizeof(*xdp_ring), GFP_KERNEL);
>  
> @@ -2372,7 +2372,6 @@ static int ice_xdp_alloc_setup_rings(struct
> ice_vsi *vsi)
>  
>  		xdp_ring->q_index = xdp_q_idx;
>  		xdp_ring->reg_idx = vsi->txq_map[xdp_q_idx];
> -		xdp_ring->ring_active = false;
>  		xdp_ring->vsi = vsi;
>  		xdp_ring->netdev = NULL;
>  		xdp_ring->dev = dev;
> @@ -2381,7 +2380,7 @@ static int ice_xdp_alloc_setup_rings(struct
> ice_vsi *vsi)
>  		if (ice_setup_tx_ring(xdp_ring))
>  			goto free_xdp_rings;
>  		ice_set_ring_xdp(xdp_ring);
> -		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
> +		xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
>  	}
>  
>  	return 0;
> @@ -2460,11 +2459,11 @@ int ice_prepare_xdp_rings(struct ice_vsi
> *vsi, struct bpf_prog *prog)
>  		q_base = vsi->num_xdp_txq - xdp_rings_rem;
>  
>  		for (q_id = q_base; q_id < (q_base + xdp_rings_per_v);
> q_id++) {
> -			struct ice_ring *xdp_ring = vsi-
> >xdp_rings[q_id];
> +			struct ice_tx_ring *xdp_ring = vsi-
> >xdp_rings[q_id];
>  
>  			xdp_ring->q_vector = q_vector;
> -			xdp_ring->next = q_vector->tx.ring;
> -			q_vector->tx.ring = xdp_ring;
> +			xdp_ring->next = q_vector->tx.tx_ring;
> +			q_vector->tx.tx_ring = xdp_ring;
>  		}
>  		xdp_rings_rem -= xdp_rings_per_v;
>  	}
> @@ -2534,14 +2533,14 @@ int ice_destroy_xdp_rings(struct ice_vsi
> *vsi)
>  
>  	ice_for_each_q_vector(vsi, v_idx) {
>  		struct ice_q_vector *q_vector = vsi->q_vectors[v_idx];
> -		struct ice_ring *ring;
> +		struct ice_tx_ring *ring;
>  
> -		ice_for_each_ring(ring, q_vector->tx)
> +		ice_for_each_tx_ring(ring, q_vector->tx)
>  			if (!ring->tx_buf || !ice_ring_is_xdp(ring))
>  				break;
>  
>  		/* restore the value of last node prior to XDP setup */
> -		q_vector->tx.ring = ring;
> +		q_vector->tx.tx_ring = ring;
>  	}
>  
>  free_qmap:
> @@ -5615,19 +5614,18 @@ int ice_up(struct ice_vsi *vsi)
>   * that needs to be performed to read u64 values in 32 bit machine.
>   */
>  static void
> -ice_fetch_u64_stats_per_ring(struct ice_ring *ring, u64 *pkts, u64
> *bytes)
> +ice_fetch_u64_stats_per_ring(struct u64_stats_sync *syncp, struct
> ice_q_stats stats,
> +			     u64 *pkts, u64 *bytes)
>  {
>  	unsigned int start;
>  	*pkts = 0;
>  	*bytes = 0;
>  
> -	if (!ring)
> -		return;
>  	do {
> -		start = u64_stats_fetch_begin_irq(&ring->syncp);
> -		*pkts = ring->stats.pkts;
> -		*bytes = ring->stats.bytes;
> -	} while (u64_stats_fetch_retry_irq(&ring->syncp, start));
> +		start = u64_stats_fetch_begin_irq(syncp);
> +		*pkts = stats.pkts;
> +		*bytes = stats.bytes;
> +	} while (u64_stats_fetch_retry_irq(syncp, start));
>  }
>  
>  /**
> @@ -5637,18 +5635,19 @@ ice_fetch_u64_stats_per_ring(struct ice_ring
> *ring, u64 *pkts, u64 *bytes)
>   * @count: number of rings
>   */
>  static void
> -ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, struct ice_ring
> **rings,
> +ice_update_vsi_tx_ring_stats(struct ice_vsi *vsi, struct ice_tx_ring
> **rings,
>  			     u16 count)
>  {
>  	struct rtnl_link_stats64 *vsi_stats = &vsi->net_stats;
>  	u16 i;
>  
>  	for (i = 0; i < count; i++) {
> -		struct ice_ring *ring;
> +		struct ice_tx_ring *ring;
>  		u64 pkts, bytes;
>  
>  		ring = READ_ONCE(rings[i]);
> -		ice_fetch_u64_stats_per_ring(ring, &pkts, &bytes);
> +		if (ring)
> +			ice_fetch_u64_stats_per_ring(&ring->syncp,
> ring->stats, &pkts, &bytes);
>  		vsi_stats->tx_packets += pkts;
>  		vsi_stats->tx_bytes += bytes;
>  		vsi->tx_restart += ring->tx_stats.restart_q;
> @@ -5689,7 +5688,7 @@ static void ice_update_vsi_ring_stats(struct
> ice_vsi *vsi)
>  	ice_for_each_rxq(vsi, i) {
>  		struct ice_ring *ring = READ_ONCE(vsi->rx_rings[i]);
>  
> -		ice_fetch_u64_stats_per_ring(ring, &pkts, &bytes);
> +		ice_fetch_u64_stats_per_ring(&ring->syncp, ring->stats, 
> &pkts, &bytes);
>  		vsi_stats->rx_packets += pkts;
>  		vsi_stats->rx_bytes += bytes;
>  		vsi->rx_buf_failed += ring->rx_stats.alloc_buf_failed;
> @@ -6036,7 +6035,7 @@ int ice_vsi_setup_tx_rings(struct ice_vsi *vsi)
>  	}
>  
>  	ice_for_each_txq(vsi, i) {
> -		struct ice_ring *ring = vsi->tx_rings[i];
> +		struct ice_tx_ring *ring = vsi->tx_rings[i];
>  
>  		if (!ring)
>  			return -EINVAL;
> @@ -6962,7 +6961,7 @@ ice_bridge_setlink(struct net_device *dev,
> struct nlmsghdr *nlh,
>  static void ice_tx_timeout(struct net_device *netdev, unsigned int
> txqueue)
>  {
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
> -	struct ice_ring *tx_ring = NULL;
> +	struct ice_tx_ring *tx_ring = NULL;
>  	struct ice_vsi *vsi = np->vsi;
>  	struct ice_pf *pf = vsi->back;
>  	u32 i;
> diff --git a/drivers/net/ethernet/intel/ice/ice_trace.h
> b/drivers/net/ethernet/intel/ice/ice_trace.h
> index 9bc0b8fdfc77..f230ff435f26 100644
> --- a/drivers/net/ethernet/intel/ice/ice_trace.h
> +++ b/drivers/net/ethernet/intel/ice/ice_trace.h
> @@ -115,7 +115,7 @@ DEFINE_EVENT(ice_tx_dim_template,
> ice_tx_dim_work,
>  
>  /* Events related to a vsi & ring */
>  DECLARE_EVENT_CLASS(ice_tx_template,
> -		    TP_PROTO(struct ice_ring *ring, struct ice_tx_desc
> *desc,
> +		    TP_PROTO(struct ice_tx_ring *ring, struct
> ice_tx_desc *desc,
>  			     struct ice_tx_buf *buf),
>  
>  		    TP_ARGS(ring, desc, buf),
> @@ -135,7 +135,7 @@ DECLARE_EVENT_CLASS(ice_tx_template,
>  
>  #define DEFINE_TX_TEMPLATE_OP_EVENT(name) \
>  DEFINE_EVENT(ice_tx_template, name, \
> -	     TP_PROTO(struct ice_ring *ring, \
> +	     TP_PROTO(struct ice_tx_ring *ring, \
>  		      struct ice_tx_desc *desc, \
>  		      struct ice_tx_buf *buf), \
>  	     TP_ARGS(ring, desc, buf))
> @@ -192,7 +192,7 @@ DEFINE_EVENT(ice_rx_indicate_template,
> ice_clean_rx_irq_indicate,
>  );
>  
>  DECLARE_EVENT_CLASS(ice_xmit_template,
> -		    TP_PROTO(struct ice_ring *ring, struct sk_buff
> *skb),
> +		    TP_PROTO(struct ice_tx_ring *ring, struct sk_buff
> *skb),
>  
>  		    TP_ARGS(ring, skb),
>  
> @@ -210,7 +210,7 @@ DECLARE_EVENT_CLASS(ice_xmit_template,
>  
>  #define DEFINE_XMIT_TEMPLATE_OP_EVENT(name) \
>  DEFINE_EVENT(ice_xmit_template, name, \
> -	     TP_PROTO(struct ice_ring *ring, struct sk_buff *skb), \
> +	     TP_PROTO(struct ice_tx_ring *ring, struct sk_buff *skb), \
>  	     TP_ARGS(ring, skb))
>  
>  DEFINE_XMIT_TEMPLATE_OP_EVENT(ice_xmit_frame_ring);
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c
> b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index 6ee8e0032d52..fca5aca1ffae 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -32,7 +32,7 @@ ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct
> ice_fltr_desc *fdir_desc,
>  	struct ice_tx_buf *tx_buf, *first;
>  	struct ice_fltr_desc *f_desc;
>  	struct ice_tx_desc *tx_desc;
> -	struct ice_ring *tx_ring;
> +	struct ice_tx_ring *tx_ring;
>  	struct device *dev;
>  	dma_addr_t dma;
>  	u32 td_cmd;
> @@ -106,7 +106,7 @@ ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct
> ice_fltr_desc *fdir_desc,
>   * @tx_buf: the buffer to free
>   */
>  static void
> -ice_unmap_and_free_tx_buf(struct ice_ring *ring, struct ice_tx_buf
> *tx_buf)
> +ice_unmap_and_free_tx_buf(struct ice_tx_ring *ring, struct
> ice_tx_buf *tx_buf)
>  {
>  	if (tx_buf->skb) {
>  		if (tx_buf->tx_flags & ICE_TX_FLAGS_DUMMY_PKT)
> @@ -133,7 +133,7 @@ ice_unmap_and_free_tx_buf(struct ice_ring *ring,
> struct ice_tx_buf *tx_buf)
>  	/* tx_buf must be completely set up in the transmit path */
>  }
>  
> -static struct netdev_queue *txring_txq(const struct ice_ring *ring)
> +static struct netdev_queue *txring_txq(const struct ice_tx_ring
> *ring)
>  {
>  	return netdev_get_tx_queue(ring->netdev, ring->q_index);
>  }
> @@ -142,8 +142,9 @@ static struct netdev_queue *txring_txq(const
> struct ice_ring *ring)
>   * ice_clean_tx_ring - Free any empty Tx buffers
>   * @tx_ring: ring to be cleaned
>   */
> -void ice_clean_tx_ring(struct ice_ring *tx_ring)
> +void ice_clean_tx_ring(struct ice_tx_ring *tx_ring)
>  {
> +	u32 size;
>  	u16 i;
>  
>  	if (ice_ring_is_xdp(tx_ring) && tx_ring->xsk_pool) {
> @@ -162,8 +163,10 @@ void ice_clean_tx_ring(struct ice_ring *tx_ring)
>  tx_skip_free:
>  	memset(tx_ring->tx_buf, 0, sizeof(*tx_ring->tx_buf) * tx_ring->count);
>  
> +	size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
> +		     PAGE_SIZE);
>  	/* Zero out the descriptor ring */
> -	memset(tx_ring->desc, 0, tx_ring->size);
> +	memset(tx_ring->desc, 0, size);
>  
>  	tx_ring->next_to_use = 0;
>  	tx_ring->next_to_clean = 0;
> @@ -181,14 +184,18 @@ void ice_clean_tx_ring(struct ice_ring
> *tx_ring)
>   *
>   * Free all transmit software resources
>   */
> -void ice_free_tx_ring(struct ice_ring *tx_ring)
> +void ice_free_tx_ring(struct ice_tx_ring *tx_ring)
>  {
> +	u32 size;
> +
>  	ice_clean_tx_ring(tx_ring);
>  	devm_kfree(tx_ring->dev, tx_ring->tx_buf);
>  	tx_ring->tx_buf = NULL;
>  
>  	if (tx_ring->desc) {
> -		dmam_free_coherent(tx_ring->dev, tx_ring->size,
> +		size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
> +			     PAGE_SIZE);
> +		dmam_free_coherent(tx_ring->dev, size,
>  				   tx_ring->desc, tx_ring->dma);
>  		tx_ring->desc = NULL;
>  	}
> @@ -201,7 +208,7 @@ void ice_free_tx_ring(struct ice_ring *tx_ring)
>   *
>   * Returns true if there's any budget left (e.g. the clean is
> finished)
>   */
> -static bool ice_clean_tx_irq(struct ice_ring *tx_ring, int
> napi_budget)
> +static bool ice_clean_tx_irq(struct ice_tx_ring *tx_ring, int
> napi_budget)
>  {
>  	unsigned int total_bytes = 0, total_pkts = 0;
>  	unsigned int budget = ICE_DFLT_IRQ_WORK;
> @@ -329,9 +336,10 @@ static bool ice_clean_tx_irq(struct ice_ring
> *tx_ring, int napi_budget)
>   *
>   * Return 0 on success, negative on error
>   */
> -int ice_setup_tx_ring(struct ice_ring *tx_ring)
> +int ice_setup_tx_ring(struct ice_tx_ring *tx_ring)
>  {
>  	struct device *dev = tx_ring->dev;
> +	u32 size;
>  
>  	if (!dev)
>  		return -ENOMEM;
> @@ -345,13 +353,13 @@ int ice_setup_tx_ring(struct ice_ring *tx_ring)
>  		return -ENOMEM;
>  
>  	/* round up to nearest page */
> -	tx_ring->size = ALIGN(tx_ring->count * sizeof(struct
> ice_tx_desc),
> -			      PAGE_SIZE);
> -	tx_ring->desc = dmam_alloc_coherent(dev, tx_ring->size,
> &tx_ring->dma,
> +	size = ALIGN(tx_ring->count * sizeof(struct ice_tx_desc),
> +		     PAGE_SIZE);
> +	tx_ring->desc = dmam_alloc_coherent(dev, size, &tx_ring->dma,
>  					    GFP_KERNEL);
>  	if (!tx_ring->desc) {
>  		dev_err(dev, "Unable to allocate memory for the Tx descriptor ring, size=%d\n",
> -			tx_ring->size);
> +			size);
>  		goto err;
>  	}
>  
> @@ -373,6 +381,7 @@ int ice_setup_tx_ring(struct ice_ring *tx_ring)
>  void ice_clean_rx_ring(struct ice_ring *rx_ring)
>  {
>  	struct device *dev = rx_ring->dev;
> +	u32 size;
>  	u16 i;
>  
>  	/* ring already cleared, nothing to do */
> @@ -417,7 +426,9 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
>  	memset(rx_ring->rx_buf, 0, sizeof(*rx_ring->rx_buf) * rx_ring->count);
>  
>  	/* Zero out the descriptor ring */
> -	memset(rx_ring->desc, 0, rx_ring->size);
> +	size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
> +		     PAGE_SIZE);
> +	memset(rx_ring->desc, 0, size);
>  
>  	rx_ring->next_to_alloc = 0;
>  	rx_ring->next_to_clean = 0;
> @@ -432,6 +443,8 @@ void ice_clean_rx_ring(struct ice_ring *rx_ring)
>   */
>  void ice_free_rx_ring(struct ice_ring *rx_ring)
>  {
> +	u32 size;
> +
>  	ice_clean_rx_ring(rx_ring);
>  	if (rx_ring->vsi->type == ICE_VSI_PF)
>  		if (xdp_rxq_info_is_reg(&rx_ring->xdp_rxq))
> @@ -441,7 +454,9 @@ void ice_free_rx_ring(struct ice_ring *rx_ring)
>  	rx_ring->rx_buf = NULL;
>  
>  	if (rx_ring->desc) {
> -		dmam_free_coherent(rx_ring->dev, rx_ring->size,
> +		size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
> +			     PAGE_SIZE);
> +		dmam_free_coherent(rx_ring->dev, size,
>  				   rx_ring->desc, rx_ring->dma);
>  		rx_ring->desc = NULL;
>  	}
> @@ -456,6 +471,7 @@ void ice_free_rx_ring(struct ice_ring *rx_ring)
>  int ice_setup_rx_ring(struct ice_ring *rx_ring)
>  {
>  	struct device *dev = rx_ring->dev;
> +	u32 size;
>  
>  	if (!dev)
>  		return -ENOMEM;
> @@ -469,13 +485,13 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
>  		return -ENOMEM;
>  
>  	/* round up to nearest page */
> -	rx_ring->size = ALIGN(rx_ring->count * sizeof(union
> ice_32byte_rx_desc),
> -			      PAGE_SIZE);
> -	rx_ring->desc = dmam_alloc_coherent(dev, rx_ring->size,
> &rx_ring->dma,
> +	size = ALIGN(rx_ring->count * sizeof(union ice_32byte_rx_desc),
> +		     PAGE_SIZE);
> +	rx_ring->desc = dmam_alloc_coherent(dev, size, &rx_ring->dma,
>  					    GFP_KERNEL);
>  	if (!rx_ring->desc) {
>  		dev_err(dev, "Unable to allocate memory for the Rx descriptor ring, size=%d\n",
> -			rx_ring->size);
> +			size);
>  		goto err;
>  	}
>  
> @@ -526,7 +542,7 @@ static int
>  ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
>  	    struct bpf_prog *xdp_prog)
>  {
> -	struct ice_ring *xdp_ring;
> +	struct ice_tx_ring *xdp_ring;
>  	int err, result;
>  	u32 act;
>  
> @@ -576,7 +592,7 @@ ice_xdp_xmit(struct net_device *dev, int n,
> struct xdp_frame **frames,
>  	struct ice_netdev_priv *np = netdev_priv(dev);
>  	unsigned int queue_index = smp_processor_id();
>  	struct ice_vsi *vsi = np->vsi;
> -	struct ice_ring *xdp_ring;
> +	struct ice_tx_ring *xdp_ring;
>  	int nxmit = 0, i;
>  
>  	if (test_bit(ICE_VSI_DOWN, vsi->state))
> @@ -1247,9 +1263,9 @@ static void ice_net_dim(struct ice_q_vector
> *q_vector)
>  	if (ITR_IS_DYNAMIC(tx)) {
>  		struct dim_sample dim_sample = {};
>  		u64 packets = 0, bytes = 0;
> -		struct ice_ring *ring;
> +		struct ice_tx_ring *ring;
>  
> -		ice_for_each_ring(ring, q_vector->tx) {
> +		ice_for_each_tx_ring(ring, q_vector->tx) {
>  			packets += ring->stats.pkts;
>  			bytes += ring->stats.bytes;
>  		}
> @@ -1387,6 +1403,7 @@ int ice_napi_poll(struct napi_struct *napi, int
> budget)
>  {
>  	struct ice_q_vector *q_vector =
>  				container_of(napi, struct ice_q_vector,
> napi);
> +	struct ice_tx_ring *tx_ring;
>  	bool clean_complete = true;
>  	struct ice_ring *ring;
>  	int budget_per_ring;
> @@ -1395,10 +1412,10 @@ int ice_napi_poll(struct napi_struct *napi,
> int budget)
>  	/* Since the actual Tx work is minimal, we can give the Tx a larger
>  	 * budget and be more aggressive about cleaning up the Tx descriptors.
>  	 */
> -	ice_for_each_ring(ring, q_vector->tx) {
> -		bool wd = ring->xsk_pool ?
> -			  ice_clean_tx_irq_zc(ring, budget) :
> -			  ice_clean_tx_irq(ring, budget);
> +	ice_for_each_tx_ring(tx_ring, q_vector->tx) {
> +		bool wd = tx_ring->xsk_pool ?
> +			  ice_clean_tx_irq_zc(tx_ring, budget) :
> +			  ice_clean_tx_irq(tx_ring, budget);
>  
>  		if (!wd)
>  			clean_complete = false;
> @@ -1462,7 +1479,7 @@ int ice_napi_poll(struct napi_struct *napi, int
> budget)
>   *
>   * Returns -EBUSY if a stop is needed, else 0
>   */
> -static int __ice_maybe_stop_tx(struct ice_ring *tx_ring, unsigned
> int size)
> +static int __ice_maybe_stop_tx(struct ice_tx_ring *tx_ring, unsigned
> int size)
>  {
>  	netif_stop_subqueue(tx_ring->netdev, tx_ring->q_index);
>  	/* Memory barrier before checking head and tail */
> @@ -1485,7 +1502,7 @@ static int __ice_maybe_stop_tx(struct ice_ring
> *tx_ring, unsigned int size)
>   *
>   * Returns 0 if stop is not needed
>   */
> -static int ice_maybe_stop_tx(struct ice_ring *tx_ring, unsigned int
> size)
> +static int ice_maybe_stop_tx(struct ice_tx_ring *tx_ring, unsigned
> int size)
>  {
>  	if (likely(ICE_DESC_UNUSED(tx_ring) >= size))
>  		return 0;
> @@ -1504,7 +1521,7 @@ static int ice_maybe_stop_tx(struct ice_ring
> *tx_ring, unsigned int size)
>   * it and the length into the transmit descriptor.
>   */
>  static void
> -ice_tx_map(struct ice_ring *tx_ring, struct ice_tx_buf *first,
> +ice_tx_map(struct ice_tx_ring *tx_ring, struct ice_tx_buf *first,
>  	   struct ice_tx_offload_params *off)
>  {
>  	u64 td_offset, td_tag, td_cmd;
> @@ -1840,7 +1857,7 @@ int ice_tx_csum(struct ice_tx_buf *first,
> struct ice_tx_offload_params *off)
>   * related to VLAN tagging for the HW, such as VLAN, DCB, etc.
>   */
>  static void
> -ice_tx_prepare_vlan_flags(struct ice_ring *tx_ring, struct
> ice_tx_buf *first)
> +ice_tx_prepare_vlan_flags(struct ice_tx_ring *tx_ring, struct
> ice_tx_buf *first)
>  {
>  	struct sk_buff *skb = first->skb;
>  
> @@ -2146,7 +2163,7 @@ static bool ice_chk_linearize(struct sk_buff
> *skb, unsigned int count)
>   * @off: Tx offload parameters
>   */
>  static void
> -ice_tstamp(struct ice_ring *tx_ring, struct sk_buff *skb,
> +ice_tstamp(struct ice_tx_ring *tx_ring, struct sk_buff *skb,
>  	   struct ice_tx_buf *first, struct ice_tx_offload_params *off)
>  {
>  	s8 idx;
> @@ -2181,7 +2198,7 @@ ice_tstamp(struct ice_ring *tx_ring, struct
> sk_buff *skb,
>   * Returns NETDEV_TX_OK if sent, else an error code
>   */
>  static netdev_tx_t
> -ice_xmit_frame_ring(struct sk_buff *skb, struct ice_ring *tx_ring)
> +ice_xmit_frame_ring(struct sk_buff *skb, struct ice_tx_ring
> *tx_ring)
>  {
>  	struct ice_tx_offload_params offload = { 0 };
>  	struct ice_vsi *vsi = tx_ring->vsi;
> @@ -2282,7 +2299,7 @@ netdev_tx_t ice_start_xmit(struct sk_buff *skb,
> struct net_device *netdev)
>  {
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
>  	struct ice_vsi *vsi = np->vsi;
> -	struct ice_ring *tx_ring;
> +	struct ice_tx_ring *tx_ring;
>  
>  	tx_ring = vsi->tx_rings[skb->queue_mapping];
>  
> @@ -2299,7 +2316,7 @@ netdev_tx_t ice_start_xmit(struct sk_buff *skb,
> struct net_device *netdev)
>   * ice_clean_ctrl_tx_irq - interrupt handler for flow director Tx
> queue
>   * @tx_ring: tx_ring to clean
>   */
> -void ice_clean_ctrl_tx_irq(struct ice_ring *tx_ring)
> +void ice_clean_ctrl_tx_irq(struct ice_tx_ring *tx_ring)
>  {
>  	struct ice_vsi *vsi = tx_ring->vsi;
>  	s16 i = tx_ring->next_to_clean;
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h
> b/drivers/net/ethernet/intel/ice/ice_txrx.h
> index 1e46e80f3d6f..d4ab3558933e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> @@ -154,7 +154,7 @@ struct ice_tx_buf {
>  
>  struct ice_tx_offload_params {
>  	u64 cd_qw1;
> -	struct ice_ring *tx_ring;
> +	struct ice_tx_ring *tx_ring;
>  	u32 td_cmd;
>  	u32 td_offset;
>  	u32 td_l2tag1;
> @@ -267,16 +267,11 @@ struct ice_ring {
>  	struct ice_vsi *vsi;		/* Backreference to
> associated VSI */
>  	struct ice_q_vector *q_vector;	/* Backreference to
> associated vector */
>  	u8 __iomem *tail;
> -	union {
> -		struct ice_tx_buf *tx_buf;
> -		struct ice_rx_buf *rx_buf;
> -	};
> +	struct ice_rx_buf *rx_buf;
>  	/* CL2 - 2nd cacheline starts here */
> +	struct xdp_rxq_info xdp_rxq;
> +	/* CL3 - 3rd cacheline starts here */
>  	u16 q_index;			/* Queue number of ring */
> -	u16 q_handle;			/* Queue handle per TC */
> -
> -	u8 ring_active:1;		/* is ring online or not */

Seems like "ring_active" could be removed in a separate patch, since
it doesn't seem to be used at all. Am I missing something here?

> -
>  	u16 count;			/* Number of descriptors */
>  	u16 reg_idx;			/* HW register index of the
> ring */
>  
> @@ -284,38 +279,61 @@ struct ice_ring {
>  	u16 next_to_use;
>  	u16 next_to_clean;
>  	u16 next_to_alloc;
> +	u16 rx_offset;
> +	u16 rx_buf_len;
>  
>  	/* stats structs */
> +	struct ice_rxq_stats rx_stats;
>  	struct ice_q_stats	stats;
>  	struct u64_stats_sync syncp;
> -	union {
> -		struct ice_txq_stats tx_stats;
> -		struct ice_rxq_stats rx_stats;
> -	};
>  
>  	struct rcu_head rcu;		/* to avoid race on free */
> -	DECLARE_BITMAP(xps_state, ICE_TX_NBITS);	/* XPS Config State
> */
> +	/* CL4 - 3rd cacheline starts here */
>  	struct bpf_prog *xdp_prog;
>  	struct xsk_buff_pool *xsk_pool;
> -	u16 rx_offset;
> -	/* CL3 - 3rd cacheline starts here */
> -	struct xdp_rxq_info xdp_rxq;
>  	struct sk_buff *skb;
> -	/* CLX - the below items are only accessed infrequently and
> should be
> -	 * in their own cache line if possible
> -	 */
> -#define ICE_TX_FLAGS_RING_XDP		BIT(0)
> +	dma_addr_t dma;			/* physical address of ring
> */
>  #define ICE_RX_FLAGS_RING_BUILD_SKB	BIT(1)
> +	u64 cached_phctime;
> +	u8 dcb_tc;			/* Traffic class of ring */
> +	u8 ptp_rx;
>  	u8 flags;
> +} ____cacheline_internodealigned_in_smp;
> +
> +struct ice_tx_ring {
> +	/* CL1 - 1st cacheline starts here */
> +	struct ice_tx_ring *next;	/* pointer to next ring in q_vector
> */
> +	void *desc;			/* Descriptor ring memory */
> +	struct device *dev;		/* Used for DMA mapping */
> +	u8 __iomem *tail;
> +	struct ice_tx_buf *tx_buf;
> +	struct ice_q_vector *q_vector;	/* Backreference to
> associated vector */
> +	struct net_device *netdev;	/* netdev ring maps to */
> +	struct ice_vsi *vsi;		/* Backreference to
> associated VSI */
> +	/* CL2 - 2nd cacheline starts here */
>  	dma_addr_t dma;			/* physical address of ring
> */
> -	unsigned int size;		/* length of descriptor ring
> in bytes */
> +	u16 next_to_use;
> +	u16 next_to_clean;
> +	u16 count;			/* Number of descriptors */
> +	u16 q_index;			/* Queue number of ring */
> +	struct xsk_buff_pool *xsk_pool;
> +
> +	/* stats structs */
> +	struct ice_q_stats	stats;
> +	struct u64_stats_sync syncp;
> +	struct ice_txq_stats tx_stats;
> +
> +	/* CL3 - 3rd cacheline starts here */
> +	struct rcu_head rcu;		/* to avoid race on free */
> +	DECLARE_BITMAP(xps_state, ICE_TX_NBITS);	/* XPS Config State
> */
> +	struct ice_ptp_tx *tx_tstamps;
>  	u32 txq_teid;			/* Added Tx queue TEID */
> -	u16 rx_buf_len;
> +	u16 q_handle;			/* Queue handle per TC */
> +	u16 reg_idx;			/* HW register index of the
> ring */
> +#define ICE_TX_FLAGS_RING_XDP		BIT(0)
> +	u8 flags;
>  	u8 dcb_tc;			/* Traffic class of ring */
> -	struct ice_ptp_tx *tx_tstamps;
> -	u64 cached_phctime;
> -	u8 ptp_rx:1;
> -	u8 ptp_tx:1;
> +	u8 ptp_tx;
>  } ____cacheline_internodealigned_in_smp;
>  
>  static inline bool ice_ring_uses_build_skb(struct ice_ring *ring)
> @@ -333,14 +351,17 @@ static inline void
> ice_clear_ring_build_skb_ena(struct ice_ring *ring)
>  	ring->flags &= ~ICE_RX_FLAGS_RING_BUILD_SKB;
>  }
>  
> -static inline bool ice_ring_is_xdp(struct ice_ring *ring)
> +static inline bool ice_ring_is_xdp(struct ice_tx_ring *ring)
>  {
>  	return !!(ring->flags & ICE_TX_FLAGS_RING_XDP);
>  }
>  
>  struct ice_ring_container {
>  	/* head of linked-list of rings */
> -	struct ice_ring *ring;
> +	union {
> +		struct ice_ring *ring;
> +		struct ice_tx_ring *tx_ring;
> +	};
>  	struct dim dim;		/* data for net_dim algorithm */
>  	u16 itr_idx;		/* index in the interrupt vector */
>  	/* this matches the maximum number of ITR bits, but in usec
> @@ -363,6 +384,9 @@ struct ice_coalesce_stored {
>  #define ice_for_each_ring(pos, head) \
>  	for (pos = (head).ring; pos; pos = pos->next)
>  
> +#define ice_for_each_tx_ring(pos, head) \
> +	for (pos = (head).tx_ring; pos; pos = pos->next)
> +
>  {
>  #if (PAGE_SIZE < 8192)
> @@ -378,16 +402,16 @@ union ice_32b_rx_flex_desc;
>  
>  bool ice_alloc_rx_bufs(struct ice_ring *rxr, u16 cleaned_count);
>  netdev_tx_t ice_start_xmit(struct sk_buff *skb, struct net_device
> *netdev);
> -void ice_clean_tx_ring(struct ice_ring *tx_ring);
> +void ice_clean_tx_ring(struct ice_tx_ring *tx_ring);
>  void ice_clean_rx_ring(struct ice_ring *rx_ring);
> -int ice_setup_tx_ring(struct ice_ring *tx_ring);
> +int ice_setup_tx_ring(struct ice_tx_ring *tx_ring);
>  int ice_setup_rx_ring(struct ice_ring *rx_ring);
> -void ice_free_tx_ring(struct ice_ring *tx_ring);
> +void ice_free_tx_ring(struct ice_tx_ring *tx_ring);
>  void ice_free_rx_ring(struct ice_ring *rx_ring);
>  int ice_napi_poll(struct napi_struct *napi, int budget);
>  int
>  ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc
> *fdir_desc,
>  		   u8 *raw_packet);
>  int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget);
> -void ice_clean_ctrl_tx_irq(struct ice_ring *tx_ring);
> +void ice_clean_ctrl_tx_irq(struct ice_tx_ring *tx_ring);
>  #endif /* _ICE_TXRX_H_ */
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> index 171397dcf00a..74519c603872 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> @@ -217,7 +217,7 @@ ice_receive_skb(struct ice_ring *rx_ring, struct
> sk_buff *skb, u16 vlan_tag)
>   * @size: packet data size
>   * @xdp_ring: XDP ring for transmission
>   */
> -int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring
> *xdp_ring)
> +int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring
> *xdp_ring)
>  {
>  	u16 i = xdp_ring->next_to_use;
>  	struct ice_tx_desc *tx_desc;
> @@ -269,7 +269,7 @@ int ice_xmit_xdp_ring(void *data, u16 size,
> struct ice_ring *xdp_ring)
>   *
>   * Returns negative on failure, 0 on success.
>   */
> -int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring
> *xdp_ring)
> +int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring
> *xdp_ring)
>  {
>  	struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
>  
> @@ -294,7 +294,7 @@ void ice_finalize_xdp_rx(struct ice_ring
> *rx_ring, unsigned int xdp_res)
>  		xdp_do_flush_map();
>  
>  	if (xdp_res & ICE_XDP_TX) {
> -		struct ice_ring *xdp_ring =
> +		struct ice_tx_ring *xdp_ring =
>  			rx_ring->vsi->xdp_rings[rx_ring->q_index];

Probably me not understanding XDP, but this looks a little strange.
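
For anyone reading along with the same question: the quoted lookup reuses
the Rx ring's q_index as the index into the VSI's xdp_rings array. Below is
a minimal user-space sketch of that reading, assuming one XDP Tx ring per
Rx queue. All sketch_* names are hypothetical stand-ins, not the driver's
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NUM_QUEUES 4	/* assumption: arbitrary queue count */

struct sketch_vsi;

/* Hypothetical stand-in for the Tx-side ring used for XDP_TX. */
struct sketch_tx_ring {
	uint16_t q_index;
	uint16_t next_to_use;
};

/* Hypothetical stand-in for the Rx-side ring. */
struct sketch_rx_ring {
	struct sketch_vsi *vsi;
	uint16_t q_index;
};

struct sketch_vsi {
	struct sketch_tx_ring *xdp_rings[SKETCH_NUM_QUEUES];
};

/* Mirrors rx_ring->vsi->xdp_rings[rx_ring->q_index] from the quoted hunk,
 * with bounds and NULL checks added for the sketch.
 */
static struct sketch_tx_ring *
sketch_xdp_ring_for(const struct sketch_rx_ring *rx_ring)
{
	if (!rx_ring || !rx_ring->vsi || rx_ring->q_index >= SKETCH_NUM_QUEUES)
		return NULL;
	return rx_ring->vsi->xdp_rings[rx_ring->q_index];
}

/* Builds four XDP rings and resolves the one paired with Rx queue 2. */
static int sketch_xdp_demo(void)
{
	struct sketch_tx_ring xdp[SKETCH_NUM_QUEUES];
	struct sketch_vsi vsi;
	struct sketch_rx_ring rx = { .vsi = &vsi, .q_index = 2 };
	struct sketch_tx_ring *ring;
	int i;

	for (i = 0; i < SKETCH_NUM_QUEUES; i++) {
		xdp[i].q_index = (uint16_t)i;
		xdp[i].next_to_use = 0;
		vsi.xdp_rings[i] = &xdp[i];
	}

	ring = sketch_xdp_ring_for(&rx);
	assert(ring == &xdp[2]);
	return ring->q_index;
}
```

Patch 4 of this series ("ice: propagate xdp_ring onto rx_ring") keeps the
XDP ring reachable from the Rx ring, which would presumably replace this
array lookup.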

>  
>  		ice_xdp_ring_update_tail(xdp_ring);
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> index 05ac30752902..6989070ae2e2 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
> @@ -37,7 +37,7 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned
> int size, u64 td_tag)
>   *
>   * This function updates the XDP Tx ring tail register.
>   */
> -static inline void ice_xdp_ring_update_tail(struct ice_ring
> *xdp_ring)
> +static inline void ice_xdp_ring_update_tail(struct ice_tx_ring
> *xdp_ring)
>  {
>  	/* Force memory writes to complete before letting h/w
>  	 * know there are new descriptors to fetch.
> @@ -46,9 +46,9 @@ static inline void ice_xdp_ring_update_tail(struct
> ice_ring *xdp_ring)
>  	writel_relaxed(xdp_ring->next_to_use, xdp_ring->tail);
>  }
>  
> -void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int
> xdp_res);
> -int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring
> *xdp_ring);
> -int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring
> *xdp_ring);
> +void ice_finalize_xdp_rx(struct ice_ring *xdp_ring, unsigned int
> xdp_res);
> +int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring
> *xdp_ring);
> +int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring
> *xdp_ring);
>  void ice_release_rx_desc(struct ice_ring *rx_ring, u16 val);
>  void
>  ice_process_skb_fields(struct ice_ring *rx_ring,
> diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
> b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
> index 2826570dab51..0ee694960e51 100644
> --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
> +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
> @@ -3326,7 +3326,7 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf,
> u8 *msg)
>  		q_map = vqs->tx_queues;
>  
>  		for_each_set_bit(vf_q_id, &q_map,
> ICE_MAX_RSS_QS_PER_VF) {
> -			struct ice_ring *ring = vsi->tx_rings[vf_q_id];
> +			struct ice_tx_ring *ring = vsi->tx_rings[vf_q_id];
>  			struct ice_txq_meta txq_meta = { 0 };
>  
>  			if (!ice_vc_isvalid_q_id(vf, vqs->vsi_id,
> vf_q_id)) {
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c
> b/drivers/net/ethernet/intel/ice/ice_xsk.c
> index 5a9f61deeb38..bcf0f8e2ba6e 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.c
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
> @@ -104,12 +104,13 @@ ice_qvec_cfg_msix(struct ice_vsi *vsi, struct
> ice_q_vector *q_vector)
>  	u16 reg_idx = q_vector->reg_idx;
>  	struct ice_pf *pf = vsi->back;
>  	struct ice_hw *hw = &pf->hw;
> +	struct ice_tx_ring *tx_ring;
>  	struct ice_ring *ring;
>  
>  	ice_cfg_itr(hw, q_vector);
>  
> -	ice_for_each_ring(ring, q_vector->tx)
> -		ice_cfg_txq_interrupt(vsi, ring->reg_idx, reg_idx,
> +	ice_for_each_tx_ring(tx_ring, q_vector->tx)
> +		ice_cfg_txq_interrupt(vsi, tx_ring->reg_idx, reg_idx,
>  				      q_vector->tx.itr_idx);
>  
>  	ice_for_each_ring(ring, q_vector->rx)
> @@ -144,8 +145,9 @@ static void ice_qvec_ena_irq(struct ice_vsi *vsi,
> struct ice_q_vector *q_vector)
>  static int ice_qp_dis(struct ice_vsi *vsi, u16 q_idx)
>  {
>  	struct ice_txq_meta txq_meta = { };
> -	struct ice_ring *tx_ring, *rx_ring;
>  	struct ice_q_vector *q_vector;
> +	struct ice_tx_ring *tx_ring;
> +	struct ice_ring *rx_ring;
>  	int timeout = 50;
>  	int err;
>  
> @@ -171,7 +173,7 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16
> q_idx)
>  	if (err)
>  		return err;
>  	if (ice_is_xdp_ena_vsi(vsi)) {
> -		struct ice_ring *xdp_ring = vsi->xdp_rings[q_idx];
> +		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
>  
>  		memset(&txq_meta, 0, sizeof(txq_meta));
>  		ice_fill_txq_meta(vsi, xdp_ring, &txq_meta);
> @@ -201,8 +203,9 @@ static int ice_qp_dis(struct ice_vsi *vsi, u16
> q_idx)
>  static int ice_qp_ena(struct ice_vsi *vsi, u16 q_idx)
>  {
>  	struct ice_aqc_add_tx_qgrp *qg_buf;
> -	struct ice_ring *tx_ring, *rx_ring;
>  	struct ice_q_vector *q_vector;
> +	struct ice_tx_ring *tx_ring;
> +	struct ice_ring *rx_ring;
>  	u16 size;
>  	int err;
>  
> @@ -225,7 +228,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16
> q_idx)
>  		goto free_buf;
>  
>  	if (ice_is_xdp_ena_vsi(vsi)) {
> -		struct ice_ring *xdp_ring = vsi->xdp_rings[q_idx];
> +		struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_idx];
>  
>  		memset(qg_buf, 0, size);
>  		qg_buf->num_txqs = 1;
> @@ -233,7 +236,7 @@ static int ice_qp_ena(struct ice_vsi *vsi, u16
> q_idx)
>  		if (err)
>  			goto free_buf;
>  		ice_set_ring_xdp(xdp_ring);
> -		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
> +		xdp_ring->xsk_pool = ice_tx_xsk_pool(xdp_ring);
>  	}
>  
>  	err = ice_vsi_cfg_rxq(rx_ring);
> @@ -462,8 +465,8 @@ static int
>  ice_run_xdp_zc(struct ice_ring *rx_ring, struct xdp_buff *xdp)
>  {
>  	int err, result = ICE_XDP_PASS;
> +	struct ice_tx_ring *xdp_ring;
>  	struct bpf_prog *xdp_prog;
> -	struct ice_ring *xdp_ring;
>  	u32 act;
>  
>  	/* ZC patch is enabled only when XDP program is set,
> @@ -618,7 +621,7 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring,
> int budget)
>   *
>   * Returns true if cleanup/transmission is done.
>   */
> -static bool ice_xmit_zc(struct ice_ring *xdp_ring, int budget)
> +static bool ice_xmit_zc(struct ice_tx_ring *xdp_ring, int budget)
>  {
>  	struct ice_tx_desc *tx_desc = NULL;
>  	bool work_done = true;
> @@ -669,7 +672,7 @@ static bool ice_xmit_zc(struct ice_ring
> *xdp_ring, int budget)
>   * @tx_buf: Tx buffer to clean
>   */
>  static void
> -ice_clean_xdp_tx_buf(struct ice_ring *xdp_ring, struct ice_tx_buf
> *tx_buf)
> +ice_clean_xdp_tx_buf(struct ice_tx_ring *xdp_ring, struct ice_tx_buf
> *tx_buf)
>  {
>  	xdp_return_frame((struct xdp_frame *)tx_buf->raw_buf);
>  	dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
> @@ -684,7 +687,7 @@ ice_clean_xdp_tx_buf(struct ice_ring *xdp_ring,
> struct ice_tx_buf *tx_buf)
>   *
>   * Returns true if cleanup/tranmission is done.
>   */
> -bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget)
> +bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget)
>  {
>  	int total_packets = 0, total_bytes = 0;
>  	s16 ntc = xdp_ring->next_to_clean;
> @@ -757,7 +760,7 @@ ice_xsk_wakeup(struct net_device *netdev, u32
> queue_id,
>  	struct ice_netdev_priv *np = netdev_priv(netdev);
>  	struct ice_q_vector *q_vector;
>  	struct ice_vsi *vsi = np->vsi;
> -	struct ice_ring *ring;
> +	struct ice_tx_ring *ring;
>  
>  	if (test_bit(ICE_DOWN, vsi->state))
>  		return -ENETDOWN;
> @@ -826,7 +829,7 @@ void ice_xsk_clean_rx_ring(struct ice_ring
> *rx_ring)
>   * ice_xsk_clean_xdp_ring - Clean the XDP Tx ring and its buffer
> pool queues
>   * @xdp_ring: XDP_Tx ring
>   */
> -void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
> +void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring)
>  {
>  	u16 ntc = xdp_ring->next_to_clean, ntu = xdp_ring->next_to_use;
>  	u32 xsk_frames = 0;
> diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.h
> b/drivers/net/ethernet/intel/ice/ice_xsk.h
> index ea208808623a..2cf26372aefd 100644
> --- a/drivers/net/ethernet/intel/ice/ice_xsk.h
> +++ b/drivers/net/ethernet/intel/ice/ice_xsk.h
> @@ -12,12 +12,12 @@ struct ice_vsi;
>  int ice_xsk_pool_setup(struct ice_vsi *vsi, struct xsk_buff_pool
> *pool,
>  		       u16 qid);
>  int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget);
> -bool ice_clean_tx_irq_zc(struct ice_ring *xdp_ring, int budget);
> +bool ice_clean_tx_irq_zc(struct ice_tx_ring *xdp_ring, int budget);
>  int ice_xsk_wakeup(struct net_device *netdev, u32 queue_id, u32
> flags);
>  bool ice_alloc_rx_bufs_zc(struct ice_ring *rx_ring, u16 count);
>  bool ice_xsk_any_rx_ring_ena(struct ice_vsi *vsi);
>  void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring);
> -void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring);
> +void ice_xsk_clean_xdp_ring(struct ice_tx_ring *xdp_ring);
>  #else
>  static inline int
>  ice_xsk_pool_setup(struct ice_vsi __always_unused *vsi,
> @@ -35,7 +35,7 @@ ice_clean_rx_irq_zc(struct ice_ring __always_unused
> *rx_ring,
>  }
>  
>  static inline bool
> -ice_clean_tx_irq_zc(struct ice_ring __always_unused *xdp_ring,
> +ice_clean_tx_irq_zc(struct ice_tx_ring __always_unused *xdp_ring,
>  		    int __always_unused budget)
>  {
>  	return false;
> @@ -61,6 +61,6 @@ ice_xsk_wakeup(struct net_device __always_unused
> *netdev,
>  }
>  
>  static inline void ice_xsk_clean_rx_ring(struct ice_ring *rx_ring) {
> }
> -static inline void ice_xsk_clean_xdp_ring(struct ice_ring *xdp_ring)
> { }
> +static inline void ice_xsk_clean_xdp_ring(struct ice_tx_ring
> *xdp_ring) { }
>  #endif /* CONFIG_XDP_SOCKETS */
>  #endif /* !_ICE_XSK_H_ */


* Re: [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs
  2021-08-06 20:46   ` Creeley, Brett
@ 2021-08-10 13:10     ` Maciej Fijalkowski
  0 siblings, 0 replies; 10+ messages in thread
From: Maciej Fijalkowski @ 2021-08-10 13:10 UTC (permalink / raw)
  To: Creeley, Brett
  Cc: intel-wired-lan, toke, Karlsson, Magnus, davem, Lobakin,
	Alexandr, bjorn, Brandeburg, Jesse, kuba, bpf, netdev, Nguyen,
	Anthony L, joamaki

On Fri, Aug 06, 2021 at 09:46:07PM +0100, Creeley, Brett wrote:
> On Fri, 2021-08-06 at 01:00 +0200, Maciej Fijalkowski wrote:
> > While it was convenient to have a generic ring structure that served
> > both Tx and Rx sides, next commits are going to introduce several
> > Tx-specific fields, so in order to avoid hurting the Rx side, let's
> > pull out the Tx ring onto new ice_tx_ring struct and let the ice_ring
> > handle the Rx rings only.
> 
> I like this change. It makes a lot of sense because the Rx/Tx rings
> have diverged so much.

Glad to hear! First of all, thanks a lot for taking a look at this.

> 
> I don't see any changes in the coalesce code. I'm pretty sure there
> should be some changes in ice_set_rc_coalesce() at the very least
> based on these changes.

Yeah, I guess we need some adjustments with regard to the type of the ring
container.

> 
> >
> > Make the union out of the ring container within ice_q_vector so that
> > it is possible to iterate over newly introduced ice_tx_ring.
> >
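
The container change described above can be illustrated with a small
user-space sketch: one list head, viewed through a union, plus one
iteration macro per ring type. The sketch_* types and macros are
hypothetical stand-ins that only loosely follow the quoted
ice_ring_container and ice_for_each_tx_ring definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Rx and Tx ring structs. */
struct sketch_rx_ring {
	struct sketch_rx_ring *next;
	unsigned int pkts;
};

struct sketch_tx_ring {
	struct sketch_tx_ring *next;
	unsigned int pkts;
};

struct sketch_ring_container {
	/* head of linked-list of rings; a given container holds one kind */
	union {
		struct sketch_rx_ring *rx_ring;
		struct sketch_tx_ring *tx_ring;
	};
};

#define sketch_for_each_rx_ring(pos, head) \
	for (pos = (head).rx_ring; pos; pos = pos->next)

#define sketch_for_each_tx_ring(pos, head) \
	for (pos = (head).tx_ring; pos; pos = pos->next)

/* Sum packet counters over every Tx ring attached to the container. */
static unsigned int
sketch_sum_tx_pkts(const struct sketch_ring_container *tx)
{
	struct sketch_tx_ring *ring;
	unsigned int total = 0;

	sketch_for_each_tx_ring(ring, *tx)
		total += ring->pkts;
	return total;
}

/* Two Tx rings chained behind one container. */
static unsigned int sketch_container_demo(void)
{
	struct sketch_tx_ring b = { .next = NULL, .pkts = 5 };
	struct sketch_tx_ring a = { .next = &b, .pkts = 7 };
	struct sketch_ring_container tx = { .tx_ring = &a };

	assert(tx.tx_ring == &a);
	return sketch_sum_tx_pkts(&tx);
}
```

The compiler cannot tell which union member is valid, so each call site has
to pick the macro that matches the container it walks (tx vs. rx).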
> > Remove the @size as it's only accessed from control path and it can
> > be calculated pretty easily.
> >
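
As a rough illustration of that calculation, the quoted hunks recompute the
size as ALIGN(count * sizeof(desc), PAGE_SIZE). Here is a user-space sketch
of the same arithmetic. The 4 KiB page size and the 16-byte descriptor
layout are assumptions made for the example, and the sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Round x up to a multiple of a; a must be a power of two. */
#define SKETCH_ALIGN(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical stand-in for a Tx descriptor: two 64-bit words (assumption). */
struct sketch_tx_desc {
	uint64_t buf_addr;
	uint64_t cmd_type_offset_bsz;
};

/* Descriptor ring length in bytes, rounded up to the nearest page. */
static size_t sketch_tx_ring_size(uint16_t count)
{
	return SKETCH_ALIGN((size_t)count * sizeof(struct sketch_tx_desc),
			    SKETCH_PAGE_SIZE);
}

/* 257 descriptors need 4112 bytes, which rounds up to two pages. */
static size_t sketch_size_demo(void)
{
	assert(sizeof(struct sketch_tx_desc) == 16);
	return sketch_tx_ring_size(257);
}
```

The same value has to be produced at alloc, clean and free time, so a
shared helper along these lines might be preferable to repeating the
ALIGN() expression at each site.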
> > Remove @ring_active as it's not actively used anywhere.
> >
> > Change definitions of ice_update_ring_stats and
> > ice_fetch_u64_stats_per_ring so that they are ring agnostic and can
> > be used for both Rx and Tx rings.
> >
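
To make "ring agnostic" concrete: the reader only needs the sync object and
the counters, not the ring type that embeds them. The following is a
user-space analogue of the begin/retry loop from the quoted diff, built on
C11 atomics rather than the kernel's u64_stats API. It assumes a single
writer per counter set, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for struct u64_stats_sync. */
struct sketch_syncp {
	atomic_uint seq;
};

/* Hypothetical stand-in for struct ice_q_stats. */
struct sketch_q_stats {
	_Atomic uint64_t pkts;
	_Atomic uint64_t bytes;
};

/* Two ring types that share nothing but the stats members. */
struct sketch_tx_ring {
	struct sketch_syncp syncp;
	struct sketch_q_stats stats;
};

struct sketch_rx_ring {
	int rx_only_field;
	struct sketch_q_stats stats;
	struct sketch_syncp syncp;
};

static void sketch_stats_init(struct sketch_syncp *syncp,
			      struct sketch_q_stats *stats)
{
	atomic_init(&syncp->seq, 0u);
	atomic_init(&stats->pkts, 0);
	atomic_init(&stats->bytes, 0);
}

/* Single-writer update: seq is odd while the counters are being changed. */
static void sketch_stats_add(struct sketch_syncp *syncp,
			     struct sketch_q_stats *stats,
			     uint64_t pkts, uint64_t bytes)
{
	unsigned int seq = atomic_load_explicit(&syncp->seq,
						memory_order_relaxed);

	atomic_store_explicit(&syncp->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_fetch_add_explicit(&stats->pkts, pkts, memory_order_relaxed);
	atomic_fetch_add_explicit(&stats->bytes, bytes, memory_order_relaxed);
	atomic_store_explicit(&syncp->seq, seq + 2, memory_order_release);
}

/* Ring-agnostic reader: retry until a stable, even sequence is observed. */
static void sketch_fetch_stats(struct sketch_syncp *syncp,
			       struct sketch_q_stats *stats,
			       uint64_t *pkts, uint64_t *bytes)
{
	unsigned int start;

	do {
		start = atomic_load_explicit(&syncp->seq,
					     memory_order_acquire);
		*pkts = atomic_load_explicit(&stats->pkts,
					     memory_order_relaxed);
		*bytes = atomic_load_explicit(&stats->bytes,
					      memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((start & 1) ||
		 start != atomic_load_explicit(&syncp->seq,
					       memory_order_relaxed));
}

/* The same reader serves both ring types; returns total packets seen. */
static uint64_t sketch_stats_demo(void)
{
	struct sketch_tx_ring tx;
	struct sketch_rx_ring rx;
	uint64_t tx_pkts, tx_bytes, rx_pkts, rx_bytes;

	sketch_stats_init(&tx.syncp, &tx.stats);
	sketch_stats_init(&rx.syncp, &rx.stats);

	sketch_stats_add(&tx.syncp, &tx.stats, 3, 300);
	sketch_stats_add(&tx.syncp, &tx.stats, 2, 128);
	sketch_stats_add(&rx.syncp, &rx.stats, 10, 6400);

	sketch_fetch_stats(&tx.syncp, &tx.stats, &tx_pkts, &tx_bytes);
	sketch_fetch_stats(&rx.syncp, &rx.stats, &rx_pkts, &rx_bytes);

	assert(tx_bytes == 428 && rx_bytes == 6400);
	return tx_pkts + rx_pkts;
}
```

Unlike the quoted diff, which passes ring->stats by value, the sketch takes
a pointer so that the loads happen inside the retry loop.
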
> > Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively.
> > In Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
> > difference now.
> >
> > Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > ---
> >  drivers/net/ethernet/intel/ice/ice.h          | 27 ++++--
> >  drivers/net/ethernet/intel/ice/ice_base.c     | 27 +++---
> >  drivers/net/ethernet/intel/ice/ice_base.h     |  6 +-
> >  drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |  5 +-
> >  drivers/net/ethernet/intel/ice/ice_dcb_lib.h  |  6 +-
> >  drivers/net/ethernet/intel/ice/ice_ethtool.c  | 17 ++--
> >  drivers/net/ethernet/intel/ice/ice_lib.c      | 28 +++---
> >  drivers/net/ethernet/intel/ice/ice_lib.h      |  4 +-
> >  drivers/net/ethernet/intel/ice/ice_main.c     | 47 +++++-----
> >  drivers/net/ethernet/intel/ice/ice_trace.h    |  8 +-
> > >  drivers/net/ethernet/intel/ice/ice_txrx.c     | 87 ++++++++++--------
> > >  drivers/net/ethernet/intel/ice/ice_txrx.h     | 90 ++++++++++++-------
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.c |  6 +-
> >  drivers/net/ethernet/intel/ice/ice_txrx_lib.h |  8 +-
> >  .../net/ethernet/intel/ice/ice_virtchnl_pf.c  |  2 +-
> >  drivers/net/ethernet/intel/ice/ice_xsk.c      | 29 +++---
> >  drivers/net/ethernet/intel/ice/ice_xsk.h      |  8 +-
> >  17 files changed, 233 insertions(+), 172 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice.h
> > b/drivers/net/ethernet/intel/ice/ice.h
> > index a450343fbb92..2e15e097bc0f 100644
> > --- a/drivers/net/ethernet/intel/ice/ice.h
> > +++ b/drivers/net/ethernet/intel/ice/ice.h
> > @@ -266,7 +266,7 @@ struct ice_vsi {
> >       struct ice_pf *back;             /* back pointer to PF */
> >       struct ice_port_info *port_info; /* back pointer to port_info
> > */
> >       struct ice_ring **rx_rings;      /* Rx ring array */
> 
> If you are doing this, we should be explicit for Rx rings too and
> rename ice_ring to ice_rx_ring.

I just wanted to reduce the overhead by not doing so, but I agree it's
needed...

> 
> Obviously this would generate some more work here, but I think
> it's necessary with this change.
> 
> > -     struct ice_ring **tx_rings;      /* Tx ring array */
> > +     struct ice_tx_ring **tx_rings;   /* Tx ring array */
> >       struct ice_q_vector **q_vectors; /* q_vector array */
> >
> >       irqreturn_t (*irq_handler)(int irq, void *data);
> > @@ -343,7 +343,7 @@ struct ice_vsi {
> >       u16 qset_handle[ICE_MAX_TRAFFIC_CLASS];
> >       struct ice_tc_cfg tc_cfg;
> >       struct bpf_prog *xdp_prog;
> > -     struct ice_ring **xdp_rings;     /* XDP ring array */
> > +     struct ice_tx_ring **xdp_rings;  /* XDP ring array */
> >       unsigned long *af_xdp_zc_qps;    /* tracks AF_XDP ZC enabled
> > qps */
> >       u16 num_xdp_txq;                 /* Used XDP queues */
> >       u8 xdp_mapping_mode;             /*
> > ICE_MAP_MODE_[CONTIG|SCATTER] */
> > @@ -555,14 +555,14 @@ static inline bool ice_is_xdp_ena_vsi(struct
> > ice_vsi *vsi)
> >       return !!vsi->xdp_prog;
> >  }
> >
> > -static inline void ice_set_ring_xdp(struct ice_ring *ring)
> > +static inline void ice_set_ring_xdp(struct ice_tx_ring *ring)
> >  {
> >       ring->flags |= ICE_TX_FLAGS_RING_XDP;
> >  }
> >
> >  /**
> >   * ice_xsk_pool - get XSK buffer pool bound to a ring
> > - * @ring: ring to use
> > + * @ring: Rx ring to use
> >   *
> >   * Returns a pointer to xdp_umem structure if there is a buffer pool
> > present,
> >   * NULL otherwise.
> > @@ -572,8 +572,23 @@ static inline struct xsk_buff_pool
> > *ice_xsk_pool(struct ice_ring *ring)
> >       struct ice_vsi *vsi = ring->vsi;
> >       u16 qid = ring->q_index;
> >
> > -     if (ice_ring_is_xdp(ring))
> > -             qid -= vsi->num_xdp_txq;
> > +     if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi-
> > >af_xdp_zc_qps))
> > +             return NULL;
> > +
> > +     return xsk_get_pool_from_qid(vsi->netdev, qid);
> 
> Is this a bug fix? It seems like before we

Seems like you didn't finish your thought? But it's not a bugfix. This
function is now dedicated to Rx rings only. They never have
ICE_TX_FLAGS_RING_XDP set, as that flag is specific to XDP Tx rings, which
is why I removed the call to ice_ring_is_xdp().

Maybe you're referring to something else?

> > +}
> > +
> > +/**
> > + * ice_tx_xsk_pool - get XSK buffer pool bound to a ring
> > + * @ring: Tx ring to use
> > + *
> > + * Returns a pointer to xdp_umem structure if there is a buffer pool
> > present,
> > + * NULL otherwise. Tx equivalent of ice_xsk_pool.
> > + */
> > +static inline struct xsk_buff_pool *ice_tx_xsk_pool(struct
> > ice_tx_ring *ring)
> > +{
> > +     struct ice_vsi *vsi = ring->vsi;
> > +     u16 qid = ring->q_index - vsi->num_xdp_txq;
> 
> RCT. Should just assign the qid variable after to keep RCT
> ordering. Probably not strictly necessary though because it
> makes sense this way since you have to deref the vsi first.

If you insist, I can rewrite this in a way that satisfies the RCT
requirement.
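
For the record, a rough standalone sketch of what the RCT-friendly variant
could look like. The mock_* structs and mock_tx_qid() are made-up stand-ins
that carry only the fields used here; they are not the driver definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for ice_vsi / ice_tx_ring, reduced to what we need */
struct mock_vsi {
	uint16_t num_xdp_txq;
};

struct mock_tx_ring {
	struct mock_vsi *vsi;
	uint16_t q_index;
};

/* Declarations go longest to shortest; qid is assigned once vsi is known */
static uint16_t mock_tx_qid(const struct mock_tx_ring *ring)
{
	struct mock_vsi *vsi = ring->vsi;
	uint16_t qid;

	/* sketch-only sanity check: XDP Tx queue ids sit above the offset */
	assert(ring->q_index >= vsi->num_xdp_txq);
	qid = ring->q_index - vsi->num_xdp_txq;

	return qid;
}
```

The cost is one extra line, so it mostly comes down to taste.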

> 
> >
> >       if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi-
> > >af_xdp_zc_qps))
> >               return NULL;
> > diff --git a/drivers/net/ethernet/intel/ice/ice_base.c
> > b/drivers/net/ethernet/intel/ice/ice_base.c
> > index c36057efc7ae..838ee4b8d96f 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_base.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_base.c
> > @@ -146,6 +146,7 @@ static void ice_free_q_vector(struct ice_vsi
> > *vsi, int v_idx)
> >  {
> >       struct ice_q_vector *q_vector;
> >       struct ice_pf *pf = vsi->back;
> > +     struct ice_tx_ring *tx_ring;
> >       struct ice_ring *ring;
> struct ice_rx_ring *rx_ring; would be much more clear here
> >       struct device *dev;
> >
> > @@ -156,8 +157,8 @@ static void ice_free_q_vector(struct ice_vsi
> > *vsi, int v_idx)
> >       }
> >       q_vector = vsi->q_vectors[v_idx];
> >
> > -     ice_for_each_ring(ring, q_vector->tx)
> > -             ring->q_vector = NULL;
> > +     ice_for_each_tx_ring(tx_ring, q_vector->tx)
> > +             tx_ring->q_vector = NULL;
> 
> It seems like if we used a "void *ring" in the ice_ring_container
> it would simplify some of this and we wouldn't need a
> different "for_each" for loop.
> 
> The only downside is we would have to cast to the correct ring
> type based on context when we want to dereference it.

I tried the void *ring approach and it turned out to have the same or even
worse overhead, as all of the references to an Rx/Tx ring specific struct
within the for_each loop needed to be replaced.
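
To illustrate what the union variant from this patch boils down to, here is
a standalone sketch. All mock_* names are invented for the example and the
structs are trimmed to two fields; only the container/iterator shape follows
the patch:

```c
#include <assert.h>
#include <stddef.h>

struct mock_q_vector {
	int id;
};

struct mock_rx_ring {
	struct mock_rx_ring *next;
	struct mock_q_vector *q_vector;
};

struct mock_tx_ring {
	struct mock_tx_ring *next;
	struct mock_q_vector *q_vector;
};

/* One list head, two typed views of it */
struct mock_ring_container {
	union {
		struct mock_rx_ring *ring;
		struct mock_tx_ring *tx_ring;
	};
};

#define mock_for_each_ring(pos, head) \
	for (pos = (head).ring; pos; pos = pos->next)

#define mock_for_each_tx_ring(pos, head) \
	for (pos = (head).tx_ring; pos; pos = pos->next)

/* Detach every Tx ring from its vector; returns how many were visited */
static int mock_clear_tx_rings(struct mock_ring_container *tx)
{
	struct mock_tx_ring *tx_ring;
	int n = 0;

	assert(tx);
	mock_for_each_tx_ring(tx_ring, *tx) {
		tx_ring->q_vector = NULL;
		n++;
	}

	return n;
}
```

The loop body works on a properly typed pointer, so no cast is needed at
the point of dereference.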

> 
> >       ice_for_each_ring(ring, q_vector->rx)
> >               ring->q_vector = NULL;
> Then it would be more explicit:
> 
> ice_for_each_rx_ring(ring, q_vector->rx)
>         rx_ring->q_vector = NULL;

You need to do:
struct ice_rx_ring *rx_ring = (struct ice_rx_ring *)ring;

before NULLing and then do the s/ring/rx_ring/g within the loop.

> >
> > @@ -206,7 +207,7 @@ static void ice_cfg_itr_gran(struct ice_hw *hw)
> >   * @ring: ring to get the absolute queue index
> >   * @tc: traffic class number
> >   */
> > -static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_ring
> > *ring, u8 tc)
> > +static u16 ice_calc_q_handle(struct ice_vsi *vsi, struct ice_tx_ring
> > *ring, u8 tc)
> 
> should this be ice_calc_txq_handle()? Seems like it should have always
> been called that, but your change made it more obvious.

Agree!

> 
> >  {
> >       WARN_ONCE(ice_ring_is_xdp(ring) && tc, "XDP ring can't belong
> > to TC other than 0\n");
> >
> > @@ -224,7 +225,7 @@ static u16 ice_calc_q_handle(struct ice_vsi *vsi,
> > struct ice_ring *ring, u8 tc)
> >   * This enables/disables XPS for a given Tx descriptor ring
> >   * based on the TCs enabled for the VSI that ring belongs to.
> >   */
> > -static void ice_cfg_xps_tx_ring(struct ice_ring *ring)
> > +static void ice_cfg_xps_tx_ring(struct ice_tx_ring *ring)
> >  {
> >       if (!ring->q_vector || !ring->netdev)
> >               return;
> > @@ -246,7 +247,7 @@ static void ice_cfg_xps_tx_ring(struct ice_ring
> > *ring)
> >   * Configure the Tx descriptor ring in TLAN context.
> >   */
> >  static void
> > -ice_setup_tx_ctx(struct ice_ring *ring, struct ice_tlan_ctx
> > *tlan_ctx, u16 pf_q)
> > +ice_setup_tx_ctx(struct ice_tx_ring *ring, struct ice_tlan_ctx
> > *tlan_ctx, u16 pf_q)
> >  {
> >       struct ice_vsi *vsi = ring->vsi;
> >       struct ice_hw *hw = &vsi->back->hw;
> > @@ -258,7 +259,7 @@ ice_setup_tx_ctx(struct ice_ring *ring, struct
> > ice_tlan_ctx *tlan_ctx, u16 pf_q)
> >       /* Transmit Queue Length */
> >       tlan_ctx->qlen = ring->count;
> >
> > -     ice_set_cgd_num(tlan_ctx, ring);
> > +     ice_set_cgd_num(tlan_ctx, ring->dcb_tc);
> >
> >       /* PF number */
> >       tlan_ctx->pf_num = hw->pf_id;
> > @@ -660,16 +661,16 @@ void ice_vsi_map_rings_to_vectors(struct
> > ice_vsi *vsi)
> >               tx_rings_per_v = (u8)DIV_ROUND_UP(tx_rings_rem,
> >                                                 q_vectors - v_id);
> >               q_vector->num_ring_tx = tx_rings_per_v;
> > -             q_vector->tx.ring = NULL;
> > +             q_vector->tx.tx_ring = NULL;
> >               q_vector->tx.itr_idx = ICE_TX_ITR;
> >               q_base = vsi->num_txq - tx_rings_rem;
> >
> >               for (q_id = q_base; q_id < (q_base + tx_rings_per_v);
> > q_id++) {
> > -                     struct ice_ring *tx_ring = vsi->tx_rings[q_id];
> > +                     struct ice_tx_ring *tx_ring = vsi-
> > >tx_rings[q_id];
> >
> >                       tx_ring->q_vector = q_vector;
> > -                     tx_ring->next = q_vector->tx.ring;
> > -                     q_vector->tx.ring = tx_ring;
> > +                     tx_ring->next = q_vector->tx.tx_ring;
> > +                     q_vector->tx.tx_ring = tx_ring;
> >               }
> >               tx_rings_rem -= tx_rings_per_v;
> >
> > @@ -711,7 +712,7 @@ void ice_vsi_free_q_vectors(struct ice_vsi *vsi)
> >   * @qg_buf: queue group buffer
> >   */
> >  int
> > -ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_ring *ring,
> > +ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
> >               struct ice_aqc_add_tx_qgrp *qg_buf)
> >  {
> >       u8 buf_len = struct_size(qg_buf, txqs, 1);
> > @@ -870,7 +871,7 @@ void ice_trigger_sw_intr(struct ice_hw *hw,
> > struct ice_q_vector *q_vector)
> >   */
> >  int
> >  ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src
> > rst_src,
> > -                  u16 rel_vmvf_num, struct ice_ring *ring,
> > +                  u16 rel_vmvf_num, struct ice_tx_ring *ring,
> >                    struct ice_txq_meta *txq_meta)
> >  {
> >       struct ice_pf *pf = vsi->back;
> > @@ -927,7 +928,7 @@ ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum
> > ice_disq_rst_src rst_src,
> >   * are needed for stopping Tx queue
> >   */
> >  void
> > -ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_ring *ring,
> > +ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_tx_ring *ring,
> >                 struct ice_txq_meta *txq_meta)
> >  {
> >       u8 tc;
> > diff --git a/drivers/net/ethernet/intel/ice/ice_base.h
> > b/drivers/net/ethernet/intel/ice/ice_base.h
> > index 20e1c29aa68a..2ce777eb53b0 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_base.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_base.h
> > @@ -15,7 +15,7 @@ int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi);
> >  void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi);
> >  void ice_vsi_free_q_vectors(struct ice_vsi *vsi);
> >  int
> > -ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_ring *ring,
> > +ice_vsi_cfg_txq(struct ice_vsi *vsi, struct ice_tx_ring *ring,
> >               struct ice_aqc_add_tx_qgrp *qg_buf);
> >  void ice_cfg_itr(struct ice_hw *hw, struct ice_q_vector *q_vector);
> >  void
> > @@ -25,9 +25,9 @@ ice_cfg_rxq_interrupt(struct ice_vsi *vsi, u16 rxq,
> > u16 msix_idx, u16 itr_idx);
> >  void ice_trigger_sw_intr(struct ice_hw *hw, struct ice_q_vector
> > *q_vector);
> >  int
> >  ice_vsi_stop_tx_ring(struct ice_vsi *vsi, enum ice_disq_rst_src
> > rst_src,
> > -                  u16 rel_vmvf_num, struct ice_ring *ring,
> > +                  u16 rel_vmvf_num, struct ice_tx_ring *ring,
> >                    struct ice_txq_meta *txq_meta);
> >  void
> > -ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_ring *ring,
> > +ice_fill_txq_meta(struct ice_vsi *vsi, struct ice_tx_ring *ring,
> >                 struct ice_txq_meta *txq_meta);
> >  #endif /* _ICE_BASE_H_ */
> > diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> > b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> > index 926cf748c5ec..2507223bfdc7 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> > @@ -194,7 +194,8 @@ u8 ice_dcb_get_tc(struct ice_vsi *vsi, int
> > queue_index)
> >   */
> >  void ice_vsi_cfg_dcb_rings(struct ice_vsi *vsi)
> >  {
> > -     struct ice_ring *tx_ring, *rx_ring;
> > +     struct ice_tx_ring *tx_ring;
> > +     struct ice_ring *rx_ring;
> >       u16 qoffset, qcount;
> >       int i, n;
> >
> > @@ -814,7 +815,7 @@ void ice_update_dcb_stats(struct ice_pf *pf)
> >   * tag will already be configured with the correct ID and priority
> > bits
> >   */
> >  void
> > -ice_tx_prepare_vlan_flags_dcb(struct ice_ring *tx_ring,
> > +ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
> >                             struct ice_tx_buf *first)
> >  {
> >       struct sk_buff *skb = first->skb;
> > diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> > b/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> > index 261b6e2ed7bc..a5bdf47cd34a 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.h
> > @@ -28,7 +28,7 @@ void ice_vsi_cfg_dcb_rings(struct ice_vsi *vsi);
> >  int ice_init_pf_dcb(struct ice_pf *pf, bool locked);
> >  void ice_update_dcb_stats(struct ice_pf *pf);
> >  void
> > -ice_tx_prepare_vlan_flags_dcb(struct ice_ring *tx_ring,
> > +ice_tx_prepare_vlan_flags_dcb(struct ice_tx_ring *tx_ring,
> >                             struct ice_tx_buf *first);
> >  void
> >  ice_dcb_process_lldp_set_mib_change(struct ice_pf *pf,
> > @@ -49,9 +49,9 @@ static inline bool ice_find_q_in_range(u16 low, u16
> > high, unsigned int tx_q)
> >  }
> >
> >  static inline void
> > -ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, struct ice_ring
> > *ring)
> > +ice_set_cgd_num(struct ice_tlan_ctx *tlan_ctx, u8 dcb_tc)
> >  {
> > -     tlan_ctx->cgd_num = ring->dcb_tc;
> > +     tlan_ctx->cgd_num = dcb_tc;
> 
> Seems like this change isn't 100% necessary as part of this patch,
> but I guess you would have had to update it to use ice_tx_ring,
> so this does make sense to just pass the dcb_tc.

I can pass the ice_tx_ring to keep the previous logic.

> 
> >  }
> >
> >  static inline bool ice_is_dcb_active(struct ice_pf *pf)
> > diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > index d95a5daca114..644ce9f3494d 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
> > @@ -584,7 +584,7 @@ static bool ice_lbtest_check_frame(u8 *frame)
> >   *
> >   * Function sends loopback packets on a test Tx ring.
> >   */
> > -static int ice_diag_send(struct ice_ring *tx_ring, u8 *data, u16
> > size)
> > +static int ice_diag_send(struct ice_tx_ring *tx_ring, u8 *data, u16
> > size)
> >  {
> >       struct ice_tx_desc *tx_desc;
> >       struct ice_tx_buf *tx_buf;
> > @@ -676,9 +676,10 @@ static u64 ice_loopback_test(struct net_device
> > *netdev)
> >       struct ice_netdev_priv *np = netdev_priv(netdev);
> >       struct ice_vsi *orig_vsi = np->vsi, *test_vsi;
> >       struct ice_pf *pf = orig_vsi->back;
> > -     struct ice_ring *tx_ring, *rx_ring;
> >       u8 broadcast[ETH_ALEN], ret = 0;
> >       int num_frames, valid_frames;
> > +     struct ice_tx_ring *tx_ring;
> > +     struct ice_ring *rx_ring;
> >       struct device *dev;
> >       u8 *tx_frame;
> >       int i;
> > @@ -1318,6 +1319,7 @@ ice_get_ethtool_stats(struct net_device
> > *netdev,
> >       struct ice_netdev_priv *np = netdev_priv(netdev);
> >       struct ice_vsi *vsi = np->vsi;
> >       struct ice_pf *pf = vsi->back;
> > +     struct ice_tx_ring *tx_ring;
> >       struct ice_ring *ring;
> >       unsigned int j;
> >       int i = 0;
> > @@ -1336,10 +1338,10 @@ ice_get_ethtool_stats(struct net_device
> > *netdev,
> >       rcu_read_lock();
> >
> >       ice_for_each_alloc_txq(vsi, j) {
> > -             ring = READ_ONCE(vsi->tx_rings[j]);
> > +             tx_ring = READ_ONCE(vsi->tx_rings[j]);
> >               if (ring) {
> 
> This should be "if (tx_ring)"

Oops, thanks, lkp yelled at me as well.

> 
> > -                     data[i++] = ring->stats.pkts;
> > -                     data[i++] = ring->stats.bytes;
> > +                     data[i++] = tx_ring->stats.pkts;
> > +                     data[i++] = tx_ring->stats.bytes;
> >               } else {
> >                       data[i++] = 0;
> >                       data[i++] = 0;
> > @@ -2667,9 +2669,10 @@ ice_get_ringparam(struct net_device *netdev,
> > struct ethtool_ringparam *ring)
> >  static int
> >  ice_set_ringparam(struct net_device *netdev, struct
> > ethtool_ringparam *ring)
> >  {
> > -     struct ice_ring *tx_rings = NULL, *rx_rings = NULL;
> > +     struct ice_tx_ring *tx_rings = NULL;
> > +     struct ice_ring *rx_rings = NULL;
> >       struct ice_netdev_priv *np = netdev_priv(netdev);
> > -     struct ice_ring *xdp_rings = NULL;
> > +     struct ice_tx_ring *xdp_rings = NULL;
> 
> RCT got a little messed up here.

Will fix.

> 
> >       struct ice_vsi *vsi = np->vsi;
> >       struct ice_pf *pf = vsi->back;
> >       int i, timeout = 50, err = 0;
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c
> > b/drivers/net/ethernet/intel/ice/ice_lib.c
> > index dde9802c6c72..ac0d7a52406b 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> > @@ -379,12 +379,12 @@ static irqreturn_t ice_msix_clean_ctrl_vsi(int
> > __always_unused irq, void *data)
> >  {
> >       struct ice_q_vector *q_vector = (struct ice_q_vector *)data;
> >
> > -     if (!q_vector->tx.ring)
> 
> I don't think this function would have changed if we used a "void
> *ring" in the ice_ring_container.

Right, but I still think that this way we have less overhead given the
things that would be needed for void * approach I described above...

> 
> > +     if (!q_vector->tx.tx_ring)
> >               return IRQ_HANDLED;
> >
> >  #define FDIR_RX_DESC_CLEAN_BUDGET 64
> >       ice_clean_rx_irq(q_vector->rx.ring, FDIR_RX_DESC_CLEAN_BUDGET);
> > -     ice_clean_ctrl_tx_irq(q_vector->tx.ring);
> > +     ice_clean_ctrl_tx_irq(q_vector->tx.tx_ring);
> >
> >       return IRQ_HANDLED;
> >  }
> > @@ -1286,7 +1286,7 @@ static int ice_vsi_alloc_rings(struct ice_vsi
> > *vsi)
> >       dev = ice_pf_to_dev(pf);
> >       /* Allocate Tx rings */
> >       for (i = 0; i < vsi->alloc_txq; i++) {
> > -             struct ice_ring *ring;
> > +             struct ice_tx_ring *ring;
> >
> >               /* allocate with kzalloc(), free with kfree_rcu() */
> >               ring = kzalloc(sizeof(*ring), GFP_KERNEL);
> > @@ -1296,7 +1296,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi
> > *vsi)
> >
> >               ring->q_index = i;
> >               ring->reg_idx = vsi->txq_map[i];
> > -             ring->ring_active = false;
> >               ring->vsi = vsi;
> >               ring->tx_tstamps = &pf->ptp.port.tx;
> >               ring->dev = dev;
> > @@ -1315,7 +1314,6 @@ static int ice_vsi_alloc_rings(struct ice_vsi
> > *vsi)
> >
> >               ring->q_index = i;
> >               ring->reg_idx = vsi->rxq_map[i];
> > -             ring->ring_active = false;
> >               ring->vsi = vsi;
> >               ring->netdev = vsi->netdev;
> >               ring->dev = dev;
> > @@ -1710,7 +1708,7 @@ int ice_vsi_cfg_single_rxq(struct ice_vsi *vsi,
> > u16 q_idx)
> >       return ice_vsi_cfg_rxq(vsi->rx_rings[q_idx]);
> >  }
> >
> > -int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_ring
> > **tx_rings, u16 q_idx)
> > +int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_tx_ring
> > **tx_rings, u16 q_idx)
> >  {
> >       struct ice_aqc_add_tx_qgrp *qg_buf;
> >       int err;
> > @@ -1766,7 +1764,7 @@ int ice_vsi_cfg_rxqs(struct ice_vsi *vsi)
> >   * Configure the Tx VSI for operation.
> >   */
> >  static int
> > -ice_vsi_cfg_txqs(struct ice_vsi *vsi, struct ice_ring **rings, u16
> > count)
> > +ice_vsi_cfg_txqs(struct ice_vsi *vsi, struct ice_tx_ring **rings,
> > u16 count)
> >  {
> >       struct ice_aqc_add_tx_qgrp *qg_buf;
> >       u16 q_idx = 0;
> > @@ -1818,7 +1816,7 @@ int ice_vsi_cfg_xdp_txqs(struct ice_vsi *vsi)
> >               return ret;
> >
> >       for (i = 0; i < vsi->num_xdp_txq; i++)
> > -             vsi->xdp_rings[i]->xsk_pool = ice_xsk_pool(vsi-
> > >xdp_rings[i]);
> > +             vsi->xdp_rings[i]->xsk_pool = ice_tx_xsk_pool(vsi-
> > >xdp_rings[i]);
> >
> >       return ret;
> >  }
> > @@ -2057,7 +2055,7 @@ int ice_vsi_stop_all_rx_rings(struct ice_vsi
> > *vsi)
> >   */
> >  static int
> >  ice_vsi_stop_tx_rings(struct ice_vsi *vsi, enum ice_disq_rst_src
> > rst_src,
> > -                   u16 rel_vmvf_num, struct ice_ring **rings, u16
> > count)
> > +                   u16 rel_vmvf_num, struct ice_tx_ring **rings, u16
> > count)
> >  {
> >       u16 q_idx;
> >
> > @@ -3357,10 +3355,10 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8
> > ena_tc)
> >   *
> >   * This function assumes that caller has acquired a u64_stats_sync
> > lock.
> >   */
> > -static void ice_update_ring_stats(struct ice_ring *ring, u64 pkts,
> > u64 bytes)
> > +static void ice_update_ring_stats(struct ice_q_stats *stats, u64
> > pkts, u64 bytes)
> >  {
> > -     ring->stats.bytes += bytes;
> > -     ring->stats.pkts += pkts;
> > +     stats->bytes += bytes;
> > +     stats->pkts += pkts;
> 
> This is a nice little clean up.
> 
> >  }
> >
> >  /**
> > @@ -3369,10 +3367,10 @@ static void ice_update_ring_stats(struct
> > ice_ring *ring, u64 pkts, u64 bytes)
> >   * @pkts: number of processed packets
> >   * @bytes: number of processed bytes
> >   */
> > -void ice_update_tx_ring_stats(struct ice_ring *tx_ring, u64 pkts,
> > u64 bytes)
> > +void ice_update_tx_ring_stats(struct ice_tx_ring *tx_ring, u64 pkts,
> > u64 bytes)
> >  {
> >       u64_stats_update_begin(&tx_ring->syncp);
> > -     ice_update_ring_stats(tx_ring, pkts, bytes);
> > +     ice_update_ring_stats(&tx_ring->stats, pkts, bytes);
> >       u64_stats_update_end(&tx_ring->syncp);
> >  }
> >
> > @@ -3385,7 +3383,7 @@ void ice_update_tx_ring_stats(struct ice_ring
> > *tx_ring, u64 pkts, u64 bytes)
> >  void ice_update_rx_ring_stats(struct ice_ring *rx_ring, u64 pkts,
> > u64 bytes)
> >  {
> >       u64_stats_update_begin(&rx_ring->syncp);
> > -     ice_update_ring_stats(rx_ring, pkts, bytes);
> > +     ice_update_ring_stats(&rx_ring->stats, pkts, bytes);
> >       u64_stats_update_end(&rx_ring->syncp);
> >  }
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h
> > b/drivers/net/ethernet/intel/ice/ice_lib.h
> > index d5a28bf0fc2c..2a69666db194 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_lib.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_lib.h
> > @@ -14,7 +14,7 @@ void ice_update_eth_stats(struct ice_vsi *vsi);
> >
> >  int ice_vsi_cfg_single_rxq(struct ice_vsi *vsi, u16 q_idx);
> >
> > -int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_ring
> > **tx_rings, u16 q_idx);
> > +int ice_vsi_cfg_single_txq(struct ice_vsi *vsi, struct ice_tx_ring
> > **tx_rings, u16 q_idx);
> >
> >  int ice_vsi_cfg_rxqs(struct ice_vsi *vsi);
> >
> > @@ -93,7 +93,7 @@ void ice_vsi_free_tx_rings(struct ice_vsi *vsi);
> >
> >  void ice_vsi_manage_rss_lut(struct ice_vsi *vsi, bool ena);
> >
> > -void ice_update_tx_ring_stats(struct ice_ring *ring, u64 pkts, u64
> > bytes);
> > +void ice_update_tx_ring_stats(struct ice_tx_ring *ring, u64 pkts,
> > u64 bytes);
> >
> >  void ice_update_rx_ring_stats(struct ice_ring *ring, u64 pkts, u64
> > bytes);
> >
> > diff --git a/drivers/net/ethernet/intel/ice/ice_main.c
> > b/drivers/net/ethernet/intel/ice/ice_main.c
> > index ef8d1815af56..cbcb4ad60852 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_main.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> > @@ -61,7 +61,7 @@ bool netif_is_ice(struct net_device *dev)
> >   * ice_get_tx_pending - returns number of Tx descriptors not
> > processed
> >   * @ring: the ring of descriptors
> >   */
> > -static u16 ice_get_tx_pending(struct ice_ring *ring)
> > +static u16 ice_get_tx_pending(struct ice_tx_ring *ring)
> >  {
> >       u16 head, tail;
> >
> > @@ -101,7 +101,7 @@ static void ice_check_for_hang_subtask(struct
> > ice_pf *pf)
> >       hw = &vsi->back->hw;
> >
> >       for (i = 0; i < vsi->num_txq; i++) {
> 
> Interesting that this isn't using ice_for_each_txq()

I see that there are two more occurrences of such a loop that could be
replaced with ice_for_each_txq(). Separate patch?
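
Roughly what I have in mind, as a standalone sketch. The mock_* names are
invented, and mock_for_each_txq() is only assumed to have the same shape as
ice_for_each_txq() in ice.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, reduced to the fields the loop touches */
struct mock_tx_ring {
	void *desc;
};

struct mock_vsi {
	struct mock_tx_ring **tx_rings;
	uint16_t num_txq;
};

/* Assumption: mirrors the shape of ice_for_each_txq() */
#define mock_for_each_txq(vsi, i) \
	for ((i) = 0; (i) < (vsi)->num_txq; (i)++)

/* Count Tx rings that are allocated and have descriptor memory */
static int mock_count_live_txq(const struct mock_vsi *vsi)
{
	int live = 0;
	int i;

	assert(vsi->tx_rings);
	mock_for_each_txq(vsi, i) {
		struct mock_tx_ring *tx_ring = vsi->tx_rings[i];

		if (tx_ring && tx_ring->desc)
			live++;
	}

	return live;
}
```

The open-coded "for (i = 0; i < vsi->num_txq; i++)" and the macro are
equivalent here, so the conversion would be purely cosmetic.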

> 
> > -             struct ice_ring *tx_ring = vsi->tx_rings[i];
> > +             struct ice_tx_ring *tx_ring = vsi->tx_rings[i];
> >
> >               if (tx_ring && tx_ring->desc) {
> >                       /* If packet counter has not changed the queue
> > is

[...]

> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h
> > b/drivers/net/ethernet/intel/ice/ice_txrx.h
> > index 1e46e80f3d6f..d4ab3558933e 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx.h
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
> > @@ -154,7 +154,7 @@ struct ice_tx_buf {
> >
> >  struct ice_tx_offload_params {
> >       u64 cd_qw1;
> > -     struct ice_ring *tx_ring;
> > +     struct ice_tx_ring *tx_ring;
> >       u32 td_cmd;
> >       u32 td_offset;
> >       u32 td_l2tag1;
> > @@ -267,16 +267,11 @@ struct ice_ring {
> >       struct ice_vsi *vsi;            /* Backreference to
> > associated VSI */
> >       struct ice_q_vector *q_vector;  /* Backreference to
> > associated vector */
> >       u8 __iomem *tail;
> > -     union {
> > -             struct ice_tx_buf *tx_buf;
> > -             struct ice_rx_buf *rx_buf;
> > -     };
> > +     struct ice_rx_buf *rx_buf;
> >       /* CL2 - 2nd cacheline starts here */
> > +     struct xdp_rxq_info xdp_rxq;
> > +     /* CL3 - 3rd cacheline starts here */
> >       u16 q_index;                    /* Queue number of ring */
> > -     u16 q_handle;                   /* Queue handle per TC */
> > -
> > -     u8 ring_active:1;               /* is ring online or not */
> 
> Seems like "ring_active" could be removed as a separate patch since
> it doesn't seem to be used at all. Am I missing something here?

I don't mind pulling this out to a separate patch. This has not been used
for a long time AFAICT.

> 
> > -
> >       u16 count;                      /* Number of descriptors */
> >       u16 reg_idx;                    /* HW register index of the
> > ring */
> >
> > @@ -284,38 +279,61 @@ struct ice_ring {
> >       u16 next_to_use;
> >       u16 next_to_clean;
> >       u16 next_to_alloc;
> > +     u16 rx_offset;
> > +     u16 rx_buf_len;
> >
> >       /* stats structs */
> > +     struct ice_rxq_stats rx_stats;
> >       struct ice_q_stats      stats;
> >       struct u64_stats_sync syncp;
> > -     union {
> > -             struct ice_txq_stats tx_stats;
> > -             struct ice_rxq_stats rx_stats;
> > -     };
> >
> >       struct rcu_head rcu;            /* to avoid race on free */
> > -     DECLARE_BITMAP(xps_state, ICE_TX_NBITS);        /* XPS Config State
> > */
> > +     /* CL4 - 3rd cacheline starts here */
> >       struct bpf_prog *xdp_prog;
> >       struct xsk_buff_pool *xsk_pool;
> > -     u16 rx_offset;
> > -     /* CL3 - 3rd cacheline starts here */
> > -     struct xdp_rxq_info xdp_rxq;
> >       struct sk_buff *skb;
> > -     /* CLX - the below items are only accessed infrequently and
> > should be
> > -      * in their own cache line if possible
> > -      */
> > -#define ICE_TX_FLAGS_RING_XDP                BIT(0)
> > +     dma_addr_t dma;                 /* physical address of ring
> > */
> >  #define ICE_RX_FLAGS_RING_BUILD_SKB  BIT(1)
> > +     u64 cached_phctime;
> > +     u8 dcb_tc;                      /* Traffic class of ring */
> > +     u8 ptp_rx;
> >       u8 flags;
> > +} ____cacheline_internodealigned_in_smp;
> > +
> > +struct ice_tx_ring {
> > +     /* CL1 - 1st cacheline starts here */
> > +     struct ice_tx_ring *next;       /* pointer to next ring in q_vector
> > */
> > +     void *desc;                     /* Descriptor ring memory */
> > +     struct device *dev;             /* Used for DMA mapping */
> > +     u8 __iomem *tail;
> > +     struct ice_tx_buf *tx_buf;
> > +     struct ice_q_vector *q_vector;  /* Backreference to
> > associated vector */
> > +     struct net_device *netdev;      /* netdev ring maps to */
> > +     struct ice_vsi *vsi;            /* Backreference to
> > associated VSI */
> > +     /* CL2 - 2nd cacheline starts here */
> >       dma_addr_t dma;                 /* physical address of ring
> > */
> > -     unsigned int size;              /* length of descriptor ring
> > in bytes */
> > +     u16 next_to_use;
> > +     u16 next_to_clean;
> > +     u16 count;                      /* Number of descriptors */
> > +     u16 q_index;                    /* Queue number of ring */
> > +     struct xsk_buff_pool *xsk_pool;
> > +
> > +     /* stats structs */
> > +     struct ice_q_stats      stats;
> > +     struct u64_stats_sync syncp;
> > +     struct ice_txq_stats tx_stats;
> > +
> > +     /* CL3 - 3rd cacheline starts here */
> > +     struct rcu_head rcu;            /* to avoid race on free */
> > +     DECLARE_BITMAP(xps_state, ICE_TX_NBITS);        /* XPS Config State
> > */
> > +     struct ice_ptp_tx *tx_tstamps;
> >       u32 txq_teid;                   /* Added Tx queue TEID */
> > -     u16 rx_buf_len;
> > +     u16 q_handle;                   /* Queue handle per TC */
> > +     u16 reg_idx;                    /* HW register index of the
> > ring */
> > +#define ICE_TX_FLAGS_RING_XDP                BIT(0)
> > +     u8 flags;
> >       u8 dcb_tc;                      /* Traffic class of ring */
> > -     struct ice_ptp_tx *tx_tstamps;
> > -     u64 cached_phctime;
> > -     u8 ptp_rx:1;
> > -     u8 ptp_tx:1;
> > +     u8 ptp_tx;
> >  } ____cacheline_internodealigned_in_smp;
> >
> >  static inline bool ice_ring_uses_build_skb(struct ice_ring *ring)
> > @@ -333,14 +351,17 @@ static inline void
> > ice_clear_ring_build_skb_ena(struct ice_ring *ring)
> >       ring->flags &= ~ICE_RX_FLAGS_RING_BUILD_SKB;
> >  }
> >
> > -static inline bool ice_ring_is_xdp(struct ice_ring *ring)
> > +static inline bool ice_ring_is_xdp(struct ice_tx_ring *ring)
> >  {
> >       return !!(ring->flags & ICE_TX_FLAGS_RING_XDP);
> >  }
> >
> >  struct ice_ring_container {
> >       /* head of linked-list of rings */
> > -     struct ice_ring *ring;
> > +     union {
> > +             struct ice_ring *ring;
> > +             struct ice_tx_ring *tx_ring;
> > +     };
> >       struct dim dim;         /* data for net_dim algorithm */
> >       u16 itr_idx;            /* index in the interrupt vector */
> >       /* this matches the maximum number of ITR bits, but in usec
> > @@ -363,6 +384,9 @@ struct ice_coalesce_stored {
> >  #define ice_for_each_ring(pos, head) \
> >       for (pos = (head).ring; pos; pos = pos->next)
> >
> > +#define ice_for_each_tx_ring(pos, head) \
> > +     for (pos = (head).tx_ring; pos; pos = pos->next)
> > +
> >  {
> >  #if (PAGE_SIZE < 8192)
> > @@ -378,16 +402,16 @@ union ice_32b_rx_flex_desc;
> >
> >  bool ice_alloc_rx_bufs(struct ice_ring *rxr, u16 cleaned_count);
> >  netdev_tx_t ice_start_xmit(struct sk_buff *skb, struct net_device
> > *netdev);
> > -void ice_clean_tx_ring(struct ice_ring *tx_ring);
> > +void ice_clean_tx_ring(struct ice_tx_ring *tx_ring);
> >  void ice_clean_rx_ring(struct ice_ring *rx_ring);
> > -int ice_setup_tx_ring(struct ice_ring *tx_ring);
> > +int ice_setup_tx_ring(struct ice_tx_ring *tx_ring);
> >  int ice_setup_rx_ring(struct ice_ring *rx_ring);
> > -void ice_free_tx_ring(struct ice_ring *tx_ring);
> > +void ice_free_tx_ring(struct ice_tx_ring *tx_ring);
> >  void ice_free_rx_ring(struct ice_ring *rx_ring);
> >  int ice_napi_poll(struct napi_struct *napi, int budget);
> >  int
> >  ice_prgm_fdir_fltr(struct ice_vsi *vsi, struct ice_fltr_desc
> > *fdir_desc,
> >                  u8 *raw_packet);
> >  int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget);
> > -void ice_clean_ctrl_tx_irq(struct ice_ring *tx_ring);
> > +void ice_clean_ctrl_tx_irq(struct ice_tx_ring *tx_ring);
> >  #endif /* _ICE_TXRX_H_ */
> > diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > index 171397dcf00a..74519c603872 100644
> > --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
> > @@ -217,7 +217,7 @@ ice_receive_skb(struct ice_ring *rx_ring, struct
> > sk_buff *skb, u16 vlan_tag)
> >   * @size: packet data size
> >   * @xdp_ring: XDP ring for transmission
> >   */
> > -int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring
> > *xdp_ring)
> > +int ice_xmit_xdp_ring(void *data, u16 size, struct ice_tx_ring
> > *xdp_ring)
> >  {
> >       u16 i = xdp_ring->next_to_use;
> >       struct ice_tx_desc *tx_desc;
> > @@ -269,7 +269,7 @@ int ice_xmit_xdp_ring(void *data, u16 size,
> > struct ice_ring *xdp_ring)
> >   *
> >   * Returns negative on failure, 0 on success.
> >   */
> > -int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_ring
> > *xdp_ring)
> > +int ice_xmit_xdp_buff(struct xdp_buff *xdp, struct ice_tx_ring
> > *xdp_ring)
> >  {
> >       struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
> >
> > @@ -294,7 +294,7 @@ void ice_finalize_xdp_rx(struct ice_ring
> > *rx_ring, unsigned int xdp_res)
> >               xdp_do_flush_map();
> >
> >       if (xdp_res & ICE_XDP_TX) {
> > -             struct ice_ring *xdp_ring =
> > +             struct ice_tx_ring *xdp_ring =
> >                       rx_ring->vsi->xdp_rings[rx_ring->q_index];
> 
> Probably me not understanding XDP, but this looks a little strange.

It is strange, but later patches change ice_finalize_xdp_rx() to take
xdp_ring directly as an argument, so there won't be a need to dig through
rx_ring->vsi->xdp_rings anymore. This part will then look like:

	if (xdp_res & ICE_XDP_TX)
		ice_xdp_ring_update_tail(xdp_ring);
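
For readers outside the driver, here is a minimal standalone sketch of that
shape. It is not the actual ice code: the struct, the flag value and the
*_sketch names are hypothetical stand-ins, and the tail update is reduced to
copying next_to_use. It only illustrates that the caller hands in the XDP
ring, so no vsi->xdp_rings[] lookup is needed.

```c
/* Hypothetical, simplified stand-in for the driver's Tx ring struct. */
struct tx_ring_sketch {
	unsigned int next_to_use;	/* next descriptor to be filled */
	unsigned int tail;		/* stand-in for the HW tail register */
};

/* Hypothetical flag value; the real ICE_XDP_TX lives in the driver. */
#define XDP_TX_SKETCH 0x2u

/* Stand-in for ice_xdp_ring_update_tail(): publish next_to_use. */
void update_tail_sketch(struct tx_ring_sketch *xdp_ring)
{
	xdp_ring->tail = xdp_ring->next_to_use;
}

/* The XDP ring arrives as an argument instead of being looked up. */
void finalize_xdp_rx_sketch(struct tx_ring_sketch *xdp_ring,
			    unsigned int xdp_res)
{
	if (xdp_res & XDP_TX_SKETCH)
		update_tail_sketch(xdp_ring);
}
```

In this sketch the tail only moves when the XDP_TX bit is set in xdp_res;
any other verdict leaves the ring untouched.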

> 
> >
> >               ice_xdp_ring_update_tail(xdp_ring);

[...]


end of thread, ~2021-08-10 13:25 UTC

Thread overview: 10+ messages
2021-08-05 23:00 [PATCH v3 intel-next 0/6] XDP_TX improvements for ice Maciej Fijalkowski
2021-08-05 23:00 ` [PATCH v3 intel-next 1/6] ice: split ice_ring onto Tx/Rx separate structs Maciej Fijalkowski
2021-08-06  1:08   ` kernel test robot
2021-08-06 20:46   ` Creeley, Brett
2021-08-10 13:10     ` Maciej Fijalkowski
2021-08-05 23:00 ` [PATCH v3 intel-next 2/6] ice: unify xdp_rings accesses Maciej Fijalkowski
2021-08-05 23:00 ` [PATCH v3 intel-next 3/6] ice: do not create xdp_frame on XDP_TX Maciej Fijalkowski
2021-08-05 23:00 ` [PATCH v3 intel-next 4/6] ice: propagate xdp_ring onto rx_ring Maciej Fijalkowski
2021-08-05 23:00 ` [PATCH v3 intel-next 5/6] ice: optimize XDP_TX workloads Maciej Fijalkowski
2021-08-05 23:00 ` [PATCH v3 intel-next 6/6] ice: introduce XDP_TX fallback path Maciej Fijalkowski
