* [Intel-wired-lan] [PATCH net-next v3 0/3] ice: interrupt moderation updates
@ 2021-08-24  1:12 Jesse Brandeburg
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation Jesse Brandeburg
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jesse Brandeburg @ 2021-08-24  1:12 UTC (permalink / raw)
  To: intel-wired-lan

Improve how the driver generates interrupts when it's using DIMLIB, and
repair a couple of bugs found while making this change. These changes
don't make sense for net, because the fixes depend on patch one in this
series or on changes that are only available in net-next.

Jesse Brandeburg (3):
  ice: update dim usage and moderation
  ice: fix rate limit update after coalesce change
  ice: fix software generating extra interrupts

 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  17 ++-
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   1 +
 drivers/net/ethernet/intel/ice/ice_lib.c      |  29 ++++-
 drivers/net/ethernet/intel/ice/ice_lib.h      |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c     | 118 ++++++++++--------
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 110 +++++++++-------
 6 files changed, 170 insertions(+), 106 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation
  2021-08-24  1:12 [Intel-wired-lan] [PATCH net-next v3 0/3] ice: interrupt moderation updates Jesse Brandeburg
@ 2021-08-24  1:12 ` Jesse Brandeburg
  2021-09-20  5:09   ` G, GurucharanX
  2021-09-20  6:46   ` Paul Menzel
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 2/3] ice: fix rate limit update after coalesce change Jesse Brandeburg
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 3/3] ice: fix software generating extra interrupts Jesse Brandeburg
  2 siblings, 2 replies; 9+ messages in thread
From: Jesse Brandeburg @ 2021-08-24  1:12 UTC (permalink / raw)
  To: intel-wired-lan

The driver was having trouble with unreliable latency when doing single
threaded ping-pong tests. This was root-caused to the DIM algorithm
landing on too slow an interrupt value, which caused high latency, and it
was especially visible when queues were being switched frequently by the
scheduler, as happens on default setups today.

In attempting to improve this, we allow the upper rate limit for
interrupts to move to a maximum of 4 microseconds, which means that no
vector can generate more than 250,000 interrupts per second. The old
configuration allowed up to 100,000. The driver previously tried to
program the rate limit too frequently, and if the receive and transmit
sides were both active on the same vector, the INTRL would be set
incorrectly; this change fixes that issue as a side effect of the
redesign.

From now on the driver operates with a slightly changed DIM table that
puts more emphasis on latency sensitivity: there are more table entries
with low latency than with high latency (high being >= 64 microseconds).

The driver also resets the DIM algorithm state with a new stats set when
there is no work done and the data becomes stale (older than 1 second),
for the respective receive or transmit portion of the interrupt.

Add a new helper for setting the rate limit, which will be used again
in a follow-up patch.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

---
v3: merged on top of tx_ring/rx_ring split patch series
v2: original version
---
 drivers/net/ethernet/intel/ice/ice_lib.c  |  25 +++++
 drivers/net/ethernet/intel/ice/ice_lib.h  |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 118 ++++++++++++----------
 drivers/net/ethernet/intel/ice/ice_txrx.c |  84 ++++++++-------
 4 files changed, 141 insertions(+), 87 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index f9f060da3f04..28240b9f83e5 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1941,6 +1941,31 @@ void ice_write_itr(struct ice_ring_container *rc, u16 itr)
 	__ice_write_itr(q_vector, rc, itr);
 }
 
+/**
+ * ice_set_q_vector_intrl - set up interrupt rate limiting
+ * @q_vector: the vector to be configured
+ *
+ * Interrupt rate limiting is local to the vector, not per-queue, so we must
+ * detect if either ring container has dynamic moderation enabled to decide
+ * what to set the interrupt rate limit to via INTRL settings. If dynamic is
+ * disabled on both, write the cached setting to make sure INTRL matches the
+ * user-visible value.
+ */
+void ice_set_q_vector_intrl(struct ice_q_vector *q_vector)
+{
+	if (ITR_IS_DYNAMIC(&q_vector->tx) || ITR_IS_DYNAMIC(&q_vector->rx)) {
+		/* with dynamic moderation enabled, cap each vector at no more
+		 * than 250,000 ints/sec (a 4 us rate limit), which keeps
+		 * latency low while staying below the 500,000 ints/sec that
+		 * the lowest ITR setting could otherwise generate, reducing
+		 * CPU use a bit at the lowest latency setting.
+		 */
+		ice_write_intrl(q_vector, 4);
+	} else {
+		ice_write_intrl(q_vector, q_vector->intrl);
+	}
+}
+
 /**
  * ice_vsi_cfg_msix - MSIX mode Interrupt Config in the HW
  * @vsi: the VSI being configured
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index f446c5071478..989c4241676a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -103,6 +103,7 @@ int ice_status_to_errno(enum ice_status err);
 
 void ice_write_intrl(struct ice_q_vector *q_vector, u8 intrl);
 void ice_write_itr(struct ice_ring_container *rc, u16 itr);
+void ice_set_q_vector_intrl(struct ice_q_vector *q_vector);
 
 enum ice_status
 ice_vsi_cfg_mac_fltr(struct ice_vsi *vsi, const u8 *macaddr, bool set);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 10c6287aa5b2..7749bb0945bc 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5568,77 +5568,59 @@ int ice_vsi_cfg(struct ice_vsi *vsi)
 }
 
 /* THEORY OF MODERATION:
- * The below code creates custom DIM profiles for use by this driver, because
- * the ice driver hardware works differently than the hardware that DIMLIB was
+ * The ice driver hardware works differently than the hardware that DIMLIB was
  * originally made for. ice hardware doesn't have packet count limits that
  * can trigger an interrupt, but it *does* have interrupt rate limit support,
- * and this code adds that capability to be used by the driver when it's using
- * DIMLIB. The DIMLIB code was always designed to be a suggestion to the driver
- * for how to "respond" to traffic and interrupts, so this driver uses a
- * slightly different set of moderation parameters to get best performance.
+ * which is hard-coded to a limit of 250,000 ints/second.
+ * If not using dynamic moderation, the INTRL value can be modified
+ * by ethtool rx-usecs-high.
  */
 struct ice_dim {
 	/* the throttle rate for interrupts, basically worst case delay before
 	 * an initial interrupt fires, value is stored in microseconds.
 	 */
 	u16 itr;
-	/* the rate limit for interrupts, which can cap a delay from a small
-	 * ITR at a certain amount of interrupts per second. f.e. a 2us ITR
-	 * could yield as much as 500,000 interrupts per second, but with a
-	 * 10us rate limit, it limits to 100,000 interrupts per second. Value
-	 * is stored in microseconds.
-	 */
-	u16 intrl;
 };
 
 /* Make a different profile for Rx that doesn't allow quite so aggressive
- * moderation at the high end (it maxes out at 128us or about 8k interrupts a
- * second. The INTRL/rate parameters here are only useful to cap small ITR
- * values, which is why for larger ITR's - like 128, which can only generate
- * 8k interrupts per second, there is no point to rate limit and the values
- * are set to zero. The rate limit values do affect latency, and so must
- * be reasonably small so to not impact latency sensitive tests.
+ * moderation at the high end (it maxes out at 126 us, or about 8k interrupts
+ * a second).
  */
 static const struct ice_dim rx_profile[] = {
-	{2, 10},
-	{8, 16},
-	{32, 0},
-	{96, 0},
-	{128, 0}
+	{2},    /* 500,000 ints/s, capped at 250K by INTRL */
+	{8},    /* 125,000 ints/s */
+	{16},   /*  62,500 ints/s */
+	{62},   /*  16,129 ints/s */
+	{126}   /*   7,936 ints/s */
 };
 
 /* The transmit profile, which has the same sorts of values
  * as the previous struct
  */
 static const struct ice_dim tx_profile[] = {
-	{2, 10},
-	{8, 16},
-	{64, 0},
-	{128, 0},
-	{256, 0}
+	{2},    /* 500,000 ints/s, capped at 250K by INTRL */
+	{8},    /* 125,000 ints/s */
+	{40},   /*  25,000 ints/s */
+	{128},  /*   7,812 ints/s */
+	{256}   /*   3,906 ints/s */
 };
 
 static void ice_tx_dim_work(struct work_struct *work)
 {
 	struct ice_ring_container *rc;
-	struct ice_q_vector *q_vector;
 	struct dim *dim;
-	u16 itr, intrl;
+	u16 itr;
 
 	dim = container_of(work, struct dim, work);
-	rc = container_of(dim, struct ice_ring_container, dim);
-	q_vector = container_of(rc, struct ice_q_vector, tx);
+	rc = (struct ice_ring_container *)dim->priv;
 
-	if (dim->profile_ix >= ARRAY_SIZE(tx_profile))
-		dim->profile_ix = ARRAY_SIZE(tx_profile) - 1;
+	WARN_ON(dim->profile_ix >= ARRAY_SIZE(tx_profile));
 
 	/* look up the values in our local table */
 	itr = tx_profile[dim->profile_ix].itr;
-	intrl = tx_profile[dim->profile_ix].intrl;
 
-	ice_trace(tx_dim_work, q_vector, dim);
+	ice_trace(tx_dim_work, container_of(rc, struct ice_q_vector, tx), dim);
 	ice_write_itr(rc, itr);
-	ice_write_intrl(q_vector, intrl);
 
 	dim->state = DIM_START_MEASURE;
 }
@@ -5646,28 +5628,64 @@ static void ice_tx_dim_work(struct work_struct *work)
 static void ice_rx_dim_work(struct work_struct *work)
 {
 	struct ice_ring_container *rc;
-	struct ice_q_vector *q_vector;
 	struct dim *dim;
-	u16 itr, intrl;
+	u16 itr;
 
 	dim = container_of(work, struct dim, work);
-	rc = container_of(dim, struct ice_ring_container, dim);
-	q_vector = container_of(rc, struct ice_q_vector, rx);
+	rc = (struct ice_ring_container *)dim->priv;
 
-	if (dim->profile_ix >= ARRAY_SIZE(rx_profile))
-		dim->profile_ix = ARRAY_SIZE(rx_profile) - 1;
+	WARN_ON(dim->profile_ix >= ARRAY_SIZE(rx_profile));
 
 	/* look up the values in our local table */
 	itr = rx_profile[dim->profile_ix].itr;
-	intrl = rx_profile[dim->profile_ix].intrl;
 
-	ice_trace(rx_dim_work, q_vector, dim);
+	ice_trace(rx_dim_work, container_of(rc, struct ice_q_vector, rx), dim);
 	ice_write_itr(rc, itr);
-	ice_write_intrl(q_vector, intrl);
 
 	dim->state = DIM_START_MEASURE;
 }
 
+#define ICE_DIM_DEFAULT_PROFILE_IX 1
+
+/**
+ * ice_init_moderation - set up interrupt moderation
+ * @q_vector: the vector containing rings to be configured
+ *
+ * Set up interrupt moderation registers, with the intent to do the right thing
+ * when called from reset or from probe, and whether or not dynamic moderation
+ * is enabled or not. Take special care to write all the registers in both
+ * dynamic mode or not in order to make sure hardware is in a known state.
+ */
+static void ice_init_moderation(struct ice_q_vector *q_vector)
+{
+	struct ice_ring_container *rc;
+	bool tx_dynamic, rx_dynamic;
+
+	rc = &q_vector->tx;
+	INIT_WORK(&rc->dim.work, ice_tx_dim_work);
+	rc->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	rc->dim.profile_ix = ICE_DIM_DEFAULT_PROFILE_IX;
+	rc->dim.priv = rc;
+	tx_dynamic = ITR_IS_DYNAMIC(rc);
+
+	/* set the initial TX ITR to match the above */
+	ice_write_itr(rc, tx_dynamic ?
+		      tx_profile[rc->dim.profile_ix].itr : rc->itr_setting);
+
+	rc = &q_vector->rx;
+	INIT_WORK(&rc->dim.work, ice_rx_dim_work);
+	rc->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+	rc->dim.profile_ix = ICE_DIM_DEFAULT_PROFILE_IX;
+	rc->dim.priv = rc;
+	rx_dynamic = ITR_IS_DYNAMIC(rc);
+
+	/* set the initial RX ITR to match the above */
+	ice_write_itr(rc, rx_dynamic ? rx_profile[rc->dim.profile_ix].itr :
+				       rc->itr_setting);
+
+	ice_set_q_vector_intrl(q_vector);
+}
+
 /**
  * ice_napi_enable_all - Enable NAPI for all q_vectors in the VSI
  * @vsi: the VSI being configured
@@ -5682,11 +5700,7 @@ static void ice_napi_enable_all(struct ice_vsi *vsi)
 	ice_for_each_q_vector(vsi, q_idx) {
 		struct ice_q_vector *q_vector = vsi->q_vectors[q_idx];
 
-		INIT_WORK(&q_vector->tx.dim.work, ice_tx_dim_work);
-		q_vector->tx.dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
-
-		INIT_WORK(&q_vector->rx.dim.work, ice_rx_dim_work);
-		q_vector->rx.dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
+		ice_init_moderation(q_vector);
 
 		if (q_vector->rx.rx_ring || q_vector->tx.tx_ring)
 			napi_enable(&q_vector->napi);
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 01ae331927bd..33573774647f 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1259,6 +1259,41 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 	return failure ? budget : (int)total_rx_pkts;
 }
 
+static void __ice_update_sample(struct ice_q_vector *q_vector,
+				struct ice_ring_container *rc,
+				struct dim_sample *sample,
+				bool is_tx)
+{
+	u64 packets = 0, bytes = 0;
+
+	if (is_tx) {
+		struct ice_tx_ring *tx_ring;
+
+		ice_for_each_tx_ring(tx_ring, *rc) {
+			packets += tx_ring->stats.pkts;
+			bytes += tx_ring->stats.bytes;
+		}
+	} else {
+		struct ice_rx_ring *rx_ring;
+
+		ice_for_each_rx_ring(rx_ring, *rc) {
+			packets += rx_ring->stats.pkts;
+			bytes += rx_ring->stats.bytes;
+		}
+	}
+
+	dim_update_sample(q_vector->total_events, packets, bytes, sample);
+	sample->comp_ctr = 0;
+
+	/* if dim settings get stale, i.e. not updated for 1 second
+	 * or longer, force it to start again. This addresses the
+	 * frequent case of an idle queue being switched to by the
+	 * scheduler. The 1,000 here means 1,000 milliseconds.
+	 */
+	if (ktime_ms_delta(sample->time, rc->dim.start_sample.time) >= 1000)
+		rc->dim.state = DIM_START_MEASURE;
+}
+
 /**
  * ice_net_dim - Update net DIM algorithm
  * @q_vector: the vector associated with the interrupt
@@ -1274,34 +1309,16 @@ static void ice_net_dim(struct ice_q_vector *q_vector)
 	struct ice_ring_container *rx = &q_vector->rx;
 
 	if (ITR_IS_DYNAMIC(tx)) {
-		struct dim_sample dim_sample = {};
-		u64 packets = 0, bytes = 0;
-		struct ice_tx_ring *ring;
-
-		ice_for_each_tx_ring(ring, q_vector->tx) {
-			packets += ring->stats.pkts;
-			bytes += ring->stats.bytes;
-		}
-
-		dim_update_sample(q_vector->total_events, packets, bytes,
-				  &dim_sample);
+		struct dim_sample dim_sample;
 
+		__ice_update_sample(q_vector, tx, &dim_sample, true);
 		net_dim(&tx->dim, dim_sample);
 	}
 
 	if (ITR_IS_DYNAMIC(rx)) {
-		struct dim_sample dim_sample = {};
-		u64 packets = 0, bytes = 0;
-		struct ice_rx_ring *ring;
-
-		ice_for_each_rx_ring(ring, q_vector->rx) {
-			packets += ring->stats.pkts;
-			bytes += ring->stats.bytes;
-		}
-
-		dim_update_sample(q_vector->total_events, packets, bytes,
-				  &dim_sample);
+		struct dim_sample dim_sample;
 
+		__ice_update_sample(q_vector, rx, &dim_sample, false);
 		net_dim(&rx->dim, dim_sample);
 	}
 }
@@ -1328,15 +1345,14 @@ static u32 ice_buildreg_itr(u16 itr_idx, u16 itr)
 }
 
 /**
- * ice_update_ena_itr - Update ITR moderation and re-enable MSI-X interrupt
+ * ice_enable_interrupt - re-enable MSI-X interrupt
  * @q_vector: the vector associated with the interrupt to enable
  *
- * Update the net_dim() algorithm and re-enable the interrupt associated with
- * this vector.
- *
- * If the VSI is down, the interrupt will not be re-enabled.
+ * If the VSI is down, the interrupt will not be re-enabled. Also,
+ * when enabling the interrupt always reset the wb_on_itr to false
+ * and trigger a software interrupt to clean out internal state.
  */
-static void ice_update_ena_itr(struct ice_q_vector *q_vector)
+static void ice_enable_interrupt(struct ice_q_vector *q_vector)
 {
 	struct ice_vsi *vsi = q_vector->vsi;
 	bool wb_en = q_vector->wb_on_itr;
@@ -1351,10 +1367,6 @@ static void ice_update_ena_itr(struct ice_q_vector *q_vector)
 	if (wb_en)
 		q_vector->wb_on_itr = false;
 
-	/* This will do nothing if dynamic updates are not enabled. */
-	ice_net_dim(q_vector);
-
-	/* net_dim() updates ITR out-of-band using a work item */
 	itr_val = ice_buildreg_itr(ICE_ITR_NONE, 0);
 	/* trigger an immediate software interrupt when exiting
 	 * busy poll, to make sure to catch any pending cleanups
@@ -1482,10 +1494,12 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	/* Exit the polling mode, but don't re-enable interrupts if stack might
 	 * poll us due to busy-polling
 	 */
-	if (likely(napi_complete_done(napi, work_done)))
-		ice_update_ena_itr(q_vector);
-	else
+	if (likely(napi_complete_done(napi, work_done))) {
+		ice_net_dim(q_vector);
+		ice_enable_interrupt(q_vector);
+	} else {
 		ice_set_wb_on_itr(q_vector);
+	}
 
 	return min_t(int, work_done, budget - 1);
 }
-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Intel-wired-lan] [PATCH net-next v3 2/3] ice: fix rate limit update after coalesce change
  2021-08-24  1:12 [Intel-wired-lan] [PATCH net-next v3 0/3] ice: interrupt moderation updates Jesse Brandeburg
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation Jesse Brandeburg
@ 2021-08-24  1:12 ` Jesse Brandeburg
  2021-09-20  5:10   ` G, GurucharanX
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 3/3] ice: fix software generating extra interrupts Jesse Brandeburg
  2 siblings, 1 reply; 9+ messages in thread
From: Jesse Brandeburg @ 2021-08-24  1:12 UTC (permalink / raw)
  To: intel-wired-lan

If the adaptive settings are changed with
ethtool -C ethx adaptive-rx off adaptive-tx off
then the interrupt rate limit should be maintained as a user-set value,
but only if BOTH adaptive settings are off. Fix a bug where the rate
limit that was being used in adaptive mode stayed set in the register
but was not reported correctly by ethtool -c ethx. Due to long lines,
include a small refactor of the q_vector variable.

Fixes: b8b4772377dd ("ice: refactor interrupt moderation writes")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
v3: merged with tx_ring/rx_ring split series.
---
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 17 +++++++++++------
 drivers/net/ethernet/intel/ice/ice_lib.c     |  4 ++--
 2 files changed, 13 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index cf6b1fbef584..3b3dfa13c54f 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -3638,6 +3638,9 @@ ice_set_rc_coalesce(struct ethtool_coalesce *ec,
 
 	switch (rc->type) {
 	case ICE_RX_CONTAINER:
+	{
+		struct ice_q_vector *q_vector = rc->rx_ring->q_vector;
+
 		if (ec->rx_coalesce_usecs_high > ICE_MAX_INTRL ||
 		    (ec->rx_coalesce_usecs_high &&
 		     ec->rx_coalesce_usecs_high < pf->hw.intrl_gran)) {
@@ -3646,22 +3649,20 @@ ice_set_rc_coalesce(struct ethtool_coalesce *ec,
 				    ICE_MAX_INTRL);
 			return -EINVAL;
 		}
-		if (ec->rx_coalesce_usecs_high != rc->rx_ring->q_vector->intrl &&
+		if (ec->rx_coalesce_usecs_high != q_vector->intrl &&
 		    (ec->use_adaptive_rx_coalesce || ec->use_adaptive_tx_coalesce)) {
 			netdev_info(vsi->netdev, "Invalid value, %s-usecs-high cannot be changed if adaptive-tx or adaptive-rx is enabled\n",
 				    c_type_str);
 			return -EINVAL;
 		}
-		if (ec->rx_coalesce_usecs_high != rc->rx_ring->q_vector->intrl) {
-			rc->rx_ring->q_vector->intrl = ec->rx_coalesce_usecs_high;
-			ice_write_intrl(rc->rx_ring->q_vector,
-					ec->rx_coalesce_usecs_high);
-		}
+		if (ec->rx_coalesce_usecs_high != q_vector->intrl)
+			q_vector->intrl = ec->rx_coalesce_usecs_high;
 
 		use_adaptive_coalesce = ec->use_adaptive_rx_coalesce;
 		coalesce_usecs = ec->rx_coalesce_usecs;
 
 		break;
+	}
 	case ICE_TX_CONTAINER:
 		use_adaptive_coalesce = ec->use_adaptive_tx_coalesce;
 		coalesce_usecs = ec->tx_coalesce_usecs;
@@ -3806,6 +3807,8 @@ __ice_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec,
 
 			if (ice_set_q_coalesce(vsi, ec, v_idx))
 				return -EINVAL;
+
+			ice_set_q_vector_intrl(vsi->q_vectors[v_idx]);
 		}
 		goto set_complete;
 	}
@@ -3813,6 +3816,8 @@ __ice_set_coalesce(struct net_device *netdev, struct ethtool_coalesce *ec,
 	if (ice_set_q_coalesce(vsi, ec, q_num))
 		return -EINVAL;
 
+	ice_set_q_vector_intrl(vsi->q_vectors[q_num]);
+
 set_complete:
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 28240b9f83e5..ef540c7f7a5a 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3123,7 +3123,7 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
 		}
 
 		vsi->q_vectors[i]->intrl = coalesce[i].intrl;
-		ice_write_intrl(vsi->q_vectors[i], coalesce[i].intrl);
+		ice_set_q_vector_intrl(vsi->q_vectors[i]);
 	}
 
 	/* the number of queue vectors increased so write whatever is in
@@ -3141,7 +3141,7 @@ ice_vsi_rebuild_set_coalesce(struct ice_vsi *vsi,
 		ice_write_itr(rc, rc->itr_setting);
 
 		vsi->q_vectors[i]->intrl = coalesce[0].intrl;
-		ice_write_intrl(vsi->q_vectors[i], coalesce[0].intrl);
+		ice_set_q_vector_intrl(vsi->q_vectors[i]);
 	}
 }
 
-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Intel-wired-lan] [PATCH net-next v3 3/3] ice: fix software generating extra interrupts
  2021-08-24  1:12 [Intel-wired-lan] [PATCH net-next v3 0/3] ice: interrupt moderation updates Jesse Brandeburg
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation Jesse Brandeburg
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 2/3] ice: fix rate limit update after coalesce change Jesse Brandeburg
@ 2021-08-24  1:12 ` Jesse Brandeburg
  2021-09-20  5:11   ` G, GurucharanX
  2 siblings, 1 reply; 9+ messages in thread
From: Jesse Brandeburg @ 2021-08-24  1:12 UTC (permalink / raw)
  To: intel-wired-lan

The driver tried to work around missing completion events that occurred
while interrupts were disabled, by triggering a software interrupt
whenever we exited polling (provided we had polled at least once).

This caused a *lot* of extra interrupts for some workloads, like NVMe
over TCP, which resulted in performance regressions. It was also visible
when polling didn't prevent interrupts while busy_poll was enabled.

Fix the extra interrupts by utilizing our previously unused 3rd ITR
(interrupt throttle) index, setting it to 20K interrupts per second, and
then triggering a software interrupt within that rate limit.

While here, slightly refactor the code to avoid overwriting a local
variable in the wb_en = true case.

Fixes: b7306b42beaf ("ice: manage interrupts during poll exit")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |  1 +
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 26 +++++++++++--------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index ab5065b5e748..d16738a3d3a7 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -183,6 +183,7 @@
 #define GLINT_DYN_CTL_INTERVAL_S		5
 #define GLINT_DYN_CTL_INTERVAL_M		ICE_M(0xFFF, 5)
 #define GLINT_DYN_CTL_SW_ITR_INDX_ENA_M		BIT(24)
+#define GLINT_DYN_CTL_SW_ITR_INDX_S		25
 #define GLINT_DYN_CTL_SW_ITR_INDX_M		ICE_M(0x3, 25)
 #define GLINT_DYN_CTL_WB_ON_ITR_M		BIT(30)
 #define GLINT_DYN_CTL_INTENA_MSK_M		BIT(31)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 33573774647f..44e50d511a49 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1361,21 +1361,25 @@ static void ice_enable_interrupt(struct ice_q_vector *q_vector)
 	if (test_bit(ICE_DOWN, vsi->state))
 		return;
 
-	/* When exiting WB_ON_ITR, let ITR resume its normal
-	 * interrupts-enabled path.
+	/* trigger an ITR-delayed software interrupt when exiting busy poll, to
+	 * make sure to catch any pending cleanups that might have been missed
+	 * due to interrupt state transition. If busy poll or poll isn't
+	 * enabled, then don't update ITR, and just enable the interrupt.
 	 */
-	if (wb_en)
+	if (!wb_en) {
+		itr_val = ice_buildreg_itr(ICE_ITR_NONE, 0);
+	} else {
 		q_vector->wb_on_itr = false;
 
-	itr_val = ice_buildreg_itr(ICE_ITR_NONE, 0);
-	/* trigger an immediate software interrupt when exiting
-	 * busy poll, to make sure to catch any pending cleanups
-	 * that might have been missed due to interrupt state
-	 * transition.
-	 */
-	if (wb_en) {
+		/* do two things here with a single write. Set up the third ITR
+		 * index to be used for software interrupt moderation, and then
+		 * trigger a software interrupt rate-limited to 20K interrupts
+		 * per second; this helps avoid high interrupt loads caused by
+		 * frequently entering and exiting polling.
+		 */
+		itr_val = ice_buildreg_itr(ICE_IDX_ITR2, ICE_ITR_20K);
 		itr_val |= GLINT_DYN_CTL_SWINT_TRIG_M |
-			   GLINT_DYN_CTL_SW_ITR_INDX_M |
+			   ICE_IDX_ITR2 << GLINT_DYN_CTL_SW_ITR_INDX_S |
 			   GLINT_DYN_CTL_SW_ITR_INDX_ENA_M;
 	}
 	wr32(&vsi->back->hw, GLINT_DYN_CTL(q_vector->reg_idx), itr_val);
-- 
2.31.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation Jesse Brandeburg
@ 2021-09-20  5:09   ` G, GurucharanX
  2021-09-20  6:46   ` Paul Menzel
  1 sibling, 0 replies; 9+ messages in thread
From: G, GurucharanX @ 2021-09-20  5:09 UTC (permalink / raw)
  To: intel-wired-lan



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Jesse Brandeburg
> Sent: Tuesday, August 24, 2021 6:43 AM
> To: intel-wired-lan at lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and
> moderation
> 
> The driver was having trouble with unreliable latency when doing single
> threaded ping-pong tests. This was root caused to the DIM algorithm landing
> on a too slow interrupt value, which caused high latency, and it was especially
> present when queues were being switched frequently by the scheduler as
> happens on default setups today.
> 
> In attempting to improve this, we allow the upper rate limit for interrupts to
> move to rate limit of 4 microseconds as a max, which means that no vector
> can generate more than 250,000 interrupts per second. The old config was up
> to 100,000. The driver previously tried to program the rate limit too
> frequently and if the receive and transmit side were both active on the same
> vector, the INTRL would be set incorrectly, and this change fixes that issue as
> a side effect of the redesign.
> 
> This driver will operate from now on with a slightly changed DIM table with
> more emphasis towards latency sensitivity by having more table entries with
> lower latency than with high latency (high being >= 64 microseconds).
> 
> The driver also resets the DIM algorithm state with a new stats set when
> there is no work done and the data becomes stale (older than 1 second), for
> the respective receive or transmit portion of the interrupt.
> 
> Add a new helper for setting rate limit, which will be used more in a followup
> patch.
> 
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> ---
> v3: merged on top of tx_ring/rx_ring split patch series
> v2: original version
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c  |  25 +++++
>  drivers/net/ethernet/intel/ice/ice_lib.h  |   1 +
>  drivers/net/ethernet/intel/ice/ice_main.c | 118 ++++++++++++----------
> drivers/net/ethernet/intel/ice/ice_txrx.c |  84 ++++++++-------
>  4 files changed, 141 insertions(+), 87 deletions(-)
> 

Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Intel-wired-lan] [PATCH net-next v3 2/3] ice: fix rate limit update after coalesce change
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 2/3] ice: fix rate limit update after coalesce change Jesse Brandeburg
@ 2021-09-20  5:10   ` G, GurucharanX
  0 siblings, 0 replies; 9+ messages in thread
From: G, GurucharanX @ 2021-09-20  5:10 UTC (permalink / raw)
  To: intel-wired-lan



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Jesse Brandeburg
> Sent: Tuesday, August 24, 2021 6:43 AM
> To: intel-wired-lan at lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net-next v3 2/3] ice: fix rate limit update
> after coalesce change
> 
> If the adaptive settings are changed with ethtool -C ethx adaptive-rx off
> adaptive-tx off then the interrupt rate limit should be maintained as a user
> set value, but only if BOTH adaptive settings are off. Fix a bug where the rate
> limit that was being used in adaptive mode was staying set in the register but
> was not reported correctly by ethtool -c ethx. Due to long lines include a
> small refactor of q_vector variable.
> 
> Fixes: b8b4772377dd ("ice: refactor interrupt moderation writes")
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
> v3: merged with tx_ring/rx_ring split series.
> ---
>  drivers/net/ethernet/intel/ice/ice_ethtool.c | 17 +++++++++++------
>  drivers/net/ethernet/intel/ice/ice_lib.c     |  4 ++--
>  2 files changed, 13 insertions(+), 8 deletions(-)
> 

Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [Intel-wired-lan] [PATCH net-next v3 3/3] ice: fix software generating extra interrupts
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 3/3] ice: fix software generating extra interrupts Jesse Brandeburg
@ 2021-09-20  5:11   ` G, GurucharanX
  0 siblings, 0 replies; 9+ messages in thread
From: G, GurucharanX @ 2021-09-20  5:11 UTC (permalink / raw)
  To: intel-wired-lan



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Jesse Brandeburg
> Sent: Tuesday, August 24, 2021 6:43 AM
> To: intel-wired-lan at lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net-next v3 3/3] ice: fix software
> generating extra interrupts
> 
> The driver tried to work around missing completion events that occurred
> while interrupts are disabled, by triggering a software interrupt whenever
> we exit polling (but we had to have polled at least once).
> 
> This was causing a *lot* of extra interrupts for some workloads like NVMe
> over TCP, which resulted in regressions in performance. It was also visible
> when polling didn't prevent interrupts when busy_poll was enabled.
> 
> Fix the extra interrupts by utilizing our previously unused 3rd ITR (interrupt
> throttle) index and set it to 20K interrupts per second, and then trigger a
> software interrupt within that rate limit.
> 
> While here, slightly refactor the code to avoid an overwrite of a local variable
> in the case of wb_en = true.
> 
> Fixes: b7306b42beaf ("ice: manage interrupts during poll exit")
> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> ---
>  .../net/ethernet/intel/ice/ice_hw_autogen.h   |  1 +
>  drivers/net/ethernet/intel/ice/ice_txrx.c     | 26 +++++++++++--------
>  2 files changed, 16 insertions(+), 11 deletions(-)
> 

Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)


* [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation
  2021-08-24  1:12 ` [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation Jesse Brandeburg
  2021-09-20  5:09   ` G, GurucharanX
@ 2021-09-20  6:46   ` Paul Menzel
  2021-09-20 19:02     ` Jesse Brandeburg
  1 sibling, 1 reply; 9+ messages in thread
From: Paul Menzel @ 2021-09-20  6:46 UTC (permalink / raw)
  To: intel-wired-lan

Dear Jesse,



On 24.08.21 at 03:12, Jesse Brandeburg wrote:
> The driver was having trouble with unreliable latency when doing single
> threaded ping-pong tests. This was root caused to the DIM algorithm
> landing on a too slow interrupt value, which caused high latency, and it
> was especially present when queues were being switched frequently by the
> scheduler as happens on default setups today.
> 
> In attempting to improve this, we allow the upper rate limit for
> interrupts to move to rate limit of 4 microseconds as a max, which means
> that no vector can generate more than 250,000 interrupts per second. The
> old config was up to 100,000. The driver previously tried to program the
> rate limit too frequently and if the receive and transmit side were both
> active on the same vector, the INTRL would be set incorrectly, and this
> change fixes that issue as a side effect of the redesign.
> 
> This driver will operate from now on with a slightly changed DIM table
> with more emphasis towards latency sensitivity by having more table
> entries with lower latency than with high latency (high being >= 64
> microseconds).
> 
> The driver also resets the DIM algorithm state with a new stats set when
> there is no work done and the data becomes stale (older than 1 second),
> for the respective receive or transmit portion of the interrupt.
> 
> Add a new helper for setting rate limit, which will be used more
> in a followup patch.

Nice work. Could you please add how to run the ping-pong test, so 
others can verify the behavior?

> Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
> 
> ---
> v3: merged on top of tx_ring/rx_ring split patch series
> v2: original version
> ---
>   drivers/net/ethernet/intel/ice/ice_lib.c  |  25 +++++
>   drivers/net/ethernet/intel/ice/ice_lib.h  |   1 +
>   drivers/net/ethernet/intel/ice/ice_main.c | 118 ++++++++++++----------
>   drivers/net/ethernet/intel/ice/ice_txrx.c |  84 ++++++++-------
>   4 files changed, 141 insertions(+), 87 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index f9f060da3f04..28240b9f83e5 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -1941,6 +1941,31 @@ void ice_write_itr(struct ice_ring_container *rc, u16 itr)
>   	__ice_write_itr(q_vector, rc, itr);
>   }
>   
> +/**
> + * ice_set_q_vector_intrl - set up interrupt rate limiting
> + * @q_vector: the vector to be configured
> + *
> + * Interrupt rate limiting is local to the vector, not per-queue so we must
> + * detect if either ring container has dynamic enabled to decide what to set

Dynamic moderation?

> + * the interrupt rate limit to via INTRL settings. In the case of dynamic
> + * disabled on both, write the value with the cached setting to make sure

Ditto.

> + * INTRL matches user visible value.
> + */
> +void ice_set_q_vector_intrl(struct ice_q_vector *q_vector)
> +{
> +	if (ITR_IS_DYNAMIC(&q_vector->tx) || ITR_IS_DYNAMIC(&q_vector->rx)) {
> +		/* in the case of dynamic enabled, cap each vector to no more
> +		 * than (4 us) 250,000 ints/sec, which allows low latency
> +		 * but still less than 500,000 interrupts per second, which
> +		 * reduces CPU a bit in the case of the lowest latency
> +		 * setting. The 4 here is a value in microseconds.
> +		 */
> +		ice_write_intrl(q_vector, 4);
> +	} else {
> +		ice_write_intrl(q_vector, q_vector->intrl);
> +	}
> +}
> +
>   /**
>    * ice_vsi_cfg_msix - MSIX mode Interrupt Config in the HW
>    * @vsi: the VSI being configured
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
> index f446c5071478..989c4241676a 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.h
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.h
> @@ -103,6 +103,7 @@ int ice_status_to_errno(enum ice_status err);
>   
>   void ice_write_intrl(struct ice_q_vector *q_vector, u8 intrl);
>   void ice_write_itr(struct ice_ring_container *rc, u16 itr);
> +void ice_set_q_vector_intrl(struct ice_q_vector *q_vector);
>   
>   enum ice_status
>   ice_vsi_cfg_mac_fltr(struct ice_vsi *vsi, const u8 *macaddr, bool set);
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 10c6287aa5b2..7749bb0945bc 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -5568,77 +5568,59 @@ int ice_vsi_cfg(struct ice_vsi *vsi)
>   }
>   
>   /* THEORY OF MODERATION:
> - * The below code creates custom DIM profiles for use by this driver, because
> - * the ice driver hardware works differently than the hardware that DIMLIB was
> + * The ice driver hardware works differently than the hardware that DIMLIB was
>    * originally made for. ice hardware doesn't have packet count limits that
>    * can trigger an interrupt, but it *does* have interrupt rate limit support,
> - * and this code adds that capability to be used by the driver when it's using
> - * DIMLIB. The DIMLIB code was always designed to be a suggestion to the driver
> - * for how to "respond" to traffic and interrupts, so this driver uses a
> - * slightly different set of moderation parameters to get best performance.
> + * which is hard-coded to limit to 250,000 ints/second.

Maybe:

? hard-coded to a limit of 250,000 ints/second.

> + * If not using dynamic moderation, the INTRL value can be modified
> + * by ethtool rx-usecs-high.
>    */
>   struct ice_dim {
>   	/* the throttle rate for interrupts, basically worst case delay before
>   	 * an initial interrupt fires, value is stored in microseconds.
>   	 */
>   	u16 itr;
> -	/* the rate limit for interrupts, which can cap a delay from a small
> -	 * ITR at a certain amount of interrupts per second. f.e. a 2us ITR
> -	 * could yield as much as 500,000 interrupts per second, but with a
> -	 * 10us rate limit, it limits to 100,000 interrupts per second. Value
> -	 * is stored in microseconds.
> -	 */
> -	u16 intrl;
>   };
>   
>   /* Make a different profile for Rx that doesn't allow quite so aggressive
> - * moderation at the high end (it maxes out at 128us or about 8k interrupts a
> - * second. The INTRL/rate parameters here are only useful to cap small ITR
> - * values, which is why for larger ITR's - like 128, which can only generate
> - * 8k interrupts per second, there is no point to rate limit and the values
> - * are set to zero. The rate limit values do affect latency, and so must
> - * be reasonably small so to not impact latency sensitive tests.
> + * moderation at the high end (it maxxes out at 126us or about 8k interrupts a

maxes?

> + * second.
>    */
>   static const struct ice_dim rx_profile[] = {
> -	{2, 10},
> -	{8, 16},
> -	{32, 0},
> -	{96, 0},
> -	{128, 0}
> +	{2},    /* 500,000 ints/s, capped at 250K by INTRL */
> +	{8},    /* 125,000 ints/s */
> +	{16},   /*  62,500 ints/s */
> +	{62},   /*  16,129 ints/s */
> +	{126}   /*   7,936 ints/s */
>   };
>   
>   /* The transmit profile, which has the same sorts of values
>    * as the previous struct
>    */
>   static const struct ice_dim tx_profile[] = {
> -	{2, 10},
> -	{8, 16},
> -	{64, 0},
> -	{128, 0},
> -	{256, 0}
> +	{2},    /* 500,000 ints/s, capped at 250K by INTRL */
> +	{8},    /* 125,000 ints/s */
> +	{40},   /*  16,125 ints/s */
> +	{128},  /*   7,812 ints/s */
> +	{256}   /*   3,906 ints/s */
>   };
>   
>   static void ice_tx_dim_work(struct work_struct *work)
>   {
>   	struct ice_ring_container *rc;
> -	struct ice_q_vector *q_vector;
>   	struct dim *dim;
> -	u16 itr, intrl;
> +	u16 itr;
>   
>   	dim = container_of(work, struct dim, work);
> -	rc = container_of(dim, struct ice_ring_container, dim);
> -	q_vector = container_of(rc, struct ice_q_vector, tx);
> +	rc = (struct ice_ring_container *)dim->priv;
>   
> -	if (dim->profile_ix >= ARRAY_SIZE(tx_profile))
> -		dim->profile_ix = ARRAY_SIZE(tx_profile) - 1;
> +	WARN_ON(dim->profile_ix >= ARRAY_SIZE(tx_profile));
>   
>   	/* look up the values in our local table */
>   	itr = tx_profile[dim->profile_ix].itr;
> -	intrl = tx_profile[dim->profile_ix].intrl;
>   
> -	ice_trace(tx_dim_work, q_vector, dim);
> +	ice_trace(tx_dim_work, container_of(rc, struct ice_q_vector, tx), dim);
>   	ice_write_itr(rc, itr);
> -	ice_write_intrl(q_vector, intrl);
>   
>   	dim->state = DIM_START_MEASURE;
>   }
> @@ -5646,28 +5628,64 @@ static void ice_tx_dim_work(struct work_struct *work)
>   static void ice_rx_dim_work(struct work_struct *work)
>   {
>   	struct ice_ring_container *rc;
> -	struct ice_q_vector *q_vector;
>   	struct dim *dim;
> -	u16 itr, intrl;
> +	u16 itr;
>   
>   	dim = container_of(work, struct dim, work);
> -	rc = container_of(dim, struct ice_ring_container, dim);
> -	q_vector = container_of(rc, struct ice_q_vector, rx);
> +	rc = (struct ice_ring_container *)dim->priv;
>   
> -	if (dim->profile_ix >= ARRAY_SIZE(rx_profile))
> -		dim->profile_ix = ARRAY_SIZE(rx_profile) - 1;
> +	WARN_ON(dim->profile_ix >= ARRAY_SIZE(rx_profile));
>   
>   	/* look up the values in our local table */
>   	itr = rx_profile[dim->profile_ix].itr;
> -	intrl = rx_profile[dim->profile_ix].intrl;
>   
> -	ice_trace(rx_dim_work, q_vector, dim);
> +	ice_trace(rx_dim_work, container_of(rc, struct ice_q_vector, rx), dim);
>   	ice_write_itr(rc, itr);
> -	ice_write_intrl(q_vector, intrl);
>   
>   	dim->state = DIM_START_MEASURE;
>   }
>   
> +#define ICE_DIM_DEFAULT_PROFILE_IX 1
> +
> +/**
> + * ice_init_moderation - set up interrupt moderation
> + * @q_vector: the vector containing rings to be configured
> + *
> + * Set up interrupt moderation registers, with the intent to do the right thing
> + * when called from reset or from probe, and whether or not dynamic moderation
> + * is enabled or not. Take special care to write all the registers in both
> + * dynamic mode or not in order to make sure hardware is in a known state.

moderation?

> + */
> +static void ice_init_moderation(struct ice_q_vector *q_vector)
> +{
> +	struct ice_ring_container *rc;
> +	bool tx_dynamic, rx_dynamic;
> +
> +	rc = &q_vector->tx;
> +	INIT_WORK(&rc->dim.work, ice_tx_dim_work);
> +	rc->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> +	rc->dim.profile_ix = ICE_DIM_DEFAULT_PROFILE_IX;
> +	rc->dim.priv = rc;
> +	tx_dynamic = ITR_IS_DYNAMIC(rc);
> +
> +	/* set the initial TX ITR to match the above */
> +	ice_write_itr(rc, tx_dynamic ?
> +		      tx_profile[rc->dim.profile_ix].itr : rc->itr_setting);
> +
> +	rc = &q_vector->rx;
> +	INIT_WORK(&rc->dim.work, ice_rx_dim_work);
> +	rc->dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> +	rc->dim.profile_ix = ICE_DIM_DEFAULT_PROFILE_IX;
> +	rc->dim.priv = rc;
> +	rx_dynamic = ITR_IS_DYNAMIC(rc);
> +
> +	/* set the initial RX ITR to match the above */
> +	ice_write_itr(rc, rx_dynamic ? rx_profile[rc->dim.profile_ix].itr :
> +				       rc->itr_setting);
> +
> +	ice_set_q_vector_intrl(q_vector);
> +}
> +
>   /**
>    * ice_napi_enable_all - Enable NAPI for all q_vectors in the VSI
>    * @vsi: the VSI being configured
> @@ -5682,11 +5700,7 @@ static void ice_napi_enable_all(struct ice_vsi *vsi)
>   	ice_for_each_q_vector(vsi, q_idx) {
>   		struct ice_q_vector *q_vector = vsi->q_vectors[q_idx];
>   
> -		INIT_WORK(&q_vector->tx.dim.work, ice_tx_dim_work);
> -		q_vector->tx.dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> -
> -		INIT_WORK(&q_vector->rx.dim.work, ice_rx_dim_work);
> -		q_vector->rx.dim.mode = DIM_CQ_PERIOD_MODE_START_FROM_EQE;
> +		ice_init_moderation(q_vector);
>   
>   		if (q_vector->rx.rx_ring || q_vector->tx.tx_ring)
>   			napi_enable(&q_vector->napi);
> diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
> index 01ae331927bd..33573774647f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_txrx.c
> +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
> @@ -1259,6 +1259,41 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
>   	return failure ? budget : (int)total_rx_pkts;
>   }
>   
> +static void __ice_update_sample(struct ice_q_vector *q_vector,
> +				struct ice_ring_container *rc,
> +				struct dim_sample *sample,
> +				bool is_tx)
> +{
> +	u64 packets = 0, bytes = 0;
> +
> +	if (is_tx) {
> +		struct ice_tx_ring *tx_ring;
> +
> +		ice_for_each_tx_ring(tx_ring, *rc) {
> +			packets += tx_ring->stats.pkts;
> +			bytes += tx_ring->stats.bytes;
> +		}
> +	} else {
> +		struct ice_rx_ring *rx_ring;
> +
> +		ice_for_each_rx_ring(rx_ring, *rc) {
> +			packets += rx_ring->stats.pkts;
> +			bytes += rx_ring->stats.bytes;
> +		}
> +	}
> +
> +	dim_update_sample(q_vector->total_events, packets, bytes, sample);
> +	sample->comp_ctr = 0;
> +
> +	/* if dim settings get stale, like when not updated for 1
> +	 * second or longer, force it to start again. This addresses the
> +	 * freqent case of an idle queue being switched to by the

frequent

> +	 * scheduler. The 1,000 here means 1,000 milliseconds.
> +	 */
> +	if (ktime_ms_delta(sample->time, rc->dim.start_sample.time) >= 1000)
> +		rc->dim.state = DIM_START_MEASURE;
> +}
> +
>   /**
>    * ice_net_dim - Update net DIM algorithm
>    * @q_vector: the vector associated with the interrupt
> @@ -1274,34 +1309,16 @@ static void ice_net_dim(struct ice_q_vector *q_vector)
>   	struct ice_ring_container *rx = &q_vector->rx;
>   
>   	if (ITR_IS_DYNAMIC(tx)) {
> -		struct dim_sample dim_sample = {};
> -		u64 packets = 0, bytes = 0;
> -		struct ice_tx_ring *ring;
> -
> -		ice_for_each_tx_ring(ring, q_vector->tx) {
> -			packets += ring->stats.pkts;
> -			bytes += ring->stats.bytes;
> -		}
> -
> -		dim_update_sample(q_vector->total_events, packets, bytes,
> -				  &dim_sample);
> +		struct dim_sample dim_sample;
>   
> +		__ice_update_sample(q_vector, tx, &dim_sample, true);
>   		net_dim(&tx->dim, dim_sample);
>   	}
>   
>   	if (ITR_IS_DYNAMIC(rx)) {
> -		struct dim_sample dim_sample = {};
> -		u64 packets = 0, bytes = 0;
> -		struct ice_rx_ring *ring;
> -
> -		ice_for_each_rx_ring(ring, q_vector->rx) {
> -			packets += ring->stats.pkts;
> -			bytes += ring->stats.bytes;
> -		}
> -
> -		dim_update_sample(q_vector->total_events, packets, bytes,
> -				  &dim_sample);
> +		struct dim_sample dim_sample;
>   
> +		__ice_update_sample(q_vector, rx, &dim_sample, false);
>   		net_dim(&rx->dim, dim_sample);
>   	}
>   }
> @@ -1328,15 +1345,14 @@ static u32 ice_buildreg_itr(u16 itr_idx, u16 itr)
>   }
>   
>   /**
> - * ice_update_ena_itr - Update ITR moderation and re-enable MSI-X interrupt
> + * ice_enable_interrupt - re-enable MSI-X interrupt
>    * @q_vector: the vector associated with the interrupt to enable
>    *
> - * Update the net_dim() algorithm and re-enable the interrupt associated with
> - * this vector.
> - *
> - * If the VSI is down, the interrupt will not be re-enabled.
> + * If the VSI is down, the interrupt will not be re-enabled. Also,
> + * when enabling the interrupt always reset the wb_on_itr to false
> + * and trigger a software interrupt to clean out internal state.

Is the internal state cleaning required in the spec?

>    */
> -static void ice_update_ena_itr(struct ice_q_vector *q_vector)
> +static void ice_enable_interrupt(struct ice_q_vector *q_vector)
>   {
>   	struct ice_vsi *vsi = q_vector->vsi;
>   	bool wb_en = q_vector->wb_on_itr;
> @@ -1351,10 +1367,6 @@ static void ice_update_ena_itr(struct ice_q_vector *q_vector)
>   	if (wb_en)
>   		q_vector->wb_on_itr = false;
>   
> -	/* This will do nothing if dynamic updates are not enabled. */
> -	ice_net_dim(q_vector);
> -
> -	/* net_dim() updates ITR out-of-band using a work item */
>   	itr_val = ice_buildreg_itr(ICE_ITR_NONE, 0);
>   	/* trigger an immediate software interrupt when exiting
>   	 * busy poll, to make sure to catch any pending cleanups
> @@ -1482,10 +1494,12 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
>   	/* Exit the polling mode, but don't re-enable interrupts if stack might
>   	 * poll us due to busy-polling
>   	 */
> -	if (likely(napi_complete_done(napi, work_done)))
> -		ice_update_ena_itr(q_vector);
> -	else
> +	if (likely(napi_complete_done(napi, work_done))) {
> +		ice_net_dim(q_vector);
> +		ice_enable_interrupt(q_vector);
> +	} else {
>   		ice_set_wb_on_itr(q_vector);
> +	}
>   
>   	return min_t(int, work_done, budget - 1);
>   }


Kind regards,

Paul


* [Intel-wired-lan] [PATCH net-next v3 1/3] ice: update dim usage and moderation
  2021-09-20  6:46   ` Paul Menzel
@ 2021-09-20 19:02     ` Jesse Brandeburg
  0 siblings, 0 replies; 9+ messages in thread
From: Jesse Brandeburg @ 2021-09-20 19:02 UTC (permalink / raw)
  To: intel-wired-lan

On 9/19/2021 11:46 PM, Paul Menzel wrote:
> Dear Jesse,

Thanks for reviewing; I applied your text changes to v4.

> On 24.08.21 at 03:12, Jesse Brandeburg wrote:
>> Add a new helper for setting rate limit, which will be used more
>> in a followup patch.
> 
> Nice work. Could you please add how to run the ping-pong test, so
> others can verify the behavior?

The test ran 32 rounds of netperf -t TCP_RR with a single thread, with
the number of bytes transferred varying in powers of two from 1 to 64K.
Each command was something like (for example, for the 8K ping-pong):
netperf -t TCP_RR -H host -l 30 -- -r 8K
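A rough sketch of that sweep as a script (not the exact harness used;
"host" is a placeholder for a peer running netserver, and the echo is
there so the commands print instead of run):

```shell
#!/bin/sh
# Sketch of the 32-round ping-pong latency sweep: request/response
# sizes double from 1 byte up to 64K. Drop the "echo" to actually run
# netperf against a peer that has netserver listening.
HOST=host
for round in $(seq 1 32); do
    size=1
    while [ "$size" -le 65536 ]; do
        echo netperf -t TCP_RR -H "$HOST" -l 30 -- -r "$size"
        size=$((size * 2))
    done
done
```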


>> @@ -1328,15 +1345,14 @@ static u32 ice_buildreg_itr(u16 itr_idx, u16 itr)
>>   }
>>
>>   /**
>> - * ice_update_ena_itr - Update ITR moderation and re-enable MSI-X interrupt
>> + * ice_enable_interrupt - re-enable MSI-X interrupt
>>    * @q_vector: the vector associated with the interrupt to enable
>>    *
>> - * Update the net_dim() algorithm and re-enable the interrupt associated with
>> - * this vector.
>> - *
>> - * If the VSI is down, the interrupt will not be re-enabled.
>> + * If the VSI is down, the interrupt will not be re-enabled. Also,
>> + * when enabling the interrupt always reset the wb_on_itr to false
>> + * and trigger a software interrupt to clean out internal state.
> 
> Is the internal state cleaning required in the spec?

It is mentioned in the interrupt management section that triggering a
software interrupt will "flush" internal state. The spec just doesn't
mention when you should use that trick.



