[Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code
@ 2023-05-30 17:46 Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread Jacob Keller
                   ` (4 more replies)
  0 siblings, 5 replies; 11+ messages in thread
From: Jacob Keller @ 2023-05-30 17:46 UTC (permalink / raw)
  To: Intel Wired LAN, Anthony Nguyen

This series fixes issues that occur due to the interaction with the hard IRQ
ice_misc_intr() function and the soft IRQ ice_misc_intr_thread_fn().

First, Karol fixes an issue with the external timestamp being handled by the
PTP kworker, resulting in a disruption of timestamping an input signal on
the external software defined pins with a frequency of 100Hz.

Second, Karol fixes an issue when running on PREEMPT_RT kernels due to such
kernels not allowing queue_work() inside of the hard IRQ. The scheduling of
the service task is moved into the soft IRQ thread function instead.

Following this, I have three patches intended to fix an issue with the
handling for E822 hardware involving a rather subtle and complicated race
between the hard IRQ function and the soft IRQ thread function. Ultimately
the code is refactored so that we stop polling in the thread function, and
ensure that the OICR (Other interrupt control register) is disabled for the
duration of the thread.

I think that while this does fix a race condition issue, the amount of
changes required to refactor the interrupt function is large enough that it
does not warrant going into net.

The race itself manifests only if multiple Tx timestamps are outstanding at
once. This is atypical behavior for most applications. For example, ptp4l
synchronously waits for timestamps before continuing processing.

Jacob Keller (3):
  ice: introduce ICE_TX_TSTAMP_WORK enumeration
  ice: trigger PFINT_OICR_TSYN_TX interrupt instead of polling
  ice: do not re-enable miscellaneous interrupt until thread_fn
    completes

Karol Kolacinski (2):
  ice: handle extts in the miscellaneous interrupt thread
  ice: always return IRQ_WAKE_THREAD in ice_misc_intr()

Changes since v1:
* Re-write commit message for the second patch involving the
  ice_schedule_work() change.
* Remove unused more_timestamps variable

 drivers/net/ethernet/intel/ice/ice.h      |  1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 52 +++++++++++++------
 drivers/net/ethernet/intel/ice/ice_ptp.c  | 62 ++++++++++++-----------
 drivers/net/ethernet/intel/ice/ice_ptp.h  | 16 ++++--
 4 files changed, 83 insertions(+), 48 deletions(-)

base-commit: 1e806efa4f2837a829044df27e1196a4fd520ba3
-- 
2.40.0.471.gbd7f14d9353b

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread
  2023-05-30 17:46 [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code Jacob Keller
@ 2023-05-30 17:46 ` Jacob Keller
  2023-05-31 14:38   ` Michal Schmidt
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 2/5] ice: always return IRQ_WAKE_THREAD in ice_misc_intr() Jacob Keller
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 11+ messages in thread
From: Jacob Keller @ 2023-05-30 17:46 UTC (permalink / raw)
  To: Intel Wired LAN, Anthony Nguyen; +Cc: Karol Kolacinski, Anatolii Gerasymenko

From: Karol Kolacinski <karol.kolacinski@intel.com>

The ice_ptp_extts_work() and ice_ptp_periodic_work() functions are both
scheduled on the same kthread worker, pf.ptp.kworker. The
ice_ptp_periodic_work() function sends to the firmware to interact with the
PHY, and must block to wait for responses.

This can cause delay in responding to the PFINT_OICR_TSYN_EVNT interrupt
cause, ultimately resulting in disruption to processing an input signal of
the frequency is high enough. In our testing, even 100 Hz signals get
disrupted.

Fix this by instead processing the signal inside the miscellaneous
interrupt thread prior to handling Tx timestamps.

Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
Reported-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |  1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 33 +++++++++++++++++------
 drivers/net/ethernet/intel/ice/ice_ptp.c  | 12 +++------
 drivers/net/ethernet/intel/ice/ice_ptp.h  |  4 +--
 4 files changed, 31 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 176e281dfa24..4256b9f5a25f 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -580,6 +580,7 @@ struct ice_pf {
 	u32 hw_csum_rx_error;
 	u32 oicr_err_reg;
 	struct msi_map oicr_irq;	/* Other interrupt cause MSIX vector */
+	u16 oicr_misc;		/* Misc interrupts to handle in bottom half */
 	u16 max_pf_txqs;	/* Total Tx queues PF wide */
 	u16 max_pf_rxqs;	/* Total Rx queues PF wide */
 	u16 num_lan_msix;	/* Total MSIX vectors for base driver */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index c7d7104dbadc..9e4d7d884115 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3139,20 +3139,28 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 
 	if (oicr & PFINT_OICR_TSYN_TX_M) {
 		ena_mask &= ~PFINT_OICR_TSYN_TX_M;
-		if (!hw->reset_ongoing)
+		if (!hw->reset_ongoing) {
+			pf->oicr_misc |= PFINT_OICR_TSYN_TX_M;
 			ret = IRQ_WAKE_THREAD;
+		}
 	}
 
 	if (oicr & PFINT_OICR_TSYN_EVNT_M) {
 		u8 tmr_idx = hw->func_caps.ts_func_info.tmr_index_owned;
 		u32 gltsyn_stat = rd32(hw, GLTSYN_STAT(tmr_idx));
 
-		/* Save EVENTs from GTSYN register */
-		pf->ptp.ext_ts_irq |= gltsyn_stat & (GLTSYN_STAT_EVENT0_M |
-						     GLTSYN_STAT_EVENT1_M |
-						     GLTSYN_STAT_EVENT2_M);
 		ena_mask &= ~PFINT_OICR_TSYN_EVNT_M;
-		kthread_queue_work(pf->ptp.kworker, &pf->ptp.extts_work);
+
+		if (hw->func_caps.ts_func_info.src_tmr_owned) {
+			/* Save EVENTs from GLTSYN register */
+			pf->ptp.ext_ts_irq |= gltsyn_stat &
+					      (GLTSYN_STAT_EVENT0_M |
+					       GLTSYN_STAT_EVENT1_M |
+					       GLTSYN_STAT_EVENT2_M);
+
+			pf->oicr_misc |= PFINT_OICR_TSYN_EVNT_M;
+			ret = IRQ_WAKE_THREAD;
+		}
 	}
 
 #define ICE_AUX_CRIT_ERR (PFINT_OICR_PE_CRITERR_M | PFINT_OICR_HMC_ERR_M | PFINT_OICR_PE_PUSH_M)
@@ -3196,8 +3204,16 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 	if (ice_is_reset_in_progress(pf->state))
 		return IRQ_HANDLED;
 
-	while (!ice_ptp_process_ts(pf))
-		usleep_range(50, 100);
+	if (pf->oicr_misc & PFINT_OICR_TSYN_EVNT_M) {
+		ice_ptp_extts_event(pf);
+		pf->oicr_misc &= ~PFINT_OICR_TSYN_EVNT_M;
+	}
+
+	if (pf->oicr_misc & PFINT_OICR_TSYN_TX_M) {
+		while (!ice_ptp_process_ts(pf))
+			usleep_range(50, 100);
+		pf->oicr_misc &= ~PFINT_OICR_TSYN_TX_M;
+	}
 
 	return IRQ_HANDLED;
 }
@@ -4958,6 +4974,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	pf = ice_allocate_pf(dev);
 	if (!pf)
 		return -ENOMEM;
+	pf->oicr_misc = 0;
 
 	/* initialize Auxiliary index to invalid value */
 	pf->aux_idx = -1;
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index d4b6c997141d..6f51ebaf1d70 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -1458,15 +1458,11 @@ static int ice_ptp_adjfine(struct ptp_clock_info *info, long scaled_ppm)
 }
 
 /**
- * ice_ptp_extts_work - Workqueue task function
- * @work: external timestamp work structure
- *
- * Service for PTP external clock event
+ * ice_ptp_extts_event - Process PTP external clock event
+ * @pf: Board private structure
  */
-static void ice_ptp_extts_work(struct kthread_work *work)
+void ice_ptp_extts_event(struct ice_pf *pf)
 {
-	struct ice_ptp *ptp = container_of(work, struct ice_ptp, extts_work);
-	struct ice_pf *pf = container_of(ptp, struct ice_pf, ptp);
 	struct ptp_clock_event event;
 	struct ice_hw *hw = &pf->hw;
 	u8 chan, tmr_idx;
@@ -2558,7 +2554,6 @@ void ice_ptp_prepare_for_reset(struct ice_pf *pf)
 	ice_ptp_cfg_timestamp(pf, false);
 
 	kthread_cancel_delayed_work_sync(&ptp->work);
-	kthread_cancel_work_sync(&ptp->extts_work);
 
 	if (test_bit(ICE_PFR_REQ, pf->state))
 		return;
@@ -2656,7 +2651,6 @@ static int ice_ptp_init_work(struct ice_pf *pf, struct ice_ptp *ptp)
 
 	/* Initialize work functions */
 	kthread_init_delayed_work(&ptp->work, ice_ptp_periodic_work);
-	kthread_init_work(&ptp->extts_work, ice_ptp_extts_work);
 
 	/* Allocate a kworker for handling work required for the ports
 	 * connected to the PTP hardware clock.
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h
index 9cda2f43e0e5..9f8902c1e743 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.h
@@ -169,7 +169,6 @@ struct ice_ptp_port {
  * struct ice_ptp - data used for integrating with CONFIG_PTP_1588_CLOCK
  * @port: data for the PHY port initialization procedure
  * @work: delayed work function for periodic tasks
- * @extts_work: work function for handling external Tx timestamps
  * @cached_phc_time: a cached copy of the PHC time for timestamp extension
  * @cached_phc_jiffies: jiffies when cached_phc_time was last updated
  * @ext_ts_chan: the external timestamp channel in use
@@ -190,7 +189,6 @@ struct ice_ptp_port {
 struct ice_ptp {
 	struct ice_ptp_port port;
 	struct kthread_delayed_work work;
-	struct kthread_work extts_work;
 	u64 cached_phc_time;
 	unsigned long cached_phc_jiffies;
 	u8 ext_ts_chan;
@@ -256,6 +254,7 @@ int ice_ptp_get_ts_config(struct ice_pf *pf, struct ifreq *ifr);
 void ice_ptp_cfg_timestamp(struct ice_pf *pf, bool ena);
 int ice_get_ptp_clock_index(struct ice_pf *pf);
 
+void ice_ptp_extts_event(struct ice_pf *pf);
 s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb);
 bool ice_ptp_process_ts(struct ice_pf *pf);
 
@@ -284,6 +283,7 @@ static inline int ice_get_ptp_clock_index(struct ice_pf *pf)
 	return -1;
 }
 
+static inline void ice_ptp_extts_event(struct ice_pf *pf) { }
 static inline s8
 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb)
 {
-- 
2.40.0.471.gbd7f14d9353b

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 2/5] ice: always return IRQ_WAKE_THREAD in ice_misc_intr()
  2023-05-30 17:46 [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread Jacob Keller
@ 2023-05-30 17:46 ` Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 3/5] ice: introduce ICE_TX_TSTAMP_WORK enumeration Jacob Keller
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 11+ messages in thread
From: Jacob Keller @ 2023-05-30 17:46 UTC (permalink / raw)
  To: Intel Wired LAN, Anthony Nguyen; +Cc: Karol Kolacinski

From: Karol Kolacinski <karol.kolacinski@intel.com>

Refactor the ice_misc_intr() function to always return IRQ_WAKE_THREAD, and
schedule the service task during the soft IRQ thread function instead of at
the end of the hard IRQ handler.

Remove the duplicate call to ice_service_task_schedule() that happened when
we got a PCI exception.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---

Chances since v1:
* Rewrote commit message, no longer justifying change with reference to
  CONFIG_PREEMT_RT, as this is no longer true since v5.8. We still would
  like this change since it simplifies the ice_misc_intr() function and
  would more closely align with what we provided to our customer on an older
  stable kernel.

 drivers/net/ethernet/intel/ice/ice_main.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 9e4d7d884115..8b59632ec6b1 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3058,7 +3058,6 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 {
 	struct ice_pf *pf = (struct ice_pf *)data;
 	struct ice_hw *hw = &pf->hw;
-	irqreturn_t ret = IRQ_NONE;
 	struct device *dev;
 	u32 oicr, ena_mask;
 
@@ -3139,10 +3138,8 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 
 	if (oicr & PFINT_OICR_TSYN_TX_M) {
 		ena_mask &= ~PFINT_OICR_TSYN_TX_M;
-		if (!hw->reset_ongoing) {
+		if (!hw->reset_ongoing)
 			pf->oicr_misc |= PFINT_OICR_TSYN_TX_M;
-			ret = IRQ_WAKE_THREAD;
-		}
 	}
 
 	if (oicr & PFINT_OICR_TSYN_EVNT_M) {
@@ -3159,7 +3156,6 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 					       GLTSYN_STAT_EVENT2_M);
 
 			pf->oicr_misc |= PFINT_OICR_TSYN_EVNT_M;
-			ret = IRQ_WAKE_THREAD;
 		}
 	}
 
@@ -3180,16 +3176,12 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 		if (oicr & (PFINT_OICR_PCI_EXCEPTION_M |
 			    PFINT_OICR_ECC_ERR_M)) {
 			set_bit(ICE_PFR_REQ, pf->state);
-			ice_service_task_schedule(pf);
 		}
 	}
-	if (!ret)
-		ret = IRQ_HANDLED;
 
-	ice_service_task_schedule(pf);
 	ice_irq_dynamic_ena(hw, NULL, NULL);
 
-	return ret;
+	return IRQ_WAKE_THREAD;
 }
 
 /**
@@ -3204,6 +3196,8 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 	if (ice_is_reset_in_progress(pf->state))
 		return IRQ_HANDLED;
 
+	ice_service_task_schedule(pf);
+
 	if (pf->oicr_misc & PFINT_OICR_TSYN_EVNT_M) {
 		ice_ptp_extts_event(pf);
 		pf->oicr_misc &= ~PFINT_OICR_TSYN_EVNT_M;
-- 
2.40.0.471.gbd7f14d9353b

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 3/5] ice: introduce ICE_TX_TSTAMP_WORK enumeration
  2023-05-30 17:46 [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 2/5] ice: always return IRQ_WAKE_THREAD in ice_misc_intr() Jacob Keller
@ 2023-05-30 17:46 ` Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 4/5] ice: trigger PFINT_OICR_TSYN_TX interrupt instead of polling Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes Jacob Keller
  4 siblings, 0 replies; 11+ messages in thread
From: Jacob Keller @ 2023-05-30 17:46 UTC (permalink / raw)
  To: Intel Wired LAN, Anthony Nguyen

The ice_ptp_process_ts() function and its various helper functions return a
boolean value indicating whether any work is remaining. This use of a
boolean has grown confusing as we have multiple helpers that pass status
between each other. Readers must be aware of what "true" and "false" mean,
and it is very easy to get their meaning inverted. The names of the
functions are not standard "yes/no" questions, which is the best practice
for boolean returns.

Replace this use of an enumeration with a custom type, enum
ice_tx_tstamp_work. This enumeration clearly indicates whether all work is
done, or if more work is pending.

To aid in readability, factor the actual list iteration and processing out
into ice_ptp_process_tx_tstamp(), making it void. Then call this in
ice_ptp_tx_tstamp() ensuring that we always check the Tracker list at the
end when determining the appropriate return value.

Now the return value is an explicit name instead of the true or false
value. This is easier to follow and makes reading the resulting callers
much simpler.

In addition, this paves the way for future work to allow E822 hardware to
process timestamps for all functions using a single interrupt on the clock
owning PF.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---

Changes since v1:
*  Remove unused more_timestamps variable from ice_ptp_process_tx_tstamp

 drivers/net/ethernet/intel/ice/ice_main.c |  2 +-
 drivers/net/ethernet/intel/ice/ice_ptp.c  | 50 ++++++++++++++---------
 drivers/net/ethernet/intel/ice/ice_ptp.h  | 12 +++++-
 3 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8b59632ec6b1..481dccdb95cd 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3204,7 +3204,7 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 	}
 
 	if (pf->oicr_misc & PFINT_OICR_TSYN_TX_M) {
-		while (!ice_ptp_process_ts(pf))
+		while (ice_ptp_process_ts(pf) == ICE_TX_TSTAMP_WORK_PENDING)
 			usleep_range(50, 100);
 		pf->oicr_misc &= ~PFINT_OICR_TSYN_TX_M;
 	}
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.c b/drivers/net/ethernet/intel/ice/ice_ptp.c
index 6f51ebaf1d70..81d96a40d5a7 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.c
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.c
@@ -617,7 +617,7 @@ ice_ptp_is_tx_tracker_up(struct ice_ptp_tx *tx)
 }
 
 /**
- * ice_ptp_tx_tstamp - Process Tx timestamps for a port
+ * ice_ptp_process_tx_tstamp - Process Tx timestamps for a port
  * @tx: the PTP Tx timestamp tracker
  *
  * Process timestamps captured by the PHY associated with this port. To do
@@ -633,15 +633,6 @@ ice_ptp_is_tx_tracker_up(struct ice_ptp_tx *tx)
  * 6) extend the 40 bit timestamp value to get a 64 bit timestamp value
  * 7) send this 64 bit timestamp to the stack
  *
- * Returns true if all timestamps were handled, and false if any slots remain
- * without a timestamp.
- *
- * After looping, if we still have waiting SKBs, return false. This may cause
- * us effectively poll even when not strictly necessary. We do this because
- * it's possible a new timestamp was requested around the same time as the
- * interrupt. In some cases hardware might not interrupt us again when the
- * timestamp is captured.
- *
  * Note that we do not hold the tracking lock while reading the Tx timestamp.
  * This is because reading the timestamp requires taking a mutex that might
  * sleep.
@@ -673,10 +664,9 @@ ice_ptp_is_tx_tracker_up(struct ice_ptp_tx *tx)
  * the packet will never be sent by hardware and discard it without reading
  * the timestamp register.
  */
-static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
+static void ice_ptp_process_tx_tstamp(struct ice_ptp_tx *tx)
 {
 	struct ice_ptp_port *ptp_port;
-	bool more_timestamps;
 	struct ice_pf *pf;
 	struct ice_hw *hw;
 	u64 tstamp_ready;
@@ -685,7 +675,7 @@ static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
 	u8 idx;
 
 	if (!tx->init)
-		return true;
+		return;
 
 	ptp_port = container_of(tx, struct ice_ptp_port, tx);
 	pf = ptp_port_to_pf(ptp_port);
@@ -694,7 +684,7 @@ static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
 	/* Read the Tx ready status first */
 	err = ice_get_phy_tx_tstamp_ready(hw, tx->block, &tstamp_ready);
 	if (err)
-		return false;
+		return;
 
 	/* Drop packets if the link went down */
 	link_up = ptp_port->link_up;
@@ -782,15 +772,34 @@ static bool ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
 		skb_tstamp_tx(skb, &shhwtstamps);
 		dev_kfree_skb_any(skb);
 	}
+}
 
-	/* Check if we still have work to do. If so, re-queue this task to
-	 * poll for remaining timestamps.
-	 */
+/**
+ * ice_ptp_tx_tstamp - Process Tx timestamps for this function.
+ * @tx: Tx tracking structure to initialize
+ *
+ * Returns: ICE_TX_TSTAMP_WORK_PENDING if there are any outstanding incomplete
+ * Tx timestamps, or ICE_TX_TSTAMP_WORK_DONE otherwise.
+ */
+static enum ice_tx_tstamp_work ice_ptp_tx_tstamp(struct ice_ptp_tx *tx)
+{
+	bool more_timestamps;
+
+	if (!tx->init)
+		return ICE_TX_TSTAMP_WORK_DONE;
+
+	/* Process the Tx timestamp tracker */
+	ice_ptp_process_tx_tstamp(tx);
+
+	/* Check if there are outstanding Tx timestamps */
 	spin_lock(&tx->lock);
 	more_timestamps = tx->init && !bitmap_empty(tx->in_use, tx->len);
 	spin_unlock(&tx->lock);
 
-	return !more_timestamps;
+	if (more_timestamps)
+		return ICE_TX_TSTAMP_WORK_PENDING;
+
+	return ICE_TX_TSTAMP_WORK_DONE;
 }
 
 /**
@@ -2426,9 +2435,10 @@ s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb)
  * ice_ptp_process_ts - Process the PTP Tx timestamps
  * @pf: Board private structure
  *
- * Returns true if timestamps are processed.
+ * Returns: ICE_TX_TSTAMP_WORK_PENDING if there are any outstanding Tx
+ * timestamps that need processing, and ICE_TX_TSTAMP_WORK_DONE otherwise.
  */
-bool ice_ptp_process_ts(struct ice_pf *pf)
+enum ice_tx_tstamp_work ice_ptp_process_ts(struct ice_pf *pf)
 {
 	return ice_ptp_tx_tstamp(&pf->ptp.port.tx);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_ptp.h b/drivers/net/ethernet/intel/ice/ice_ptp.h
index 9f8902c1e743..6c90775e1eb0 100644
--- a/drivers/net/ethernet/intel/ice/ice_ptp.h
+++ b/drivers/net/ethernet/intel/ice/ice_ptp.h
@@ -108,6 +108,16 @@ struct ice_tx_tstamp {
 	u64 cached_tstamp;
 };
 
+/**
+ * enum ice_tx_tstamp_work - Status of Tx timestamp work function
+ * @ICE_TX_TSTAMP_WORK_DONE - Tx timestamp processing is complete
+ * @ICE_TX_TSTAMP_WORK_PENDING - More Tx timestamps are pending
+ */
+enum ice_tx_tstamp_work {
+	ICE_TX_TSTAMP_WORK_DONE = 0,
+	ICE_TX_TSTAMP_WORK_PENDING,
+};
+
 /**
  * struct ice_ptp_tx - Tracking structure for all Tx timestamp requests on a port
  * @lock: lock to prevent concurrent access to fields of this struct
@@ -256,7 +266,7 @@ int ice_get_ptp_clock_index(struct ice_pf *pf);
 
 void ice_ptp_extts_event(struct ice_pf *pf);
 s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb);
-bool ice_ptp_process_ts(struct ice_pf *pf);
+enum ice_tx_tstamp_work ice_ptp_process_ts(struct ice_pf *pf);
 
 void
 ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-- 
2.40.0.471.gbd7f14d9353b

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 4/5] ice: trigger PFINT_OICR_TSYN_TX interrupt instead of polling
  2023-05-30 17:46 [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code Jacob Keller
                   ` (2 preceding siblings ...)
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 3/5] ice: introduce ICE_TX_TSTAMP_WORK enumeration Jacob Keller
@ 2023-05-30 17:46 ` Jacob Keller
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes Jacob Keller
  4 siblings, 0 replies; 11+ messages in thread
From: Jacob Keller @ 2023-05-30 17:46 UTC (permalink / raw)
  To: Intel Wired LAN, Anthony Nguyen

In ice_misc_intr_thread_fn(), if we do not complete all Tx timestamp work,
the thread function will poll continuously forever.

For E822 hardware, this wastes time as the return value from
ice_ptp_process_ts() is accurate and always reports correctly that the PHY
actually has new timestamp data.

In addition, if we receive enough timestamps at the right pacing, we might
never exit this polling and thus prevent other interrupt tasks from being
processed.

Fix this by instead writing to PFINT_OICR, causing an emulated interrupt to
be triggered immediately. This does take slightly more processing than just
re-checking the timestamps. However, it allows all of the other interrupt
causes a chance to be processed first in the hard IRQ function.

Note that the OICR interrupt is throttled to 8K per second, so the hardware
will not let the interrupt trigger more often than once every 124
microseconds, so this should not cause a significant increase in CPU usage
vs the sleeping method.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 481dccdb95cd..72e1b919b2d3 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3204,8 +3204,16 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 	}

 	if (pf->oicr_misc & PFINT_OICR_TSYN_TX_M) {
-		while (ice_ptp_process_ts(pf) == ICE_TX_TSTAMP_WORK_PENDING)
-			usleep_range(50, 100);
+		struct ice_hw *hw = &pf->hw;
+
+		/* Process outstanding Tx timestamps. If there is more work,
+		 * re-arm the interrupt to trigger again.
+		 */
+		if (ice_ptp_process_ts(pf) == ICE_TX_TSTAMP_WORK_PENDING) {
+			wr32(hw, PFINT_OICR, PFINT_OICR_TSYN_TX_M);
+			ice_flush(hw);
+		}
+
 		pf->oicr_misc &= ~PFINT_OICR_TSYN_TX_M;
 	}

-- 
2.40.0.471.gbd7f14d9353b

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes
  2023-05-30 17:46 [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code Jacob Keller
                   ` (3 preceding siblings ...)
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 4/5] ice: trigger PFINT_OICR_TSYN_TX interrupt instead of polling Jacob Keller
@ 2023-05-30 17:46 ` Jacob Keller
  2023-05-31 14:19   ` Michal Schmidt
  4 siblings, 1 reply; 11+ messages in thread
From: Jacob Keller @ 2023-05-30 17:46 UTC (permalink / raw)
  To: Intel Wired LAN, Anthony Nguyen

At the end of ice_misc_intr() the driver calls ice_irq_dynamic_ena() to
re-enable the miscellaneous interrupt. This is done before the
ice_misc_intr_thread_fn can run and complete.

According to the kernel function comment documentation for
request_threaded_irq(), the interrupt should remain disabled until the
thread function completes its task.

By re-enabling the interrupt at the end of the hard IRQ, it is possible for
a new interrupt to trigger while the thread function is processing. This is
problematic for PTP Tx timestamps.

For E822 devices, the hardware in the PHY keeps track of how many
outstanding timestamps are generated and how many timestamps are read from
the PHY.

This counter is incremented once for each timestamp that is captured by
hardware, and decremented once each time a timestamp is read from the PHY.
The PHY will not generate a new interrupt unless this internal counter is
zero before the most recently captured timestamp.

Because of this counter behavior, a race with the hard IRQ and threaded IRQ
function can result in the potential for the counter to get stuck such that
no new interrupts will be triggered until the device is reset.

Consider the following flow:

 1 -> Tx timestamp completes in hardware
 2 -> timestamp interrupt occurs
 3 -> ice_misc_intr() re-enables timestamp interrupt, and wakes the
      thread_fn
 4 -> thread_fn is running and processing Tx timestamp
 5 -> the Tx timestamp is read from PHY, clearing the counter
 6 -> a new Tx timestamp completes in hardware, triggering interrupt
 7 -> the thread_fn hasn't exited and reported IRQ handled
 8 -> ice_misc_intr() triggers and sees PTP interrupt, so tries to wake thread
 9 -> thread_fn is already running (IRQTF_RUNTHREAD is set still!) so we
      skip running the thread...
10 -> an outstanding timestamp is remaining but we never read it
11 -> interrupt never triggers again

The fix for this complicated race condition is simple: do not re-enable the
miscellaneous interrupt until *after* the thread function completes. If a
new timestamp event triggers while the interrupt is disabled, it will be
remembered and should cause the interrupt to trigger again immediately
after we re-enable the interrupt.

Fixes: 1229b33973c7 ("ice: Add low latency Tx timestamp read")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 72e1b919b2d3..51fe3da0d54f 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3179,8 +3179,6 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 		}
 	}

-	ice_irq_dynamic_ena(hw, NULL, NULL);
-
 	return IRQ_WAKE_THREAD;
 }

@@ -3192,6 +3190,9 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
 static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 {
 	struct ice_pf *pf = data;
+	struct ice_hw *hw;
+
+	hw = &pf->hw;

 	if (ice_is_reset_in_progress(pf->state))
 		return IRQ_HANDLED;
@@ -3204,8 +3205,6 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 	}

 	if (pf->oicr_misc & PFINT_OICR_TSYN_TX_M) {
-		struct ice_hw *hw = &pf->hw;
-
 		/* Process outstanding Tx timestamps. If there is more work,
 		 * re-arm the interrupt to trigger again.
 		 */
@@ -3217,6 +3216,8 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
 		pf->oicr_misc &= ~PFINT_OICR_TSYN_TX_M;
 	}

+	ice_irq_dynamic_ena(hw, NULL, NULL);
+
 	return IRQ_HANDLED;
 }

-- 
2.40.0.471.gbd7f14d9353b

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes Jacob Keller
@ 2023-05-31 14:19   ` Michal Schmidt
  2023-06-01 18:46     ` Jacob Keller
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Schmidt @ 2023-05-31 14:19 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Anthony Nguyen, Intel Wired LAN

On Tue, May 30, 2023 at 7:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> At the end of ice_misc_intr() the driver calls ice_irq_dynamic_ena() to
> re-enable the miscellaneous interrupt. This is done before the
> ice_misc_intr_thread_fn can run and complete.
>
> According to the kernel function comment documentation for
> request_threaded_irq(), the interrupt should remain disabled until the
> thread function completes its task.
>
> By re-enabling the interrupt at the end of the hard IRQ, it is possible for
> a new interrupt to trigger while the thread function is processing. This is
> problematic for PTP Tx timestamps.
>
> For E822 devices, the hardware in the PHY keeps track of how many
> outstanding timestamps are generated and how many timestamps are read from
> the PHY.
>
> This counter is incremented once for each timestamp that is captured by
> hardware, and decremented once each time a timestamp is read from the PHY.
> The PHY will not generate a new interrupt unless this internal counter is
> zero before the most recently captured timestamp.
>
> Because of this counter behavior, a race with the hard IRQ and threaded IRQ
> function can result in the potential for the counter to get stuck such that
> no new interrupts will be triggered until the device is reset.
>
> Consider the following flow:
>
>  1 -> Tx timestamp completes in hardware
>  2 -> timestamp interrupt occurs
>  3 -> ice_misc_intr() re-enables timestamp interrupt, and wakes the
>       thread_fn
>  4 -> thread_fn is running and processing Tx timestamp
>  5 -> the Tx timestamp is read from PHY, clearing the counter
>  6 -> a new Tx timestamp completes in hardware, triggering interrupt
>  7 -> the thread_fn hasn't exited and reported IRQ handled
>  8 -> ice_misc_intr() triggers and sees PTP interrupt, so tries to wake thread
>  9 -> thread_fn is already running (IRQTF_RUNTHREAD is set still!) so we
>       skip running the thread...

There is a misunderstanding here. IRQTF_RUNTHREAD does not remain set
for the execution of thread_fn. The IRQ thread clears the flag
*before* it starts executing your thread_fn. See kernel/irq/manage.c.
The IRQ thread's main loop is in irq_thread(). Every iteration of the
loop starts with irq_wait_for_interrupt(). It uses
"test_and_clear_bit(IRQTF_RUNTHREAD, ...)" to check if there's work to
do.

So it's not possible for step 9 to forget the interrupt like that. If
thread_fn is still executing, the setting of IRQTF_RUNTHREAD by the
hard interrupt handler will tell the IRQ thread to go through the loop
again.

Michal

> 10 -> an outstanding timestamp is remaining but we never read it
> 11 -> interrupt never triggers again
>
> The fix for this complicated race condition is simple: do not re-enable the
> miscellaneous interrupt until *after* the thread function completes. If a
> new timestamp event triggers while the interrupt is disabled, it will be
> remembered and should cause the interrupt to trigger again immediately
> after we re-enable the interrupt.
>
> Fixes: 1229b33973c7 ("ice: Add low latency Tx timestamp read")
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
> index 72e1b919b2d3..51fe3da0d54f 100644
> --- a/drivers/net/ethernet/intel/ice/ice_main.c
> +++ b/drivers/net/ethernet/intel/ice/ice_main.c
> @@ -3179,8 +3179,6 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
>                 }
>         }
>
> -       ice_irq_dynamic_ena(hw, NULL, NULL);
> -
>         return IRQ_WAKE_THREAD;
>  }
>
> @@ -3192,6 +3190,9 @@ static irqreturn_t ice_misc_intr(int __always_unused irq, void *data)
>  static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
>  {
>         struct ice_pf *pf = data;
> +       struct ice_hw *hw;
> +
> +       hw = &pf->hw;
>
>         if (ice_is_reset_in_progress(pf->state))
>                 return IRQ_HANDLED;
> @@ -3204,8 +3205,6 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
>         }
>
>         if (pf->oicr_misc & PFINT_OICR_TSYN_TX_M) {
> -               struct ice_hw *hw = &pf->hw;
> -
>                 /* Process outstanding Tx timestamps. If there is more work,
>                  * re-arm the interrupt to trigger again.
>                  */
> @@ -3217,6 +3216,8 @@ static irqreturn_t ice_misc_intr_thread_fn(int __always_unused irq, void *data)
>                 pf->oicr_misc &= ~PFINT_OICR_TSYN_TX_M;
>         }
>
> +       ice_irq_dynamic_ena(hw, NULL, NULL);
> +
>         return IRQ_HANDLED;
>  }
>
> --
> 2.40.0.471.gbd7f14d9353b
>

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread
  2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread Jacob Keller
@ 2023-05-31 14:38   ` Michal Schmidt
  0 siblings, 0 replies; 11+ messages in thread
From: Michal Schmidt @ 2023-05-31 14:38 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Anthony Nguyen, Karol Kolacinski, Intel Wired LAN, Anatolii Gerasymenko

On Tue, May 30, 2023 at 7:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> From: Karol Kolacinski <karol.kolacinski@intel.com>
>
> The ice_ptp_extts_work() and ice_ptp_periodic_work() functions are both
> scheduled on the same kthread worker, pf.ptp.kworker. The
> ice_ptp_periodic_work() function sends to the firmware to interact with the
> PHY, and must block to wait for responses.
>
> This can cause delay in responding to the PFINT_OICR_TSYN_EVNT interrupt
> cause, ultimately resulting in disruption to processing an input signal of
> the frequency is high enough. In our testing, even 100 Hz signals get
> disrupted.
>
> Fix this by instead processing the signal inside the miscellaneous
> interrupt thread prior to handling Tx timestamps.
>
> Fixes: 172db5f91d5f ("ice: add support for auxiliary input/output pins")
> Reported-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
> Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice.h      |  1 +
>  drivers/net/ethernet/intel/ice/ice_main.c | 33 +++++++++++++++++------
>  drivers/net/ethernet/intel/ice/ice_ptp.c  | 12 +++------
>  drivers/net/ethernet/intel/ice/ice_ptp.h  |  4 +--
>  4 files changed, 31 insertions(+), 19 deletions(-)
>
...
> @@ -4958,6 +4974,7 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
>         pf = ice_allocate_pf(dev);
>         if (!pf)
>                 return -ENOMEM;
> +       pf->oicr_misc = 0;

The initialization is redundant. ice_allocate_pf kzalloc'ed the whole structure.

Michal

>         /* initialize Auxiliary index to invalid value */
>         pf->aux_idx = -1;

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes
  2023-05-31 14:19   ` Michal Schmidt
@ 2023-06-01 18:46     ` Jacob Keller
  2023-06-01 19:18       ` Michal Schmidt
  0 siblings, 1 reply; 11+ messages in thread
From: Jacob Keller @ 2023-06-01 18:46 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: Anthony Nguyen, Intel Wired LAN

On 5/31/2023 7:19 AM, Michal Schmidt wrote:
> On Tue, May 30, 2023 at 7:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>> At the end of ice_misc_intr() the driver calls ice_irq_dynamic_ena() to
>> re-enable the miscellaneous interrupt. This is done before the
>> ice_misc_intr_thread_fn can run and complete.
>>
>> According to the kernel function comment documentation for
>> request_threaded_irq(), the interrupt should remain disabled until the
>> thread function completes its task.
>>
>> By re-enabling the interrupt at the end of the hard IRQ, it is possible for
>> a new interrupt to trigger while the thread function is processing. This is
>> problematic for PTP Tx timestamps.
>>
>> For E822 devices, the hardware in the PHY keeps track of how many
>> outstanding timestamps are generated and how many timestamps are read from
>> the PHY.
>>
>> This counter is incremented once for each timestamp that is captured by
>> hardware, and decremented once each time a timestamp is read from the PHY.
>> The PHY will not generate a new interrupt unless this internal counter is
>> zero before the most recently captured timestamp.
>>
>> Because of this counter behavior, a race with the hard IRQ and threaded IRQ
>> function can result in the potential for the counter to get stuck such that
>> no new interrupts will be triggered until the device is reset.
>>
>> Consider the following flow:
>>
>>  1 -> Tx timestamp completes in hardware
>>  2 -> timestamp interrupt occurs
>>  3 -> ice_misc_intr() re-enables timestamp interrupt, and wakes the
>>       thread_fn
>>  4 -> thread_fn is running and processing Tx timestamp
>>  5 -> the Tx timestamp is read from PHY, clearing the counter
>>  6 -> a new Tx timestamp completes in hardware, triggering interrupt
>>  7 -> the thread_fn hasn't exited and reported IRQ handled
>>  8 -> ice_misc_intr() triggers and sees PTP interrupt, so tries to wake thread
>>  9 -> thread_fn is already running (IRQTF_RUNTHREAD is set still!) so we
>>       skip running the thread...
> 
> There is a misunderstanding here. IRQTF_RUNTHREAD does not remain set
> for the execution of thread_fn. The IRQ thread clears the flag
> *before* it starts executing your thread_fn. See kernel/irq/manage.c.
> The IRQ thread's main loop is in irq_thread(). Every iteration of the
> loop starts with irq_wait_for_interrupt(). It uses
> "test_and_clear_bit(IRQTF_RUNTHREAD, ...)" to check if there's work to
> do.
> 
> So it's not possible for step 9 to forget the interrupt like that. If
> thread_fn is still executing, the setting of IRQTF_RUNTHREAD by the
> hard interrupt handler will tell the IRQ thread to go through the loop
> again.
> 
> Michal
> 

Ok, so my original understanding was flawed, but I think I see the
actual race that happened.

I'll try to explain it here, and see if you agree with the outline, then
I can update the commit message.

The PHYs keep track of how many outstanding timestamps are in memory,
and only generate an interrupt if the count of timestamps goes from
below a threshold to above a threshold value (set by the driver to be 1,
so that it will interrupt immediately on the first timestamp).

As long as there are unread timestamps in the memory bank, it will not
generate a new interrupt.

Somehow, the device gets in a state where it failed to read the
timestamps from the PHY memory *after* the interrupt occurred. When this
happens, we no longer get a new interrupt, because the PHY sees that
there are still unread timestamps. (Yes, I know, this is a poor design).

Because of this, we stop processing all future timestamps.

The actual race is possibly something like the following:

same steps up from 1-7, then:

8 -> ice_misc_intr tiriggers and sees PTP interrupt, so it sets
PFINT_OICR_TSYN_TX_M in pf->oicr_misc.
9 -> unfortunately, ice_misc_intr_thread_fn then *clears* the bit after
exiting from its processing loop.
10 -> once thread_fn exits, the threaded IRQ logic re-runs the function.
However, due to the way that we interact between the main and thread
function, the second run sees that PFINT_OICR_TSYN_TX_M is unset, so it
doesn't run the loop again.

This patch fixes things by ensuring that the hardware won't even
generate a hard IRQ interrupt until the thread_fn completes. We could
also instead fix this by improving the communication between the handler
function and the thread function by using atomic test_and_clear(), or by
using an atomic counter.

I can send a v3 with appropriate fixes and an updated commit message,
once this description makes sense to you.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes
  2023-06-01 18:46     ` Jacob Keller
@ 2023-06-01 19:18       ` Michal Schmidt
  2023-06-01 19:45         ` Jacob Keller
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Schmidt @ 2023-06-01 19:18 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Anthony Nguyen, Intel Wired LAN

On Thu, Jun 1, 2023 at 8:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> On 5/31/2023 7:19 AM, Michal Schmidt wrote:
> > On Tue, May 30, 2023 at 7:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> >>
> >> At the end of ice_misc_intr() the driver calls ice_irq_dynamic_ena() to
> >> re-enable the miscellaneous interrupt. This is done before the
> >> ice_misc_intr_thread_fn can run and complete.
> >>
> >> According to the kernel function comment documentation for
> >> request_threaded_irq(), the interrupt should remain disabled until the
> >> thread function completes its task.
> >>
> >> By re-enabling the interrupt at the end of the hard IRQ, it is possible for
> >> a new interrupt to trigger while the thread function is processing. This is
> >> problematic for PTP Tx timestamps.
> >>
> >> For E822 devices, the hardware in the PHY keeps track of how many
> >> outstanding timestamps are generated and how many timestamps are read from
> >> the PHY.
> >>
> >> This counter is incremented once for each timestamp that is captured by
> >> hardware, and decremented once each time a timestamp is read from the PHY.
> >> The PHY will not generate a new interrupt unless this internal counter is
> >> zero before the most recently captured timestamp.
> >>
> >> Because of this counter behavior, a race with the hard IRQ and threaded IRQ
> >> function can result in the potential for the counter to get stuck such that
> >> no new interrupts will be triggered until the device is reset.
> >>
> >> Consider the following flow:
> >>
> >>  1 -> Tx timestamp completes in hardware
> >>  2 -> timestamp interrupt occurs
> >>  3 -> ice_misc_intr() re-enables timestamp interrupt, and wakes the
> >>       thread_fn
> >>  4 -> thread_fn is running and processing Tx timestamp
> >>  5 -> the Tx timestamp is read from PHY, clearing the counter
> >>  6 -> a new Tx timestamp completes in hardware, triggering interrupt
> >>  7 -> the thread_fn hasn't exited and reported IRQ handled
> >>  8 -> ice_misc_intr() triggers and sees PTP interrupt, so tries to wake thread
> >>  9 -> thread_fn is already running (IRQTF_RUNTHREAD is set still!) so we
> >>       skip running the thread...
> >
> > There is a misunderstanding here. IRQTF_RUNTHREAD does not remain set
> > for the execution of thread_fn. The IRQ thread clears the flag
> > *before* it starts executing your thread_fn. See kernel/irq/manage.c.
> > The IRQ thread's main loop is in irq_thread(). Every iteration of the
> > loop starts with irq_wait_for_interrupt(). It uses
> > "test_and_clear_bit(IRQTF_RUNTHREAD, ...)" to check if there's work to
> > do.
> >
> > So it's not possible for step 9 to forget the interrupt like that. If
> > thread_fn is still executing, the setting of IRQTF_RUNTHREAD by the
> > hard interrupt handler will tell the IRQ thread to go through the loop
> > again.
> >
> > Michal
> >
>
> Ok, so my original understanding was flawed, but I think I see the
> actual race that happened.
>
> I'll try to explain it here, and see if you agree with the outline, then
> I can update the commit message.
>
> The PHYs keep track of how many outstanding timestamps are in memory,
> and only generate an interrupt if the count of timestamps goes from
> below a threshold to above a threshold value (set by the driver to be 1,
> so that it will interrupt immediately on the first timestamp).
>
> As long as there are unread timestamps in the memory bank, it will not
> generate a new interrupt.
>
> Somehow, the device gets in a state where it failed to read the
> timestamps from the PHY memory *after* the interrupt occurred. When this
> happens, we no longer get a new interrupt, because the PHY sees that
> there are still unread timestamps. (Yes, I know, this is a poor design).
>
> Because of this, we stop processing all future timestamps.
>
> The actual race is possibly something like the following:
>
> same steps up from 1-7, then:
>
> 8 -> ice_misc_intr tiriggers and sees PTP interrupt, so it sets
> PFINT_OICR_TSYN_TX_M in pf->oicr_misc.
> 9 -> unfortunately, ice_misc_intr_thread_fn then *clears* the bit after
> exiting from its processing loop.
> 10 -> once thread_fn exits, the threaded IRQ logic re-runs the function.
> However, due to the way that we interact between the main and thread
> function, the second run sees that PFINT_OICR_TSYN_TX_M is unset, so it
> doesn't run the loop again.
>
> This patch fixes things by ensuring that the hardware won't even
> generate a hard IRQ interrupt until the thread_fn completes. We could
> also instead fix this by improving the communication between the handler
> function and the thread function by using atomic test_and_clear(), or by
> using an atomic counter.
>
> I can send a v3 with appropriate fixes and an updated commit message,
> once this description makes sense to you.

OK, the new explanation seems plausible.
Michal

_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes
  2023-06-01 19:18       ` Michal Schmidt
@ 2023-06-01 19:45         ` Jacob Keller
  0 siblings, 0 replies; 11+ messages in thread
From: Jacob Keller @ 2023-06-01 19:45 UTC (permalink / raw)
  To: Michal Schmidt; +Cc: Anthony Nguyen, Intel Wired LAN



On 6/1/2023 12:18 PM, Michal Schmidt wrote:
> On Thu, Jun 1, 2023 at 8:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>> On 5/31/2023 7:19 AM, Michal Schmidt wrote:
>>> On Tue, May 30, 2023 at 7:46 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>>>
>>>> At the end of ice_misc_intr() the driver calls ice_irq_dynamic_ena() to
>>>> re-enable the miscellaneous interrupt. This is done before the
>>>> ice_misc_intr_thread_fn can run and complete.
>>>>
>>>> According to the kernel function comment documentation for
>>>> request_threaded_irq(), the interrupt should remain disabled until the
>>>> thread function completes its task.
>>>>
>>>> By re-enabling the interrupt at the end of the hard IRQ, it is possible for
>>>> a new interrupt to trigger while the thread function is processing. This is
>>>> problematic for PTP Tx timestamps.
>>>>
>>>> For E822 devices, the hardware in the PHY keeps track of how many
>>>> outstanding timestamps are generated and how many timestamps are read from
>>>> the PHY.
>>>>
>>>> This counter is incremented once for each timestamp that is captured by
>>>> hardware, and decremented once each time a timestamp is read from the PHY.
>>>> The PHY will not generate a new interrupt unless this internal counter is
>>>> zero before the most recently captured timestamp.
>>>>
>>>> Because of this counter behavior, a race with the hard IRQ and threaded IRQ
>>>> function can result in the potential for the counter to get stuck such that
>>>> no new interrupts will be triggered until the device is reset.
>>>>
>>>> Consider the following flow:
>>>>
>>>>  1 -> Tx timestamp completes in hardware
>>>>  2 -> timestamp interrupt occurs
>>>>  3 -> ice_misc_intr() re-enables timestamp interrupt, and wakes the
>>>>       thread_fn
>>>>  4 -> thread_fn is running and processing Tx timestamp
>>>>  5 -> the Tx timestamp is read from PHY, clearing the counter
>>>>  6 -> a new Tx timestamp completes in hardware, triggering interrupt
>>>>  7 -> the thread_fn hasn't exited and reported IRQ handled
>>>>  8 -> ice_misc_intr() triggers and sees PTP interrupt, so tries to wake thread
>>>>  9 -> thread_fn is already running (IRQTF_RUNTHREAD is set still!) so we
>>>>       skip running the thread...
>>>
>>> There is a misunderstanding here. IRQTF_RUNTHREAD does not remain set
>>> for the execution of thread_fn. The IRQ thread clears the flag
>>> *before* it starts executing your thread_fn. See kernel/irq/manage.c.
>>> The IRQ thread's main loop is in irq_thread(). Every iteration of the
>>> loop starts with irq_wait_for_interrupt(). It uses
>>> "test_and_clear_bit(IRQTF_RUNTHREAD, ...)" to check if there's work to
>>> do.
>>>
>>> So it's not possible for step 9 to forget the interrupt like that. If
>>> thread_fn is still executing, the setting of IRQTF_RUNTHREAD by the
>>> hard interrupt handler will tell the IRQ thread to go through the loop
>>> again.
>>>
>>> Michal
>>>
>>
>> Ok, so my original understanding was flawed, but I think I see the
>> actual race that happened.
>>
>> I'll try to explain it here, and see if you agree with the outline, then
>> I can update the commit message.
>>
>> The PHYs keep track of how many outstanding timestamps are in memory,
>> and only generate an interrupt if the count of timestamps goes from
>> below a threshold to above a threshold value (set by the driver to be 1,
>> so that it will interrupt immediately on the first timestamp).
>>
>> As long as there are unread timestamps in the memory bank, it will not
>> generate a new interrupt.
>>
>> Somehow, the device gets in a state where it failed to read the
>> timestamps from the PHY memory *after* the interrupt occurred. When this
>> happens, we no longer get a new interrupt, because the PHY sees that
>> there are still unread timestamps. (Yes, I know, this is a poor design).
>>
>> Because of this, we stop processing all future timestamps.
>>
>> The actual race is possibly something like the following:
>>
>> same steps up from 1-7, then:
>>
>> 8 -> ice_misc_intr tiriggers and sees PTP interrupt, so it sets
>> PFINT_OICR_TSYN_TX_M in pf->oicr_misc.
>> 9 -> unfortunately, ice_misc_intr_thread_fn then *clears* the bit after
>> exiting from its processing loop.
>> 10 -> once thread_fn exits, the threaded IRQ logic re-runs the function.
>> However, due to the way that we interact between the main and thread
>> function, the second run sees that PFINT_OICR_TSYN_TX_M is unset, so it
>> doesn't run the loop again.
>>
>> This patch fixes things by ensuring that the hardware won't even
>> generate a hard IRQ interrupt until the thread_fn completes. We could
>> also instead fix this by improving the communication between the handler
>> function and the thread function by using atomic test_and_clear(), or by
>> using an atomic counter.
>>
>> I can send a v3 with appropriate fixes and an updated commit message,
>> once this description makes sense to you.
> 
> OK, the new explanation seems plausible.
> Michal
> 

Really appreciate the careful review and helping me understand some of
the areas I lacked sufficient knowledge in. I'll send a v3 with improved
messages and I will also convert to using atomics instead of bare u32
values for passing data between the functions. I think thats better even
if we leave the OICR disabled until the thread function exits.
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-06-01 19:46 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-30 17:46 [Intel-wired-lan] [PATCH iwl-next v2 0/5] ice: Improve miscellaneous interrupt code Jacob Keller
2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 1/5] ice: handle extts in the miscellaneous interrupt thread Jacob Keller
2023-05-31 14:38   ` Michal Schmidt
2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 2/5] ice: always return IRQ_WAKE_THREAD in ice_misc_intr() Jacob Keller
2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 3/5] ice: introduce ICE_TX_TSTAMP_WORK enumeration Jacob Keller
2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 4/5] ice: trigger PFINT_OICR_TSYN_TX interrupt instead of polling Jacob Keller
2023-05-30 17:46 ` [Intel-wired-lan] [PATCH iwl-next v2 5/5] ice: do not re-enable miscellaneous interrupt until thread_fn completes Jacob Keller
2023-05-31 14:19   ` Michal Schmidt
2023-06-01 18:46     ` Jacob Keller
2023-06-01 19:18       ` Michal Schmidt
2023-06-01 19:45         ` Jacob Keller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.