[Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain

intel-wired-lan.lists.osuosl.org archive mirror

* [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain
@ 2024-05-13 17:55 Thinh Tran
  2024-05-13 17:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 1/2] i40e: factoring out i40e_suspend/i40e_resume Thinh Tran
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Thinh Tran @ 2024-05-13 17:55 UTC (permalink / raw)
  To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel
  Cc: edumazet, rob.thomas, Thinh Tran, intel-wired-lan, pabeni, davem

This series fixes an issue where repeated EEH reports for a single error
on the bus of an Intel X710 4-port 10G Base-T adapter in the MSI domain
cause the device to be permanently disabled.  The fix fully resets and
restarts the device when handling the PCI EEH error.

Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
introduced.  They were factored out of the existing i40e_suspend() and
i40e_resume() respectively.  The factoring was done because of concerns
about the logic of the __I40E_SUSPENDED state, which left the device
unable to recover.  The new functions are now used by the EEH handling
callbacks to suspend and resume the device.

- In the PCI error detected callback, replaced i40e_prep_for_reset()
  with i40e_io_suspend(). This change fully suspends all I/O
  operations.
- In the PCI error slot reset callback, replaced pci_enable_device_mem()
  with pci_enable_device(). This change enables both the I/O and memory
  resources of the device.
- In the PCI error resume callback, replaced i40e_handle_reset_warning()
  with i40e_io_resume(). This change allows the system to resume I/O
  operations.
v2: fixed typos and split into two commits

Thinh Tran (2):
  i40e: factoring out i40e_suspend/i40e_resume
  i40e: Fully suspend and resume IO operations in EEH case

 drivers/net/ethernet/intel/i40e/i40e_main.c | 257 +++++++++++---------
 1 file changed, 140 insertions(+), 117 deletions(-)

-- 
2.25.1

* [Intel-wired-lan] [PATCH iwl-net V2, 1/2] i40e: factoring out i40e_suspend/i40e_resume
  2024-05-13 17:55 [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Thinh Tran
@ 2024-05-13 17:55 ` Thinh Tran
  2024-05-13 17:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 2/2] i40e: Fully suspend and resume IO operations in EEH case Thinh Tran
  2024-05-14  9:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Simon Horman
  2 siblings, 0 replies; 6+ messages in thread
From: Thinh Tran @ 2024-05-13 17:55 UTC (permalink / raw)
  To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel
  Cc: edumazet, rob.thomas, Thinh Tran, intel-wired-lan, pabeni, davem

Factor out the bodies of i40e_suspend() and i40e_resume() into
i40e_io_suspend() and i40e_io_resume() respectively.

Also move i40e_enable_mc_magic_wake() ahead of i40e_io_suspend() so that
it is defined before it is used.
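
The shape of the factoring can be sketched in plain C as follows.  This is
a simplified illustration, not the driver code: the demo_* names and the
call counters are hypothetical, and a plain bool stands in for the
__I40E_SUSPENDED bit and its atomic test_and_set_bit() check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct i40e_pf. */
struct demo_pf {
	bool suspended;    /* stands in for __I40E_SUSPENDED */
	int suspend_calls; /* how often the shared suspend body ran */
	int resume_calls;  /* how often the shared resume body ran */
};

/* Shared body: does the work with no "already suspended?" guard. */
static int demo_io_suspend(struct demo_pf *pf)
{
	assert(pf);
	pf->suspend_calls++;
	return 0;
}

/* Shared body: clears the suspended flag last, as the factored code does. */
static int demo_io_resume(struct demo_pf *pf)
{
	assert(pf);
	pf->resume_calls++;
	pf->suspended = false;
	return 0;
}

/* PM wrapper: keeps the guard, then delegates to the shared body. */
static int demo_suspend(struct demo_pf *pf)
{
	bool was_suspended = pf->suspended;

	pf->suspended = true; /* non-atomic stand-in for test_and_set_bit() */
	if (was_suspended)
		return 0;
	return demo_io_suspend(pf);
}

/* PM wrapper: nothing to do unless a suspend is in effect. */
static int demo_resume(struct demo_pf *pf)
{
	if (!pf->suspended)
		return 0;
	return demo_io_resume(pf);
}
```

Keeping the guard in the wrappers lets other callers invoke the shared
bodies directly without first passing the suspended-state check.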

Tested-by: Robert Thomas <rob.thomas@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 248 +++++++++++---------
 1 file changed, 134 insertions(+), 114 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ffb9f9f15c52..281c8ec27af2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -16303,6 +16303,138 @@ static void i40e_remove(struct pci_dev *pdev)
 	pci_disable_device(pdev);
 }
 
+/**
+ * i40e_enable_mc_magic_wake - enable multicast magic packet wake up
+ * using the mac_address_write admin q function
+ * @pf: pointer to i40e_pf struct
+ **/
+static void i40e_enable_mc_magic_wake(struct i40e_pf *pf)
+{
+	struct i40e_hw *hw = &pf->hw;
+	u8 mac_addr[6];
+	u16 flags = 0;
+	int ret;
+
+	/* Get current MAC address in case it's an LAA */
+	if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) {
+		ether_addr_copy(mac_addr,
+				pf->vsi[pf->lan_vsi]->netdev->dev_addr);
+	} else {
+		dev_err(&pf->pdev->dev,
+			"Failed to retrieve MAC address; using default\n");
+		ether_addr_copy(mac_addr, hw->mac.addr);
+	}
+
+	/* The FW expects the mac address write cmd to first be called with
+	 * one of these flags before calling it again with the multicast
+	 * enable flags.
+	 */
+	flags = I40E_AQC_WRITE_TYPE_LAA_WOL;
+
+	if (hw->func_caps.flex10_enable && hw->partition_id != 1)
+		flags = I40E_AQC_WRITE_TYPE_LAA_ONLY;
+
+	ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL);
+	if (ret) {
+		dev_err(&pf->pdev->dev,
+			"Failed to update MAC address registers; cannot enable Multicast Magic packet wake up");
+		return;
+	}
+
+	flags = I40E_AQC_MC_MAG_EN
+			| I40E_AQC_WOL_PRESERVE_ON_PFR
+			| I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG;
+	ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL);
+	if (ret)
+		dev_err(&pf->pdev->dev,
+			"Failed to enable Multicast Magic Packet wake up\n");
+}
+
+/**
+ * i40e_io_suspend - suspend all IO operations
+ * @pf: pointer to i40e_pf struct
+ *
+ **/
+static int i40e_io_suspend(struct i40e_pf *pf)
+{
+	struct i40e_hw *hw = &pf->hw;
+
+	set_bit(__I40E_DOWN, pf->state);
+
+	/* Ensure service task will not be running */
+	del_timer_sync(&pf->service_timer);
+	cancel_work_sync(&pf->service_task);
+
+	/* Client close must be called explicitly here because the timer
+	 * has been stopped.
+	 */
+	i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false);
+
+	if (test_bit(I40E_HW_CAP_WOL_MC_MAGIC_PKT_WAKE, pf->hw.caps) &&
+	    pf->wol_en)
+		i40e_enable_mc_magic_wake(pf);
+
+	/* Since we're going to destroy queues during the
+	 * i40e_clear_interrupt_scheme() we should hold the RTNL lock for this
+	 * whole section
+	 */
+	rtnl_lock();
+
+	i40e_prep_for_reset(pf);
+
+	wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0));
+	wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0));
+
+	/* Clear the interrupt scheme and release our IRQs so that the system
+	 * can safely hibernate even when there are a large number of CPUs.
+	 * Otherwise hibernation might fail when mapping all the vectors back
+	 * to CPU0.
+	 */
+	i40e_clear_interrupt_scheme(pf);
+
+	rtnl_unlock();
+
+	return 0;
+}
+
+/**
+ * i40e_io_resume - resume IO operations
+ * @pf: pointer to i40e_pf struct
+ *
+ **/
+static int i40e_io_resume(struct i40e_pf *pf)
+{
+	int err;
+
+	/* We need to hold the RTNL lock prior to restoring interrupt schemes,
+	 * since we're going to be restoring queues
+	 */
+	rtnl_lock();
+
+	/* We cleared the interrupt scheme when we suspended, so we need to
+	 * restore it now to resume device functionality.
+	 */
+	err = i40e_restore_interrupt_scheme(pf);
+	if (err) {
+		dev_err(&pf->pdev->dev, "Cannot restore interrupt scheme: %d\n",
+			err);
+	}
+
+	clear_bit(__I40E_DOWN, pf->state);
+	i40e_reset_and_rebuild(pf, false, true);
+
+	rtnl_unlock();
+
+	/* Clear suspended state last after everything is recovered */
+	clear_bit(__I40E_SUSPENDED, pf->state);
+
+	/* Restart the service task */
+	mod_timer(&pf->service_timer,
+		  round_jiffies(jiffies + pf->service_timer_period));
+
+	return 0;
+}
+
 /**
  * i40e_pci_error_detected - warning that something funky happened in PCI land
  * @pdev: PCI device information struct
@@ -16415,53 +16547,6 @@ static void i40e_pci_error_resume(struct pci_dev *pdev)
 	i40e_handle_reset_warning(pf, false);
 }
 
-/**
- * i40e_enable_mc_magic_wake - enable multicast magic packet wake up
- * using the mac_address_write admin q function
- * @pf: pointer to i40e_pf struct
- **/
-static void i40e_enable_mc_magic_wake(struct i40e_pf *pf)
-{
-	struct i40e_hw *hw = &pf->hw;
-	u8 mac_addr[6];
-	u16 flags = 0;
-	int ret;
-
-	/* Get current MAC address in case it's an LAA */
-	if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) {
-		ether_addr_copy(mac_addr,
-				pf->vsi[pf->lan_vsi]->netdev->dev_addr);
-	} else {
-		dev_err(&pf->pdev->dev,
-			"Failed to retrieve MAC address; using default\n");
-		ether_addr_copy(mac_addr, hw->mac.addr);
-	}
-
-	/* The FW expects the mac address write cmd to first be called with
-	 * one of these flags before calling it again with the multicast
-	 * enable flags.
-	 */
-	flags = I40E_AQC_WRITE_TYPE_LAA_WOL;
-
-	if (hw->func_caps.flex10_enable && hw->partition_id != 1)
-		flags = I40E_AQC_WRITE_TYPE_LAA_ONLY;
-
-	ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL);
-	if (ret) {
-		dev_err(&pf->pdev->dev,
-			"Failed to update MAC address registers; cannot enable Multicast Magic packet wake up");
-		return;
-	}
-
-	flags = I40E_AQC_MC_MAG_EN
-			| I40E_AQC_WOL_PRESERVE_ON_PFR
-			| I40E_AQC_WRITE_TYPE_UPDATE_MC_MAG;
-	ret = i40e_aq_mac_address_write(hw, flags, mac_addr, NULL);
-	if (ret)
-		dev_err(&pf->pdev->dev,
-			"Failed to enable Multicast Magic Packet wake up\n");
-}
-
 /**
  * i40e_shutdown - PCI callback for shutting down
  * @pdev: PCI device information struct
@@ -16521,48 +16606,11 @@ static void i40e_shutdown(struct pci_dev *pdev)
 static int __maybe_unused i40e_suspend(struct device *dev)
 {
 	struct i40e_pf *pf = dev_get_drvdata(dev);
-	struct i40e_hw *hw = &pf->hw;
 
 	/* If we're already suspended, then there is nothing to do */
 	if (test_and_set_bit(__I40E_SUSPENDED, pf->state))
 		return 0;
-
-	set_bit(__I40E_DOWN, pf->state);
-
-	/* Ensure service task will not be running */
-	del_timer_sync(&pf->service_timer);
-	cancel_work_sync(&pf->service_task);
-
-	/* Client close must be called explicitly here because the timer
-	 * has been stopped.
-	 */
-	i40e_notify_client_of_netdev_close(pf->vsi[pf->lan_vsi], false);
-
-	if (test_bit(I40E_HW_CAP_WOL_MC_MAGIC_PKT_WAKE, pf->hw.caps) &&
-	    pf->wol_en)
-		i40e_enable_mc_magic_wake(pf);
-
-	/* Since we're going to destroy queues during the
-	 * i40e_clear_interrupt_scheme() we should hold the RTNL lock for this
-	 * whole section
-	 */
-	rtnl_lock();
-
-	i40e_prep_for_reset(pf);
-
-	wr32(hw, I40E_PFPM_APM, (pf->wol_en ? I40E_PFPM_APM_APME_MASK : 0));
-	wr32(hw, I40E_PFPM_WUFC, (pf->wol_en ? I40E_PFPM_WUFC_MAG_MASK : 0));
-
-	/* Clear the interrupt scheme and release our IRQs so that the system
-	 * can safely hibernate even when there are a large number of CPUs.
-	 * Otherwise hibernation might fail when mapping all the vectors back
-	 * to CPU0.
-	 */
-	i40e_clear_interrupt_scheme(pf);
-
-	rtnl_unlock();
-
-	return 0;
+	return i40e_io_suspend(pf);
 }
 
 /**
@@ -16572,39 +16620,11 @@ static int __maybe_unused i40e_suspend(struct device *dev)
 static int __maybe_unused i40e_resume(struct device *dev)
 {
 	struct i40e_pf *pf = dev_get_drvdata(dev);
-	int err;
 
 	/* If we're not suspended, then there is nothing to do */
 	if (!test_bit(__I40E_SUSPENDED, pf->state))
 		return 0;
-
-	/* We need to hold the RTNL lock prior to restoring interrupt schemes,
-	 * since we're going to be restoring queues
-	 */
-	rtnl_lock();
-
-	/* We cleared the interrupt scheme when we suspended, so we need to
-	 * restore it now to resume device functionality.
-	 */
-	err = i40e_restore_interrupt_scheme(pf);
-	if (err) {
-		dev_err(dev, "Cannot restore interrupt scheme: %d\n",
-			err);
-	}
-
-	clear_bit(__I40E_DOWN, pf->state);
-	i40e_reset_and_rebuild(pf, false, true);
-
-	rtnl_unlock();
-
-	/* Clear suspended state last after everything is recovered */
-	clear_bit(__I40E_SUSPENDED, pf->state);
-
-	/* Restart the service task */
-	mod_timer(&pf->service_timer,
-		  round_jiffies(jiffies + pf->service_timer_period));
-
-	return 0;
+	return i40e_io_resume(pf);
 }
 
 static const struct pci_error_handlers i40e_err_handler = {
-- 
2.25.1


* [Intel-wired-lan] [PATCH iwl-net V2, 2/2] i40e: Fully suspend and resume IO operations in EEH case
  2024-05-13 17:55 [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Thinh Tran
  2024-05-13 17:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 1/2] i40e: factoring out i40e_suspend/i40e_resume Thinh Tran
@ 2024-05-13 17:55 ` Thinh Tran
  2024-05-14  9:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Simon Horman
  2 siblings, 0 replies; 6+ messages in thread
From: Thinh Tran @ 2024-05-13 17:55 UTC (permalink / raw)
  To: netdev, kuba, anthony.l.nguyen, aleksandr.loktionov, przemyslaw.kitszel
  Cc: edumazet, rob.thomas, Thinh Tran, intel-wired-lan, pabeni, davem

When an EEH event occurs, the i40e callback functions, which are invoked
by the EEH driver, now completely suspend and resume all IO operations.

Fixes: a5f3d2c17b07 ("powerpc/pseries/pci: Add MSI domains")
Tested-by: Robert Thomas <rob.thomas@ibm.com>
Signed-off-by: Thinh Tran <thinhtr@linux.ibm.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 281c8ec27af2..9f71a61e0c52 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -11138,6 +11138,8 @@ static void i40e_reset_and_rebuild(struct i40e_pf *pf, bool reinit,
 	ret = i40e_reset(pf);
 	if (!ret)
 		i40e_rebuild(pf, reinit, lock_acquired);
+	else
+		dev_err(&pf->pdev->dev, "%s: i40e_reset() FAILED\n", __func__);
 }
 
 /**
@@ -16459,7 +16461,7 @@ static pci_ers_result_t i40e_pci_error_detected(struct pci_dev *pdev,
 
 	/* shutdown all operations */
 	if (!test_bit(__I40E_SUSPENDED, pf->state))
-		i40e_prep_for_reset(pf);
+		i40e_io_suspend(pf);
 
 	/* Request a slot reset */
 	return PCI_ERS_RESULT_NEED_RESET;
@@ -16481,7 +16483,8 @@ static pci_ers_result_t i40e_pci_error_slot_reset(struct pci_dev *pdev)
 	u32 reg;
 
 	dev_dbg(&pdev->dev, "%s\n", __func__);
-	if (pci_enable_device_mem(pdev)) {
+	/* enable I/O and memory of the device  */
+	if (pci_enable_device(pdev)) {
 		dev_info(&pdev->dev,
 			 "Cannot re-enable PCI device after reset.\n");
 		result = PCI_ERS_RESULT_DISCONNECT;
@@ -16544,7 +16547,7 @@ static void i40e_pci_error_resume(struct pci_dev *pdev)
 	if (test_bit(__I40E_SUSPENDED, pf->state))
 		return;
 
-	i40e_handle_reset_warning(pf, false);
+	i40e_io_resume(pf);
 }
 
 /**
-- 
2.25.1


* Re: [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain
  2024-05-13 17:55 [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Thinh Tran
  2024-05-13 17:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 1/2] i40e: factoring out i40e_suspend/i40e_resume Thinh Tran
  2024-05-13 17:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 2/2] i40e: Fully suspend and resume IO operations in EEH case Thinh Tran
@ 2024-05-14  9:55 ` Simon Horman
  2024-05-14 17:11   ` Jacob Keller
  2024-05-14 19:52   ` Thinh Tran
  2 siblings, 2 replies; 6+ messages in thread
From: Simon Horman @ 2024-05-14  9:55 UTC (permalink / raw)
  To: Thinh Tran
  Cc: netdev, rob.thomas, aleksandr.loktionov, edumazet,
	anthony.l.nguyen, przemyslaw.kitszel, kuba, pabeni, davem,
	intel-wired-lan

On Mon, May 13, 2024 at 12:55:47PM -0500, Thinh Tran wrote:
> The patch fixes an issue where repeated EEH reports with a single error
> on the bus of Intel X710 4-port 10G Base-T adapter in the MSI domain
> causes the device to be permanently disabled.  It fully resets and
> restarts the device when handling the PCI EEH error.
> 
> Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
> introduced.  These functions were factored out from the existing
> i40e_suspend() and i40e_resume() respectively.  This factoring was
> done due to concerns about the logic of the __I40E_SUSPENDED state, which
> left the device unable to recover.  The functions are now used in the
> EEH handling for device suspend/resume callbacks.
> 
> - In the PCI error detected callback, replaced i40e_prep_for_reset()
>   with i40e_io_suspend(). The change is to fully suspend all I/O
>   operations
> - In the PCI error slot reset callback, replaced pci_enable_device_mem()
>   with pci_enable_device(). This change enables both I/O and memory of 
>   the device.
> - In the PCI error resume callback, replaced i40e_handle_reset_warning()
>   with i40e_io_resume(). This change allows the system to resume I/O 
>   operations
> 
> v2: fixed typos and split into two commits

Hi,

These patches look good to me, but I think it would be worth adding parts
of the text above to the commit messages of each patch. This will make the
information easier to find in git logs in future.

> 
> Thinh Tran (2):
>   i40e: factoring out i40e_suspend/i40e_resume
>   i40e: Fully suspend and resume IO operations in EEH case
> 
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 257 +++++++++++---------
>  1 file changed, 140 insertions(+), 117 deletions(-)
> 
> -- 
> 2.25.1
> 
> 

* Re: [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain
  2024-05-14  9:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Simon Horman
@ 2024-05-14 17:11   ` Jacob Keller
  2024-05-14 19:52   ` Thinh Tran
  1 sibling, 0 replies; 6+ messages in thread
From: Jacob Keller @ 2024-05-14 17:11 UTC (permalink / raw)
  To: intel-wired-lan



On 5/14/2024 2:55 AM, Simon Horman wrote:
> On Mon, May 13, 2024 at 12:55:47PM -0500, Thinh Tran wrote:
>> The patch fixes an issue where repeated EEH reports with a single error
>> on the bus of Intel X710 4-port 10G Base-T adapter in the MSI domain
>> causes the device to be permanently disabled.  It fully resets and
>> restarts the device when handling the PCI EEH error.
>>
>> Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
>> introduced.  These functions were factored out from the existing
>> i40e_suspend() and i40e_resume() respectively.  This factoring was
>> done due to concerns about the logic of the __I40E_SUSPENDED state, which
>> left the device unable to recover.  The functions are now used in the
>> EEH handling for device suspend/resume callbacks.
>>
>> - In the PCI error detected callback, replaced i40e_prep_for_reset()
>>   with i40e_io_suspend(). The change is to fully suspend all I/O
>>   operations
>> - In the PCI error slot reset callback, replaced pci_enable_device_mem()
>>   with pci_enable_device(). This change enables both I/O and memory of 
>>   the device.
>> - In the PCI error resume callback, replaced i40e_handle_reset_warning()
>>   with i40e_io_resume(). This change allows the system to resume I/O 
>>   operations
>>
>> v2: fixed typos and split into two commits
> 
> Hi,
> 
> These patches look good to me, but I think it would be worth adding parts
> of the text above to the commit messages of each patch. This will make the
> information easier to find in git logs in future.
> 

Yes please, I'd like a reworded message as well so that we don't lose
this important context.

>>
>> Thinh Tran (2):
>>   i40e: factoring out i40e_suspend/i40e_resume
>>   i40e: Fully suspend and resume IO operations in EEH case
>>
>>  drivers/net/ethernet/intel/i40e/i40e_main.c | 257 +++++++++++---------
>>  1 file changed, 140 insertions(+), 117 deletions(-)
>>
>> -- 
>> 2.25.1
>>

* Re: [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain
  2024-05-14  9:55 ` [Intel-wired-lan] [PATCH iwl-net V2, 0/2] Fix repeated EEH reports in MSI domain Simon Horman
  2024-05-14 17:11   ` Jacob Keller
@ 2024-05-14 19:52   ` Thinh Tran
  1 sibling, 0 replies; 6+ messages in thread
From: Thinh Tran @ 2024-05-14 19:52 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, rob.thomas, aleksandr.loktionov, edumazet,
	anthony.l.nguyen, przemyslaw.kitszel, kuba, pabeni, davem,
	intel-wired-lan


Thanks for reviewing.

On 5/14/2024 4:55 AM, Simon Horman wrote:
> On Mon, May 13, 2024 at 12:55:47PM -0500, Thinh Tran wrote:
>> The patch fixes an issue where repeated EEH reports with a single error
>> on the bus of Intel X710 4-port 10G Base-T adapter in the MSI domain
>> causes the device to be permanently disabled.  It fully resets and
>> restarts the device when handling the PCI EEH error.
>>
>> Two new functions, i40e_io_suspend() and i40e_io_resume(), have been
>> introduced.  These functions were factored out from the existing
>> i40e_suspend() and i40e_resume() respectively.  This factoring was
>> done due to concerns about the logic of the __I40E_SUSPENDED state, which
>> left the device unable to recover.  The functions are now used in the
>> EEH handling for device suspend/resume callbacks.
>>
>> - In the PCI error detected callback, replaced i40e_prep_for_reset()
>>    with i40e_io_suspend(). The change is to fully suspend all I/O
>>    operations
>> - In the PCI error slot reset callback, replaced pci_enable_device_mem()
>>    with pci_enable_device(). This change enables both I/O and memory of
>>    the device.
>> - In the PCI error resume callback, replaced i40e_handle_reset_warning()
>>    with i40e_io_resume(). This change allows the system to resume I/O
>>    operations
>>
>> v2: fixed typos and split into two commits
> 
> Hi,
> 
> These patches look good to me, but I think it would be worth adding parts
> of the text above to the commit messages of each patch. This will make the
> information easier to find in git logs in future.
> 

I'll move the text into the patches' commit messages.
Thanks
Thinh Tran
