[next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm

All of lore.kernel.org
 help / color / mirror / Atom feed

* [next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm_failure
@ 2019-10-11 15:34 ` Alexander Duyck
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Duyck @ 2019-10-11 15:34 UTC (permalink / raw)
  To: alexander.h.duyck, intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev, zdai, zdai

David Dai had submitted a patch[1] to address a reported issue with e1000e
calling pci_disable_msi without first freeing the interrupts. Looking over
the issue it seems the problem was the fact that e1000e_down was being
called in e1000_io_error_detected without calling e1000_free_irq, and this
was resulting in e1000e_close skipping over the call to e1000e_down and
e1000_free_irq.

The use of the __E1000_DOWN flag for the close test seems to have come from
the runtime power management changes that were made some time ago. From
what I can tell in the close path we should be disabling runtime power
management via a call to pm_runtime_get_sync. As such we can remove the
test for the __E1000_DOWN bit. However in comparing this with other drivers
we do need to avoid freeing the IRQs more than once. So in order to address
that I have copied the approach taken in igb and taken it a bit further so
that we will always detach the interface and if the interface is up we will
bring it down and free the IRQs. In addition we are able to reuse some of
the power management code so I have taken the opportunity to merge those
bits.

[1]: https://lore.kernel.org/lkml/1570121672-12172-1-git-send-email-zdai@linux.vnet.ibm.com/

v2: Move e1000e_pm_thaw out of CONFIG_PM region to fix build issue on Sparc64

---

Alexander Duyck (2):
      e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
      e1000e: Drop unnecessary __E1000_DOWN bit twiddling

 drivers/net/ethernet/intel/e1000e/netdev.c |   75 +++++++++++++---------------
 1 file changed, 36 insertions(+), 39 deletions(-)

--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm_failure
@ 2019-10-11 15:34 ` Alexander Duyck
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Duyck @ 2019-10-11 15:34 UTC (permalink / raw)
  To: intel-wired-lan

David Dai had submitted a patch[1] to address a reported issue with e1000e
calling pci_disable_msi without first freeing the interrupts. Looking over
the issue it seems the problem was the fact that e1000e_down was being
called in e1000_io_error_detected without calling e1000_free_irq, and this
was resulting in e1000e_close skipping over the call to e1000e_down and
e1000_free_irq.

The use of the __E1000_DOWN flag for the close test seems to have come from
the runtime power management changes that were made some time ago. From
what I can tell in the close path we should be disabling runtime power
management via a call to pm_runtime_get_sync. As such we can remove the
test for the __E1000_DOWN bit. However in comparing this with other drivers
we do need to avoid freeing the IRQs more than once. So in order to address
that I have copied the approach taken in igb and taken it a bit further so
that we will always detach the interface and if the interface is up we will
bring it down and free the IRQs. In addition we are able to reuse some of
the power management code so I have taken the opportunity to merge those
bits.

[1]: https://lore.kernel.org/lkml/1570121672-12172-1-git-send-email-zdai at linux.vnet.ibm.com/

v2: Move e1000e_pm_thaw out of CONFIG_PM region to fix build issue on Sparc64

---

Alexander Duyck (2):
      e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
      e1000e: Drop unnecessary __E1000_DOWN bit twiddling

 drivers/net/ethernet/intel/e1000e/netdev.c |   75 +++++++++++++---------------
 1 file changed, 36 insertions(+), 39 deletions(-)

--

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
  2019-10-11 15:34 ` [Intel-wired-lan] " Alexander Duyck
@ 2019-10-11 15:34   ` Alexander Duyck
  -1 siblings, 0 replies; 12+ messages in thread
From: Alexander Duyck @ 2019-10-11 15:34 UTC (permalink / raw)
  To: alexander.h.duyck, intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev, zdai, zdai

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

This patch is meant to address possible race conditions that can exist
between network configuration and power management. A similar issue was
fixed for igb in commit 9474933caf21 ("igb: close/suspend race in
netif_device_detach").

In addition it consolidates the code so that the PCI error handling code
will essentially perform the power management freeze on the device prior to
attempting a reset, and will thaw the device afterwards if that is what it
is planning to do. Otherwise when we call close on the interface it should
see it is detached and not attempt to call the logic to down the interface
and free the IRQs again.

>From what I can tell the check that was adding the check for __E1000_DOWN
in e1000e_close was added when runtime power management was added. However
it should not be relevant for us as we perform a call to
pm_runtime_get_sync before we call e1000_down/free_irq so it should always
be back up before we call into this anyway.

Reported-by: Morumuri Srivalli <smorumu1@in.ibm.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |   68 ++++++++++++++--------------
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index d7d56e42a6aa..db1591eef28e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4715,12 +4715,12 @@ int e1000e_close(struct net_device *netdev)
 
 	pm_runtime_get_sync(&pdev->dev);
 
-	if (!test_bit(__E1000_DOWN, &adapter->state)) {
+	if (netif_device_present(netdev)) {
 		e1000e_down(adapter, true);
 		e1000_free_irq(adapter);
 
 		/* Link status message must follow this format */
-		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
+		pr_info("%s NIC Link is Down\n", netdev->name);
 	}
 
 	napi_disable(&adapter->napi);
@@ -6298,10 +6298,14 @@ static int e1000e_pm_freeze(struct device *dev)
 {
 	struct net_device *netdev = dev_get_drvdata(dev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
+	bool present;
 
+	rtnl_lock();
+
+	present = netif_device_present(netdev);
 	netif_device_detach(netdev);
 
-	if (netif_running(netdev)) {
+	if (present && netif_running(netdev)) {
 		int count = E1000_CHECK_RESET_COUNT;
 
 		while (test_bit(__E1000_RESETTING, &adapter->state) && count--)
@@ -6313,6 +6317,8 @@ static int e1000e_pm_freeze(struct device *dev)
 		e1000e_down(adapter, false);
 		e1000_free_irq(adapter);
 	}
+	rtnl_unlock();
+
 	e1000e_reset_interrupt_capability(adapter);
 
 	/* Allow time for pending master requests to run */
@@ -6560,6 +6566,30 @@ static void e1000e_disable_aspm_locked(struct pci_dev *pdev, u16 state)
 	__e1000e_disable_aspm(pdev, state, 1);
 }
 
+static int e1000e_pm_thaw(struct device *dev)
+{
+	struct net_device *netdev = dev_get_drvdata(dev);
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+	int rc = 0;
+
+	e1000e_set_interrupt_capability(adapter);
+
+	rtnl_lock();
+	if (netif_running(netdev)) {
+		rc = e1000_request_irq(adapter);
+		if (rc)
+			goto err_irq;
+
+		e1000e_up(adapter);
+	}
+
+	netif_device_attach(netdev);
+err_irq:
+	rtnl_unlock();
+
+	return rc;
+}
+
 #ifdef CONFIG_PM
 static int __e1000_resume(struct pci_dev *pdev)
 {
@@ -6627,26 +6657,6 @@ static int __e1000_resume(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
-static int e1000e_pm_thaw(struct device *dev)
-{
-	struct net_device *netdev = dev_get_drvdata(dev);
-	struct e1000_adapter *adapter = netdev_priv(netdev);
-
-	e1000e_set_interrupt_capability(adapter);
-	if (netif_running(netdev)) {
-		u32 err = e1000_request_irq(adapter);
-
-		if (err)
-			return err;
-
-		e1000e_up(adapter);
-	}
-
-	netif_device_attach(netdev);
-
-	return 0;
-}
-
 static int e1000e_pm_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -6818,16 +6828,11 @@ static void e1000_netpoll(struct net_device *netdev)
 static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
 						pci_channel_state_t state)
 {
-	struct net_device *netdev = pci_get_drvdata(pdev);
-	struct e1000_adapter *adapter = netdev_priv(netdev);
-
-	netif_device_detach(netdev);
+	e1000e_pm_freeze(&pdev->dev);
 
 	if (state == pci_channel_io_perm_failure)
 		return PCI_ERS_RESULT_DISCONNECT;
 
-	if (netif_running(netdev))
-		e1000e_down(adapter, true);
 	pci_disable_device(pdev);
 
 	/* Request a slot slot reset. */
@@ -6893,10 +6898,7 @@ static void e1000_io_resume(struct pci_dev *pdev)
 
 	e1000_init_manageability_pt(adapter);
 
-	if (netif_running(netdev))
-		e1000e_up(adapter);
-
-	netif_device_attach(netdev);
+	e1000e_pm_thaw(&pdev->dev);
 
 	/* If the controller has AMT, do not set DRV_LOAD until the interface
 	 * is up.  For all other cases, let the f/w know that the h/w is now


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
@ 2019-10-11 15:34   ` Alexander Duyck
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Duyck @ 2019-10-11 15:34 UTC (permalink / raw)
  To: intel-wired-lan

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

This patch is meant to address possible race conditions that can exist
between network configuration and power management. A similar issue was
fixed for igb in commit 9474933caf21 ("igb: close/suspend race in
netif_device_detach").

In addition it consolidates the code so that the PCI error handling code
will essentially perform the power management freeze on the device prior to
attempting a reset, and will thaw the device afterwards if that is what it
is planning to do. Otherwise when we call close on the interface it should
see it is detached and not attempt to call the logic to down the interface
and free the IRQs again.

From what I can tell the check that was adding the check for __E1000_DOWN
in e1000e_close was added when runtime power management was added. However
it should not be relevant for us as we perform a call to
pm_runtime_get_sync before we call e1000_down/free_irq so it should always
be back up before we call into this anyway.

Reported-by: Morumuri Srivalli <smorumu1@in.ibm.com>
Tested-by: David Dai <zdai@linux.vnet.ibm.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |   68 ++++++++++++++--------------
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index d7d56e42a6aa..db1591eef28e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4715,12 +4715,12 @@ int e1000e_close(struct net_device *netdev)
 
 	pm_runtime_get_sync(&pdev->dev);
 
-	if (!test_bit(__E1000_DOWN, &adapter->state)) {
+	if (netif_device_present(netdev)) {
 		e1000e_down(adapter, true);
 		e1000_free_irq(adapter);
 
 		/* Link status message must follow this format */
-		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
+		pr_info("%s NIC Link is Down\n", netdev->name);
 	}
 
 	napi_disable(&adapter->napi);
@@ -6298,10 +6298,14 @@ static int e1000e_pm_freeze(struct device *dev)
 {
 	struct net_device *netdev = dev_get_drvdata(dev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
+	bool present;
 
+	rtnl_lock();
+
+	present = netif_device_present(netdev);
 	netif_device_detach(netdev);
 
-	if (netif_running(netdev)) {
+	if (present && netif_running(netdev)) {
 		int count = E1000_CHECK_RESET_COUNT;
 
 		while (test_bit(__E1000_RESETTING, &adapter->state) && count--)
@@ -6313,6 +6317,8 @@ static int e1000e_pm_freeze(struct device *dev)
 		e1000e_down(adapter, false);
 		e1000_free_irq(adapter);
 	}
+	rtnl_unlock();
+
 	e1000e_reset_interrupt_capability(adapter);
 
 	/* Allow time for pending master requests to run */
@@ -6560,6 +6566,30 @@ static void e1000e_disable_aspm_locked(struct pci_dev *pdev, u16 state)
 	__e1000e_disable_aspm(pdev, state, 1);
 }
 
+static int e1000e_pm_thaw(struct device *dev)
+{
+	struct net_device *netdev = dev_get_drvdata(dev);
+	struct e1000_adapter *adapter = netdev_priv(netdev);
+	int rc = 0;
+
+	e1000e_set_interrupt_capability(adapter);
+
+	rtnl_lock();
+	if (netif_running(netdev)) {
+		rc = e1000_request_irq(adapter);
+		if (rc)
+			goto err_irq;
+
+		e1000e_up(adapter);
+	}
+
+	netif_device_attach(netdev);
+err_irq:
+	rtnl_unlock();
+
+	return rc;
+}
+
 #ifdef CONFIG_PM
 static int __e1000_resume(struct pci_dev *pdev)
 {
@@ -6627,26 +6657,6 @@ static int __e1000_resume(struct pci_dev *pdev)
 }
 
 #ifdef CONFIG_PM_SLEEP
-static int e1000e_pm_thaw(struct device *dev)
-{
-	struct net_device *netdev = dev_get_drvdata(dev);
-	struct e1000_adapter *adapter = netdev_priv(netdev);
-
-	e1000e_set_interrupt_capability(adapter);
-	if (netif_running(netdev)) {
-		u32 err = e1000_request_irq(adapter);
-
-		if (err)
-			return err;
-
-		e1000e_up(adapter);
-	}
-
-	netif_device_attach(netdev);
-
-	return 0;
-}
-
 static int e1000e_pm_suspend(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -6818,16 +6828,11 @@ static void e1000_netpoll(struct net_device *netdev)
 static pci_ers_result_t e1000_io_error_detected(struct pci_dev *pdev,
 						pci_channel_state_t state)
 {
-	struct net_device *netdev = pci_get_drvdata(pdev);
-	struct e1000_adapter *adapter = netdev_priv(netdev);
-
-	netif_device_detach(netdev);
+	e1000e_pm_freeze(&pdev->dev);
 
 	if (state == pci_channel_io_perm_failure)
 		return PCI_ERS_RESULT_DISCONNECT;
 
-	if (netif_running(netdev))
-		e1000e_down(adapter, true);
 	pci_disable_device(pdev);
 
 	/* Request a slot slot reset. */
@@ -6893,10 +6898,7 @@ static void e1000_io_resume(struct pci_dev *pdev)
 
 	e1000_init_manageability_pt(adapter);
 
-	if (netif_running(netdev))
-		e1000e_up(adapter);
-
-	netif_device_attach(netdev);
+	e1000e_pm_thaw(&pdev->dev);
 
 	/* If the controller has AMT, do not set DRV_LOAD until the interface
 	 * is up.  For all other cases, let the f/w know that the h/w is now


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [next-queue PATCH v2 2/2] e1000e: Drop unnecessary __E1000_DOWN bit twiddling
  2019-10-11 15:34 ` [Intel-wired-lan] " Alexander Duyck
@ 2019-10-11 15:34   ` Alexander Duyck
  -1 siblings, 0 replies; 12+ messages in thread
From: Alexander Duyck @ 2019-10-11 15:34 UTC (permalink / raw)
  To: alexander.h.duyck, intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev, zdai, zdai

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Since we no longer check for __E1000_DOWN in e1000e_close we can drop the
spot where we were restoring the bit. This saves us a bit of unnecessary
complexity.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index db1591eef28e..c31259dde78d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7409,15 +7409,13 @@ static void e1000_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	bool down = test_bit(__E1000_DOWN, &adapter->state);
 
 	e1000e_ptp_remove(adapter);
 
 	/* The timers may be rescheduled, so explicitly disable them
 	 * from being rescheduled.
 	 */
-	if (!down)
-		set_bit(__E1000_DOWN, &adapter->state);
+	set_bit(__E1000_DOWN, &adapter->state);
 	del_timer_sync(&adapter->phy_info_timer);
 
 	cancel_work_sync(&adapter->reset_task);
@@ -7437,9 +7435,6 @@ static void e1000_remove(struct pci_dev *pdev)
 		}
 	}
 
-	/* Don't lie to e1000_close() down the road. */
-	if (!down)
-		clear_bit(__E1000_DOWN, &adapter->state);
 	unregister_netdev(netdev);
 
 	if (pci_dev_run_wake(pdev))


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [next-queue PATCH v2 2/2] e1000e: Drop unnecessary __E1000_DOWN bit twiddling
@ 2019-10-11 15:34   ` Alexander Duyck
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Duyck @ 2019-10-11 15:34 UTC (permalink / raw)
  To: intel-wired-lan

From: Alexander Duyck <alexander.h.duyck@linux.intel.com>

Since we no longer check for __E1000_DOWN in e1000e_close we can drop the
spot where we were restoring the bit. This saves us a bit of unnecessary
complexity.

Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
 drivers/net/ethernet/intel/e1000e/netdev.c |    7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index db1591eef28e..c31259dde78d 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -7409,15 +7409,13 @@ static void e1000_remove(struct pci_dev *pdev)
 {
 	struct net_device *netdev = pci_get_drvdata(pdev);
 	struct e1000_adapter *adapter = netdev_priv(netdev);
-	bool down = test_bit(__E1000_DOWN, &adapter->state);
 
 	e1000e_ptp_remove(adapter);
 
 	/* The timers may be rescheduled, so explicitly disable them
 	 * from being rescheduled.
 	 */
-	if (!down)
-		set_bit(__E1000_DOWN, &adapter->state);
+	set_bit(__E1000_DOWN, &adapter->state);
 	del_timer_sync(&adapter->phy_info_timer);
 
 	cancel_work_sync(&adapter->reset_task);
@@ -7437,9 +7435,6 @@ static void e1000_remove(struct pci_dev *pdev)
 		}
 	}
 
-	/* Don't lie to e1000_close() down the road. */
-	if (!down)
-		clear_bit(__E1000_DOWN, &adapter->state);
 	unregister_netdev(netdev);
 
 	if (pci_dev_run_wake(pdev))


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
  2019-10-11 15:34   ` [Intel-wired-lan] " Alexander Duyck
@ 2019-10-23  2:09     ` Brown, Aaron F
  -1 siblings, 0 replies; 12+ messages in thread
From: Brown, Aaron F @ 2019-10-23  2:09 UTC (permalink / raw)
  To: Alexander Duyck, alexander.h.duyck, intel-wired-lan, Kirsher, Jeffrey T
  Cc: netdev, zdai, zdai

> From: netdev-owner@vger.kernel.org <netdev-owner@vger.kernel.org>
> On Behalf Of Alexander Duyck
> Sent: Friday, October 11, 2019 8:35 AM
> To: alexander.h.duyck@linux.intel.com; intel-wired-lan@lists.osuosl.org;
> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; zdai@us.ibm.com; zdai@linux.vnet.ibm.com
> Subject: [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race
> conditions between net and pci/pm
> 
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> This patch is meant to address possible race conditions that can exist
> between network configuration and power management. A similar issue was
> fixed for igb in commit 9474933caf21 ("igb: close/suspend race in
> netif_device_detach").
> 
> In addition it consolidates the code so that the PCI error handling code
> will essentially perform the power management freeze on the device prior to
> attempting a reset, and will thaw the device afterwards if that is what it
> is planning to do. Otherwise when we call close on the interface it should
> see it is detached and not attempt to call the logic to down the interface
> and free the IRQs again.
> 
> >From what I can tell the check that was adding the check for
> __E1000_DOWN
> in e1000e_close was added when runtime power management was added.
> However
> it should not be relevant for us as we perform a call to
> pm_runtime_get_sync before we call e1000_down/free_irq so it should
> always
> be back up before we call into this anyway.
> 
> Reported-by: Morumuri Srivalli <smorumu1@in.ibm.com>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   68 ++++++++++++++---------
> -----
>  1 file changed, 35 insertions(+), 33 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
@ 2019-10-23  2:09     ` Brown, Aaron F
  0 siblings, 0 replies; 12+ messages in thread
From: Brown, Aaron F @ 2019-10-23  2:09 UTC (permalink / raw)
  To: intel-wired-lan

> From: netdev-owner at vger.kernel.org <netdev-owner@vger.kernel.org>
> On Behalf Of Alexander Duyck
> Sent: Friday, October 11, 2019 8:35 AM
> To: alexander.h.duyck at linux.intel.com; intel-wired-lan at lists.osuosl.org;
> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev at vger.kernel.org; zdai at us.ibm.com; zdai at linux.vnet.ibm.com
> Subject: [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race
> conditions between net and pci/pm
> 
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> This patch is meant to address possible race conditions that can exist
> between network configuration and power management. A similar issue was
> fixed for igb in commit 9474933caf21 ("igb: close/suspend race in
> netif_device_detach").
> 
> In addition it consolidates the code so that the PCI error handling code
> will essentially perform the power management freeze on the device prior to
> attempting a reset, and will thaw the device afterwards if that is what it
> is planning to do. Otherwise when we call close on the interface it should
> see it is detached and not attempt to call the logic to down the interface
> and free the IRQs again.
> 
> >From what I can tell the check that was adding the check for
> __E1000_DOWN
> in e1000e_close was added when runtime power management was added.
> However
> it should not be relevant for us as we perform a call to
> pm_runtime_get_sync before we call e1000_down/free_irq so it should
> always
> be back up before we call into this anyway.
> 
> Reported-by: Morumuri Srivalli <smorumu1@in.ibm.com>
> Tested-by: David Dai <zdai@linux.vnet.ibm.com>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |   68 ++++++++++++++---------
> -----
>  1 file changed, 35 insertions(+), 33 deletions(-)

Tested-by: Aaron Brown <aaron.f.brown@intel.com>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [Intel-wired-lan] [next-queue PATCH v2 2/2] e1000e: Drop unnecessary __E1000_DOWN bit twiddling
  2019-10-11 15:34   ` [Intel-wired-lan] " Alexander Duyck
@ 2019-10-23  2:10     ` Brown, Aaron F
  -1 siblings, 0 replies; 12+ messages in thread
From: Brown, Aaron F @ 2019-10-23  2:10 UTC (permalink / raw)
  To: Alexander Duyck, alexander.h.duyck, intel-wired-lan, Kirsher, Jeffrey T
  Cc: netdev, zdai, zdai

> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Alexander Duyck
> Sent: Friday, October 11, 2019 8:35 AM
> To: alexander.h.duyck@linux.intel.com; intel-wired-lan@lists.osuosl.org;
> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; zdai@us.ibm.com; zdai@linux.vnet.ibm.com
> Subject: [Intel-wired-lan] [next-queue PATCH v2 2/2] e1000e: Drop
> unnecessary __E1000_DOWN bit twiddling
> 
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> Since we no longer check for __E1000_DOWN in e1000e_close we can drop
> the
> spot where we were restoring the bit. This saves us a bit of unnecessary
> complexity.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [next-queue PATCH v2 2/2] e1000e: Drop unnecessary __E1000_DOWN bit twiddling
@ 2019-10-23  2:10     ` Brown, Aaron F
  0 siblings, 0 replies; 12+ messages in thread
From: Brown, Aaron F @ 2019-10-23  2:10 UTC (permalink / raw)
  To: intel-wired-lan

> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Alexander Duyck
> Sent: Friday, October 11, 2019 8:35 AM
> To: alexander.h.duyck at linux.intel.com; intel-wired-lan at lists.osuosl.org;
> Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev at vger.kernel.org; zdai at us.ibm.com; zdai at linux.vnet.ibm.com
> Subject: [Intel-wired-lan] [next-queue PATCH v2 2/2] e1000e: Drop
> unnecessary __E1000_DOWN bit twiddling
> 
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> 
> Since we no longer check for __E1000_DOWN in e1000e_close we can drop
> the
> spot where we were restoring the bit. This saves us a bit of unnecessary
> complexity.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c |    7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm_failure
  2019-10-11 15:34 ` [Intel-wired-lan] " Alexander Duyck
@ 2019-10-29 14:43   ` David Z. Dai
  -1 siblings, 0 replies; 12+ messages in thread
From: David Z. Dai @ 2019-10-29 14:43 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: alexander.h.duyck, intel-wired-lan, jeffrey.t.kirsher, netdev, zdai

On Fri, 2019-10-11 at 08:34 -0700, Alexander Duyck wrote:
> David Dai had submitted a patch[1] to address a reported issue with e1000e
> calling pci_disable_msi without first freeing the interrupts. Looking over
> the issue it seems the problem was the fact that e1000e_down was being
> called in e1000_io_error_detected without calling e1000_free_irq, and this
> was resulting in e1000e_close skipping over the call to e1000e_down and
> e1000_free_irq.
> 
> The use of the __E1000_DOWN flag for the close test seems to have come from
> the runtime power management changes that were made some time ago. From
> what I can tell in the close path we should be disabling runtime power
> management via a call to pm_runtime_get_sync. As such we can remove the
> test for the __E1000_DOWN bit. However in comparing this with other drivers
> we do need to avoid freeing the IRQs more than once. So in order to address
> that I have copied the approach taken in igb and taken it a bit further so
> that we will always detach the interface and if the interface is up we will
> bring it down and free the IRQs. In addition we are able to reuse some of
> the power management code so I have taken the opportunity to merge those
> bits.
> 
> [1]: https://lore.kernel.org/lkml/1570121672-12172-1-git-send-email-zdai@linux.vnet.ibm.com/
> 
> v2: Move e1000e_pm_thaw out of CONFIG_PM region to fix build issue on Sparc64
> 
> ---
> 
> Alexander Duyck (2):
>       e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
>       e1000e: Drop unnecessary __E1000_DOWN bit twiddling
> 
> 
>  drivers/net/ethernet/intel/e1000e/netdev.c |   75 +++++++++++++---------------
>  1 file changed, 36 insertions(+), 39 deletions(-)
> 
I am not familiar with the process. Don't mean to push you in any way.
Just want to check if these 2 v2 patches will be accepted by upstream?
or any thing else needs to be done to finish the process? 

Thanks! - David


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Intel-wired-lan] [next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm_failure
@ 2019-10-29 14:43   ` David Z. Dai
  0 siblings, 0 replies; 12+ messages in thread
From: David Z. Dai @ 2019-10-29 14:43 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, 2019-10-11 at 08:34 -0700, Alexander Duyck wrote:
> David Dai had submitted a patch[1] to address a reported issue with e1000e
> calling pci_disable_msi without first freeing the interrupts. Looking over
> the issue it seems the problem was the fact that e1000e_down was being
> called in e1000_io_error_detected without calling e1000_free_irq, and this
> was resulting in e1000e_close skipping over the call to e1000e_down and
> e1000_free_irq.
> 
> The use of the __E1000_DOWN flag for the close test seems to have come from
> the runtime power management changes that were made some time ago. From
> what I can tell in the close path we should be disabling runtime power
> management via a call to pm_runtime_get_sync. As such we can remove the
> test for the __E1000_DOWN bit. However in comparing this with other drivers
> we do need to avoid freeing the IRQs more than once. So in order to address
> that I have copied the approach taken in igb and taken it a bit further so
> that we will always detach the interface and if the interface is up we will
> bring it down and free the IRQs. In addition we are able to reuse some of
> the power management code so I have taken the opportunity to merge those
> bits.
> 
> [1]: https://lore.kernel.org/lkml/1570121672-12172-1-git-send-email-zdai at linux.vnet.ibm.com/
> 
> v2: Move e1000e_pm_thaw out of CONFIG_PM region to fix build issue on Sparc64
> 
> ---
> 
> Alexander Duyck (2):
>       e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm
>       e1000e: Drop unnecessary __E1000_DOWN bit twiddling
> 
> 
>  drivers/net/ethernet/intel/e1000e/netdev.c |   75 +++++++++++++---------------
>  1 file changed, 36 insertions(+), 39 deletions(-)
> 
I am not familiar with the process. Don't mean to push you in any way.
Just want to check if these 2 v2 patches will be accepted by upstream?
or any thing else needs to be done to finish the process? 

Thanks! - David


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-10-29 14:43 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-11 15:34 [next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm_failure Alexander Duyck
2019-10-11 15:34 ` [Intel-wired-lan] " Alexander Duyck
2019-10-11 15:34 ` [next-queue PATCH v2 1/2] e1000e: Use rtnl_lock to prevent race conditions between net and pci/pm Alexander Duyck
2019-10-11 15:34   ` [Intel-wired-lan] " Alexander Duyck
2019-10-23  2:09   ` Brown, Aaron F
2019-10-23  2:09     ` [Intel-wired-lan] " Brown, Aaron F
2019-10-11 15:34 ` [next-queue PATCH v2 2/2] e1000e: Drop unnecessary __E1000_DOWN bit twiddling Alexander Duyck
2019-10-11 15:34   ` [Intel-wired-lan] " Alexander Duyck
2019-10-23  2:10   ` Brown, Aaron F
2019-10-23  2:10     ` Brown, Aaron F
2019-10-29 14:43 ` [next-queue PATCH v2 0/2] Address IRQ related crash seen due to io_perm_failure David Z. Dai
2019-10-29 14:43   ` [Intel-wired-lan] " David Z. Dai

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.