linux-kernel.vger.kernel.org archive mirror
* [PATCH] e1000e: free IRQ when the link is up or down
@ 2016-11-02 21:08 Tyler Baicar
  2016-11-02 21:44 ` Alexander Duyck
  2016-11-03  8:09 ` [Intel-wired-lan] " Ruinskiy, Dima
  0 siblings, 2 replies; 4+ messages in thread
From: Tyler Baicar @ 2016-11-02 21:08 UTC (permalink / raw)
  To: jeffrey.t.kirsher, intel-wired-lan, netdev, linux-kernel, okaya, timur
  Cc: Tyler Baicar

Move IRQ free code so that it will happen regardless of the
link state. Currently the e1000e driver only releases its IRQ
if the link is up. This is not sufficient because it is
possible for a link to go down without releasing the IRQ. A
secondary bus reset can cause this case to happen.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index 7017281..36cfcb0 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
 
 	if (!test_bit(__E1000_DOWN, &adapter->state)) {
 		e1000e_down(adapter, true);
-		e1000_free_irq(adapter);
 
 		/* Link status message must follow this format */
 		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
 	}
 
+	e1000_free_irq(adapter);
+
 	napi_disable(&adapter->napi);
 
 	e1000e_free_tx_resources(adapter->tx_ring);
-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.


* Re: [PATCH] e1000e: free IRQ when the link is up or down
  2016-11-02 21:08 [PATCH] e1000e: free IRQ when the link is up or down Tyler Baicar
@ 2016-11-02 21:44 ` Alexander Duyck
  2016-11-03  8:09 ` [Intel-wired-lan] " Ruinskiy, Dima
  1 sibling, 0 replies; 4+ messages in thread
From: Alexander Duyck @ 2016-11-02 21:44 UTC (permalink / raw)
  To: Tyler Baicar
  Cc: Jeff Kirsher, intel-wired-lan, Netdev, linux-kernel, okaya, timur

On Wed, Nov 2, 2016 at 2:08 PM, Tyler Baicar <tbaicar@codeaurora.org> wrote:
> Move IRQ free code so that it will happen regardless of the
> link state. Currently the e1000e driver only releases its IRQ
> if the link is up. This is not sufficient because it is
> possible for a link to go down without releasing the IRQ. A
> secondary bus reset can cause this case to happen.
>
> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 7017281..36cfcb0 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>
>         if (!test_bit(__E1000_DOWN, &adapter->state)) {
>                 e1000e_down(adapter, true);
> -               e1000_free_irq(adapter);
>
>                 /* Link status message must follow this format */
>                 pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>         }
>
> +       e1000_free_irq(adapter);
> +
>         napi_disable(&adapter->napi);
>
>         e1000e_free_tx_resources(adapter->tx_ring);


The __E1000_DOWN bit has nothing to do with link state.  It is
basically there to make sure that we don't call e1000e_down multiple
times on the same interface.
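
To make the role of such a guard bit concrete, here is a minimal userland sketch in C11. It is not driver code: `model_state`, `MODEL_DOWN_BIT` and `model_down_once()` are hypothetical names, and an atomic test-and-set is used here in place of the driver's separate test_bit() check.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_DOWN_BIT 0x1u  /* hypothetical stand-in for __E1000_DOWN */

struct model_state {
	atomic_uint flags;   /* state bits */
	int down_calls;      /* how many times the teardown body actually ran */
};

/* Runs the teardown body only for the caller that flips the bit from 0 to 1.
 * Returns true if this call performed the teardown. */
static bool model_down_once(struct model_state *s)
{
	unsigned int old = atomic_fetch_or(&s->flags, MODEL_DOWN_BIT);

	if (old & MODEL_DOWN_BIT)
		return false;    /* already down: nothing to do */

	s->down_calls++;         /* placeholder for the real teardown work */
	return true;
}
```

In this model the bit records only that teardown has already run; it carries no information about carrier or link state.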

With that being said the change itself is probably okay since from
what I can tell e1000e_open doesn't do a check on the __E1000_DOWN bit
before requesting the interrupt.  However, you may want to incorporate
pieces of this change (http://patchwork.ozlabs.org/patch/690139/) that
went in for ixgbevf.  Basically you need to keep the suspend code from
racing with the close call.  The easiest way to do that is to wrap the
bits that are also in e1000e_close in the rtnl_lock like we did for
ixgbevf, and then you would need to check for netif_device_present
before calling e1000_free_irq() just so you didn't call it twice.
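
A rough userland model of that idea, under stated assumptions: `mock_adapter`, `mock_suspend()` and `mock_close()` are hypothetical stand-ins, the `device_present` flag plays the role of netif_device_present(), and the rtnl_lock serialization is assumed rather than shown (the model is single-threaded).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real e1000e adapter. */
struct mock_adapter {
	bool down;            /* models the __E1000_DOWN bit */
	bool device_present;  /* models netif_device_present() */
	bool irq_requested;   /* true between requesting and freeing the IRQ */
	int irq_free_calls;   /* counts how often the IRQ was released */
};

static void mock_free_irq(struct mock_adapter *a)
{
	/* Releasing an IRQ that is not currently requested would be a bug. */
	assert(a->irq_requested);
	a->irq_requested = false;
	a->irq_free_calls++;
}

/* Assumed suspend path: detach the device, then release the IRQ. */
static void mock_suspend(struct mock_adapter *a)
{
	if (a->device_present) {
		a->device_present = false;
		a->down = true;
		mock_free_irq(a);
	}
}

/* Close path: free the IRQ only if suspend has not already done so. */
static void mock_close(struct mock_adapter *a)
{
	a->down = true;
	if (a->device_present)
		mock_free_irq(a);
}
```

Calling `mock_suspend()` followed by `mock_close()` releases the IRQ exactly once in this model, which is the double-free the presence check is meant to avoid.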

- Alex


* RE: [Intel-wired-lan] [PATCH] e1000e: free IRQ when the link is up or down
  2016-11-02 21:08 [PATCH] e1000e: free IRQ when the link is up or down Tyler Baicar
  2016-11-02 21:44 ` Alexander Duyck
@ 2016-11-03  8:09 ` Ruinskiy, Dima
  2016-11-03 15:54   ` Baicar, Tyler
  1 sibling, 1 reply; 4+ messages in thread
From: Ruinskiy, Dima @ 2016-11-03  8:09 UTC (permalink / raw)
  To: Tyler Baicar, Kirsher, Jeffrey T, intel-wired-lan, netdev,
	linux-kernel, okaya, timur

>-----Original Message-----
>From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
>Behalf Of Tyler Baicar
>Sent: Wednesday, 02 November, 2016 23:08
>To: Kirsher, Jeffrey T; intel-wired-lan@lists.osuosl.org;
>netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>okaya@codeaurora.org; timur@codeaurora.org
>Cc: Tyler Baicar
>Subject: [Intel-wired-lan] [PATCH] e1000e: free IRQ when the link is up or
>down
>
>Move IRQ free code so that it will happen regardless of the link state.
>Currently the e1000e driver only releases its IRQ if the link is up. This is not
>sufficient because it is possible for a link to go down without releasing the IRQ.
>A secondary bus reset can cause this case to happen.
>
>Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>---
> drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>b/drivers/net/ethernet/intel/e1000e/netdev.c
>index 7017281..36cfcb0 100644
>--- a/drivers/net/ethernet/intel/e1000e/netdev.c
>+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>@@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>
> 	if (!test_bit(__E1000_DOWN, &adapter->state)) {
> 		e1000e_down(adapter, true);
>-		e1000_free_irq(adapter);
>
> 		/* Link status message must follow this format */
> 		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
> 	}
>
>+	e1000_free_irq(adapter);
>+
> 	napi_disable(&adapter->napi);
>
> 	e1000e_free_tx_resources(adapter->tx_ring);

This is not correct. __E1000_DOWN has nothing to do with link state. It is an internal driver status bit that indicates that device shutdown is in progress.

I would not change this code without very carefully checking the driver state machine, since a change here can cause a whole range of issues. Did you encounter a particular problem that this change resolves?

* Re: [Intel-wired-lan] [PATCH] e1000e: free IRQ when the link is up or down
  2016-11-03  8:09 ` [Intel-wired-lan] " Ruinskiy, Dima
@ 2016-11-03 15:54   ` Baicar, Tyler
  0 siblings, 0 replies; 4+ messages in thread
From: Baicar, Tyler @ 2016-11-03 15:54 UTC (permalink / raw)
  To: Ruinskiy, Dima, Kirsher, Jeffrey T, intel-wired-lan, netdev,
	linux-kernel, okaya, timur

On 11/3/2016 2:09 AM, Ruinskiy, Dima wrote:
>> -----Original Message-----
>> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
>> Behalf Of Tyler Baicar
>> Sent: Wednesday, 02 November, 2016 23:08
>> To: Kirsher, Jeffrey T; intel-wired-lan@lists.osuosl.org;
>> netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
>> okaya@codeaurora.org; timur@codeaurora.org
>> Cc: Tyler Baicar
>> Subject: [Intel-wired-lan] [PATCH] e1000e: free IRQ when the link is up or
>> down
>>
>> Move IRQ free code so that it will happen regardless of the link state.
>> Currently the e1000e driver only releases its IRQ if the link is up. This is not
>> sufficient because it is possible for a link to go down without releasing the IRQ.
>> A secondary bus reset can cause this case to happen.
>>
>> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
>> ---
>> drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c
>> b/drivers/net/ethernet/intel/e1000e/netdev.c
>> index 7017281..36cfcb0 100644
>> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
>> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
>> @@ -4679,12 +4679,13 @@ int e1000e_close(struct net_device *netdev)
>>
>> 	if (!test_bit(__E1000_DOWN, &adapter->state)) {
>> 		e1000e_down(adapter, true);
>> -		e1000_free_irq(adapter);
>>
>> 		/* Link status message must follow this format */
>> 		pr_info("%s NIC Link is Down\n", adapter->netdev->name);
>> 	}
>>
>> +	e1000_free_irq(adapter);
>> +
>> 	napi_disable(&adapter->napi);
>>
>> 	e1000e_free_tx_resources(adapter->tx_ring);
> This is not correct. __E1000_DOWN has nothing to do with link state. It is an internal driver status bit that indicates that device shutdown is in progress.
>
> I would not change this code without very carefully checking the driver state machine, since a change here can cause a whole range of issues. Did you encounter a particular problem that this change resolves?

Hello Dima,

The issue is that when a secondary bus reset occurs, the current code
does not free the IRQ because of this __E1000_DOWN check. If the IRQ
isn't freed, then later in e1000_remove we hit a kernel BUG:

pcieport 0004:00:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, id=0000(Receiver ID)
pcieport 0004:00:00.0:   device [17cb:0400] error status/mask=00000001/00006000
pcieport 0004:00:00.0:    [ 0] Receiver Error         (First)
pcieport 0004:00:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0000(Requester ID)
pcieport 0004:00:00.0:   device [17cb:0400] error status/mask=00004000/00400000
pcieport 0004:00:00.0:    [14] Completion Timeout     (First)
ACPI: \_SB_.PCI4: Device has suffered a power fault
kernel BUG at drivers/pci/msi.c:369!

The stack dump is:

free_msi_irqs+0x6c/0x1a8
pci_disable_msi+0xb0/0x148
e1000e_reset_interrupt_capability+0x60/0x78
e1000_remove+0xc8/0x180
pci_device_remove+0x48/0x118
__device_release_driver+0x80/0x108
device_release_driver+0x2c/0x40
pci_stop_bus_device+0xa0/0xb0
pci_stop_bus_device+0x3c/0xb0
pci_stop_root_bus+0x54/0x80
acpi_pci_root_remove+0x28/0x64
acpi_bus_trim+0x6c/0xa4
acpi_device_hotplug+0x19c/0x3f4
acpi_hotplug_work_fn+0x28/0x3c
process_one_work+0x150/0x460
worker_thread+0x50/0x4b8
kthread+0xd4/0xe8
ret_from_fork+0x10/0x50

This BUG is hit because the IRQ still has an action attached, since it
was never freed. This patch resolves the issue.
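
The sequence can be sketched as a small userland model in C. Everything here is hypothetical and simplified: `model_nic` and the `model_*` functions are illustrative names, and the assumption that a secondary bus reset marks the device down without releasing the IRQ is taken from the description above rather than from the driver source.

```c
#include <stdbool.h>

/* Hypothetical userland model; none of these are real kernel structures. */
struct model_nic {
	bool down;        /* stands in for the __E1000_DOWN state bit */
	bool irq_action;  /* true while a handler is still attached to the IRQ */
};

/* Assumed effect of a secondary bus reset: the device is marked down,
 * but nothing releases the IRQ. */
static void model_bus_reset(struct model_nic *n)
{
	n->down = true;
}

/* Close path before the patch: the IRQ is freed only if not already down. */
static void model_close_before(struct model_nic *n)
{
	if (!n->down) {
		n->down = true;
		n->irq_action = false;
	}
}

/* Close path with the patch: the IRQ is freed unconditionally. */
static void model_close_after(struct model_nic *n)
{
	if (!n->down)
		n->down = true;
	n->irq_action = false;
}

/* Models the check in the remove path: tearing down MSI while an action
 * is still attached corresponds to the reported kernel BUG. */
static bool model_remove_hits_bug(const struct model_nic *n)
{
	return n->irq_action;
}
```

In this model, a bus reset followed by `model_close_before()` leaves `irq_action` set, so the remove step reports the BUG condition; with `model_close_after()` it does not.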

Thanks,
Tyler
