From: Tony Nguyen <anthony.l.nguyen@intel.com>
To: davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
edumazet@google.com, netdev@vger.kernel.org
Cc: Ying Hsu <yinghsu@chromium.org>,
anthony.l.nguyen@intel.com, aleksandr.loktionov@intel.com,
Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com>
Subject: [PATCH net-next] igb: Fix igb_down hung on surprise removal
Date: Tue, 20 Jun 2023 10:47:32 -0700 [thread overview]
Message-ID: <20230620174732.4145155-1-anthony.l.nguyen@intel.com> (raw)
From: Ying Hsu <yinghsu@chromium.org>
In a setup where a Thunderbolt hub connects to Ethernet and a display
through USB Type-C, users may experience a hung task timeout when they
remove the cable between the PC and the Thunderbolt hub.
This is because the igb_down function is called multiple times when
the Thunderbolt hub is unplugged. For example, the igb_io_error_detected
triggers the first call, and the igb_remove triggers the second call.
The second call to igb_down will block at napi_synchronize.
Here's the call trace:
__schedule+0x3b0/0xddb
? __mod_timer+0x164/0x5d3
schedule+0x44/0xa8
schedule_timeout+0xb2/0x2a4
? run_local_timers+0x4e/0x4e
msleep+0x31/0x38
igb_down+0x12c/0x22a [igb 6615058754948bfde0bf01429257eb59f13030d4]
__igb_close+0x6f/0x9c [igb 6615058754948bfde0bf01429257eb59f13030d4]
igb_close+0x23/0x2b [igb 6615058754948bfde0bf01429257eb59f13030d4]
__dev_close_many+0x95/0xec
dev_close_many+0x6e/0x103
unregister_netdevice_many+0x105/0x5b1
unregister_netdevice_queue+0xc2/0x10d
unregister_netdev+0x1c/0x23
igb_remove+0xa7/0x11c [igb 6615058754948bfde0bf01429257eb59f13030d4]
pci_device_remove+0x3f/0x9c
device_release_driver_internal+0xfe/0x1b4
pci_stop_bus_device+0x5b/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_bus_device+0x30/0x7f
pci_stop_and_remove_bus_device+0x12/0x19
pciehp_unconfigure_device+0x76/0xe9
pciehp_disable_slot+0x6e/0x131
pciehp_handle_presence_or_link_change+0x7a/0x3f7
pciehp_ist+0xbe/0x194
irq_thread_fn+0x22/0x4d
? irq_thread+0x1fd/0x1fd
irq_thread+0x17b/0x1fd
? irq_forced_thread_fn+0x5f/0x5f
kthread+0x142/0x153
? __irq_get_irqchip_state+0x46/0x46
? kthread_associate_blkcg+0x71/0x71
ret_from_fork+0x1f/0x30
In this case, igb_io_error_detected detaches the network interface
and requests a PCIE slot reset, however, the PCIE reset callback is
not being invoked and thus the Ethernet connection breaks down.
As the PCIE error in this case is a non-fatal one, requesting a
slot reset can be avoided.
This patch fixes the task hung issue and preserves Ethernet
connection by ignoring non-fatal PCIE errors.
Signed-off-by: Ying Hsu <yinghsu@chromium.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
---
drivers/net/ethernet/intel/igb/igb_main.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 9fcac96022d7..9a2561409b06 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -9587,6 +9587,11 @@ static pci_ers_result_t igb_io_error_detected(struct pci_dev *pdev,
struct net_device *netdev = pci_get_drvdata(pdev);
struct igb_adapter *adapter = netdev_priv(netdev);
+ if (state == pci_channel_io_normal) {
+ dev_warn(&pdev->dev, "Non-correctable non-fatal error reported.\n");
+ return PCI_ERS_RESULT_CAN_RECOVER;
+ }
+
netif_device_detach(netdev);
if (state == pci_channel_io_perm_failure)
--
2.38.1
next reply other threads:[~2023-06-20 17:52 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-20 17:47 Tony Nguyen [this message]
2023-06-21 14:49 ` [PATCH net-next] igb: Fix igb_down hung on surprise removal Simon Horman
2023-06-23 3:00 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230620174732.4145155-1-anthony.l.nguyen@intel.com \
--to=anthony.l.nguyen@intel.com \
--cc=aleksandr.loktionov@intel.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=himasekharx.reddy.pucha@intel.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=yinghsu@chromium.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.