* [PATCH] iavf: Fix hang during reboot/shutdown
@ 2022-03-17 10:45 ` Ivan Vecera
0 siblings, 0 replies; 9+ messages in thread
From: Ivan Vecera @ 2022-03-17 10:45 UTC (permalink / raw)
To: netdev
Cc: poros, Jesse Brandeburg, Tony Nguyen, David S. Miller,
Jakub Kicinski, Paolo Abeni, Slawomir Laba, Mateusz Palczewski,
Jacob Keller, Phani Burra, moderated list:INTEL ETHERNET DRIVERS,
open list
Recent commit 974578017fc1 ("iavf: Add waiting so the port is
initialized in remove") adds a wait loop at the beginning of
iavf_remove() to ensure that port initialization has finished
before the net device is unregistered. This causes a regression
in the reboot/shutdown scenario, because in that case the
iavf_shutdown() callback is called first. It detaches the device,
brings it down if it is running, and sets its state to __IAVF_REMOVE.
Later the shutdown callback of the associated PF driver (e.g.
ice_shutdown) is called. Among other things, that callback calls
sriov_disable(), which indirectly calls iavf_remove() (see the stack
trace below). Because the adapter state is already __IAVF_REMOVE,
the wait loop never terminates and the shutdown process hangs.

The patch fixes this by checking the adapter's state at the beginning
of iavf_remove() and skipping the rest of the function if the adapter
is already in the remove state (shutdown is in progress).

Reproducer:
1. Create a VF on a PF driven by the ice or i40e driver
2. Ensure that the VF is bound to the iavf driver
3. Reboot
[52625.981294] sysrq: SysRq : Show Blocked State
[52625.988377] task:reboot state:D stack: 0 pid:17359 ppid: 1 f2
[52625.996732] Call Trace:
[52625.999187] __schedule+0x2d1/0x830
[52626.007400] schedule+0x35/0xa0
[52626.010545] schedule_hrtimeout_range_clock+0x83/0x100
[52626.020046] usleep_range+0x5b/0x80
[52626.023540] iavf_remove+0x63/0x5b0 [iavf]
[52626.027645] pci_device_remove+0x3b/0xc0
[52626.031572] device_release_driver_internal+0x103/0x1f0
[52626.036805] pci_stop_bus_device+0x72/0xa0
[52626.040904] pci_stop_and_remove_bus_device+0xe/0x20
[52626.045870] pci_iov_remove_virtfn+0xba/0x120
[52626.050232] sriov_disable+0x2f/0xe0
[52626.053813] ice_free_vfs+0x7c/0x340 [ice]
[52626.057946] ice_remove+0x220/0x240 [ice]
[52626.061967] ice_shutdown+0x16/0x50 [ice]
[52626.065987] pci_device_shutdown+0x34/0x60
[52626.070086] device_shutdown+0x165/0x1c5
[52626.074011] kernel_restart+0xe/0x30
[52626.077593] __do_sys_reboot+0x1d2/0x210
[52626.093815] do_syscall_64+0x5b/0x1a0
[52626.097483] entry_SYSCALL_64_after_hwframe+0x65/0xca
Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
drivers/net/ethernet/intel/iavf/iavf_main.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 45570e3f782e..0e178a0a59c5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -4620,6 +4620,13 @@ static void iavf_remove(struct pci_dev *pdev)
 	struct iavf_hw *hw = &adapter->hw;
 	int err;
 
+	/* When reboot/shutdown is in progress no need to do anything
+	 * as the adapter is already REMOVE state that was set during
+	 * iavf_shutdown() callback.
+	 */
+	if (adapter->state == __IAVF_REMOVE)
+		return;
+
 	set_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section);
 	/* Wait until port initialization is complete.
 	 * There are flows where register/unregister netdev may race.
--
2.34.1
^ permalink raw reply related [flat|nested] 9+ messages in thread
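The hang described in the patch above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not the driver code: the `sim_*` names are hypothetical, the exact exit conditions of the real wait loop are not shown in this thread (the sketch assumes the loop only exits in a "down" or "running" state), and `max_spins` is an artificial bound so that the model reports a hang instead of actually spinning forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's adapter states. */
enum sim_state { SIM_STARTUP, SIM_DOWN, SIM_RUNNING, SIM_REMOVE };

struct sim_adapter {
	enum sim_state state;
};

/* Models the effect of iavf_shutdown() described in the commit message:
 * the adapter ends up in the REMOVE state.
 */
static void sim_shutdown(struct sim_adapter *a)
{
	a->state = SIM_REMOVE;
}

/* Models the entry of iavf_remove().
 *
 * Returns  0 if the function returned early (the fix),
 *          1 if the wait loop finished and removal could proceed,
 *         -1 if the loop used up max_spins (the real loop would hang).
 */
static int sim_remove(const struct sim_adapter *a, bool with_fix, int max_spins)
{
	int spins;

	/* The check added by the patch: nothing to do during shutdown. */
	if (with_fix && a->state == SIM_REMOVE)
		return 0;

	/* Assumed shape of the wait loop: exit only once init is complete. */
	for (spins = 0; spins < max_spins; spins++) {
		if (a->state == SIM_DOWN || a->state == SIM_RUNNING)
			return 1;
		/* The real driver sleeps here (usleep_range() in the trace). */
	}
	return -1;
}

/* Walks through the reboot sequence with and without the early return. */
static void sim_reboot_demo(void)
{
	struct sim_adapter a = { .state = SIM_RUNNING };

	sim_shutdown(&a);                          /* VF shutdown runs first */
	assert(sim_remove(&a, false, 1000) == -1); /* without the fix: stuck  */
	assert(sim_remove(&a, true, 1000) == 0);   /* with the fix: returns   */
}
```

Calling `sim_reboot_demo()` from a `main()` exercises both paths. In the model, nothing changes the state once it is REMOVE, so the loop condition can never become true; this mirrors the reasoning in the commit message and is why the fix checks the state before entering the loop.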
* Re: [PATCH] iavf: Fix hang during reboot/shutdown
2022-03-17 10:45 ` [Intel-wired-lan] " Ivan Vecera
@ 2022-03-17 16:11 ` Jakub Kicinski
-1 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2022-03-17 16:11 UTC (permalink / raw)
To: Jesse Brandeburg, Tony Nguyen
Cc: Ivan Vecera, netdev, poros, David S. Miller, Paolo Abeni,
Slawomir Laba, Mateusz Palczewski, Jacob Keller, Phani Burra,
moderated list:INTEL ETHERNET DRIVERS, open list
On Thu, 17 Mar 2022 11:45:24 +0100 Ivan Vecera wrote:
> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
> initialized in remove") adds a wait-loop at the beginning of
> iavf_remove() to ensure that port initialization is finished
> prior unregistering net device. This causes a regression
> in reboot/shutdown scenario because in this case callback
> iavf_shutdown() is called and this callback detaches the device,
> makes it down if it is running and sets its state to __IAVF_REMOVE.
> Later shutdown callback of associated PF driver (e.g. ice_shutdown)
> is called. That callback calls among other things sriov_disable()
> that calls indirectly iavf_remove() (see stack trace below).
> As the adapter state is already __IAVF_REMOVE then the mentioned
> loop is end-less and shutdown process hangs.
Tony, Jesse, looks like the regression is from 5.17-rc6, should
I take this directly so it makes 5.17 final?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] iavf: Fix hang during reboot/shutdown
2022-03-17 16:11 ` [Intel-wired-lan] " Jakub Kicinski
@ 2022-03-17 17:03 ` Tony Nguyen
-1 siblings, 0 replies; 9+ messages in thread
From: Tony Nguyen @ 2022-03-17 17:03 UTC (permalink / raw)
To: Jakub Kicinski, Jesse Brandeburg
Cc: Ivan Vecera, netdev, poros, David S. Miller, Paolo Abeni,
Slawomir Laba, Mateusz Palczewski, Jacob Keller, Phani Burra,
moderated list:INTEL ETHERNET DRIVERS, open list
On 3/17/2022 9:11 AM, Jakub Kicinski wrote:
> On Thu, 17 Mar 2022 11:45:24 +0100 Ivan Vecera wrote:
>> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
>> initialized in remove") adds a wait-loop at the beginning of
>> iavf_remove() to ensure that port initialization is finished
>> prior unregistering net device. This causes a regression
>> in reboot/shutdown scenario because in this case callback
>> iavf_shutdown() is called and this callback detaches the device,
>> makes it down if it is running and sets its state to __IAVF_REMOVE.
>> Later shutdown callback of associated PF driver (e.g. ice_shutdown)
>> is called. That callback calls among other things sriov_disable()
>> that calls indirectly iavf_remove() (see stack trace below).
>> As the adapter state is already __IAVF_REMOVE then the mentioned
>> loop is end-less and shutdown process hangs.
> Tony, Jesse, looks like the regression is from 5.17-rc6, should
> I take this directly so it makes 5.17 final?
Hi Jakub,
There are some additional improvements that we think can be made, but we
need more time to analyze and test them. This is probably good for you to
take so it makes it into this kernel, though. We will send follow-on
patches if needed.
Thanks,
Tony
^ permalink raw reply [flat|nested] 9+ messages in thread
* [Intel-wired-lan] [PATCH] iavf: Fix hang during reboot/shutdown
2022-03-17 17:03 ` [Intel-wired-lan] " Tony Nguyen
(?)
@ 2022-03-17 17:18 ` Jakub Kicinski
-1 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2022-03-17 17:18 UTC (permalink / raw)
To: intel-wired-lan
On Thu, 17 Mar 2022 10:03:05 -0700 Tony Nguyen wrote:
> There are some additional improvements that we think can be made but we
> need more time to analyze and test. This is probably good for you to
> take to make into this kernel though. We will send follow on patches if
> needed.
Sounds good, thanks!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] iavf: Fix hang during reboot/shutdown
2022-03-17 10:45 ` [Intel-wired-lan] " Ivan Vecera
@ 2022-03-17 17:30 ` patchwork-bot+netdevbpf
-1 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-03-17 17:30 UTC (permalink / raw)
To: Ivan Vecera
Cc: netdev, poros, jesse.brandeburg, anthony.l.nguyen, davem, kuba,
pabeni, slawomirx.laba, mateusz.palczewski, jacob.e.keller,
phani.r.burra, intel-wired-lan, linux-kernel
Hello:
This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:
On Thu, 17 Mar 2022 11:45:24 +0100 you wrote:
> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
> initialized in remove") adds a wait-loop at the beginning of
> iavf_remove() to ensure that port initialization is finished
> prior unregistering net device. This causes a regression
> in reboot/shutdown scenario because in this case callback
> iavf_shutdown() is called and this callback detaches the device,
> makes it down if it is running and sets its state to __IAVF_REMOVE.
> Later shutdown callback of associated PF driver (e.g. ice_shutdown)
> is called. That callback calls among other things sriov_disable()
> that calls indirectly iavf_remove() (see stack trace below).
> As the adapter state is already __IAVF_REMOVE then the mentioned
> loop is end-less and shutdown process hangs.
>
> [...]
Here is the summary with links:
- iavf: Fix hang during reboot/shutdown
https://git.kernel.org/netdev/net/c/b04683ff8f08
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 9+ messages in thread
Thread overview: 9+ messages
2022-03-17 10:45 [PATCH] iavf: Fix hang during reboot/shutdown Ivan Vecera
2022-03-17 10:45 ` [Intel-wired-lan] " Ivan Vecera
2022-03-17 16:11 ` Jakub Kicinski
2022-03-17 16:11 ` [Intel-wired-lan] " Jakub Kicinski
2022-03-17 17:03 ` Tony Nguyen
2022-03-17 17:03 ` [Intel-wired-lan] " Tony Nguyen
2022-03-17 17:18 ` Jakub Kicinski
2022-03-17 17:30 ` patchwork-bot+netdevbpf
2022-03-17 17:30 ` [Intel-wired-lan] " patchwork-bot+netdevbpf