* [PATCH] iavf: Fix hang during reboot/shutdown
@ 2022-03-17 10:45 ` Ivan Vecera
  0 siblings, 0 replies; 9+ messages in thread
From: Ivan Vecera @ 2022-03-17 10:45 UTC (permalink / raw)
  To: netdev
  Cc: poros, Jesse Brandeburg, Tony Nguyen, David S. Miller,
	Jakub Kicinski, Paolo Abeni, Slawomir Laba, Mateusz Palczewski,
	Jacob Keller, Phani Burra, moderated list:INTEL ETHERNET DRIVERS,
	open list

Recent commit 974578017fc1 ("iavf: Add waiting so the port is
initialized in remove") adds a wait loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior to unregistering the net device. This causes a regression
in the reboot/shutdown scenario: in this case the iavf_shutdown()
callback is called first; it detaches the device, brings it down
if it is running, and sets its state to __IAVF_REMOVE. Later the
shutdown callback of the associated PF driver (e.g. ice_shutdown())
is called. Among other things, that callback calls sriov_disable(),
which indirectly calls iavf_remove() (see the stack trace below).
Because the adapter state is already __IAVF_REMOVE, the wait loop
never terminates and the shutdown process hangs.

This patch fixes the hang by checking the adapter's state at the
beginning of iavf_remove() and skipping the rest of the function if
the adapter is already in the remove state (shutdown is in progress).
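
The interaction described above can be illustrated with a small
user-space model. This is only a sketch, not driver code: the
model_* names, the reduced state set and the bounded poll budget
(standing in for the otherwise unbounded sleep loop) are all
assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of adapter states; the real driver has more. */
enum model_state { MODEL_STARTUP, MODEL_RUNNING, MODEL_DOWN, MODEL_REMOVE };

struct model_adapter {
	enum model_state state;
};

/* Models the shutdown callback: it leaves the adapter in the remove state. */
static void model_shutdown(struct model_adapter *a)
{
	a->state = MODEL_REMOVE;
}

/*
 * Models the wait loop: poll until initialization has reached a settled
 * state. Returns the number of polls used, or -1 when the poll budget is
 * exhausted (where an unbounded loop would keep sleeping forever).
 */
static int model_wait_for_init(const struct model_adapter *a, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (a->state == MODEL_RUNNING || a->state == MODEL_DOWN)
			return i;
		/* an in-kernel loop would sleep here before polling again */
	}
	return -1;
}

/*
 * Models the remove callback. With with_guard set, it returns early when
 * shutdown has already put the adapter into the remove state. Returns true
 * if remove completes, false if it would hang in the wait loop.
 */
static bool model_remove(struct model_adapter *a, bool with_guard, int max_polls)
{
	if (with_guard && a->state == MODEL_REMOVE)
		return true;

	if (model_wait_for_init(a, max_polls) < 0)
		return false;

	a->state = MODEL_REMOVE;
	return true;
}
```

In this model, calling model_shutdown() and then model_remove()
without the guard exhausts the poll budget, while the guarded
variant returns immediately.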

Reproducer:
1. Create a VF on a PF driven by the ice or i40e driver
2. Ensure that the VF is bound to the iavf driver
3. Reboot
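
Steps 1 and 2 can be scripted; a rough C sketch follows. It assumes
the PF netdev exposes the standard sriov_numvfs sysfs attribute
under /sys/class/net/<ifname>/device/ and that the caller has
already read the VF's "driver" symlink target (e.g. with
readlink(2)). The helper names and any interface name passed in are
hypothetical. Step 3 (the reboot itself) is left to the operator.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the sysfs path of the sriov_numvfs attribute for a PF netdev.
 * Returns 0 on success, -1 if the path does not fit into buf.
 */
static int sriov_numvfs_path(const char *pf_ifname, char *buf, size_t len)
{
	int n = snprintf(buf, len, "/sys/class/net/%s/device/sriov_numvfs",
			 pf_ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Step 1: ask the PF driver to create num_vfs VFs (requires root).
 * Returns 0 on success, -1 on any error.
 */
static int create_vfs(const char *pf_ifname, unsigned int num_vfs)
{
	char path[256];
	FILE *f;
	int ok;

	if (sriov_numvfs_path(pf_ifname, path, sizeof(path)) != 0)
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%u\n", num_vfs) > 0;
	/* the write may only be flushed (and fail) at close time */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}

/*
 * Step 2: given the target of a VF's "driver" symlink, report whether the
 * bound driver is iavf. Returns 1 if so, 0 otherwise.
 */
static int driver_link_is_iavf(const char *link_target)
{
	const char *base = strrchr(link_target, '/');

	base = base ? base + 1 : link_target;
	return strcmp(base, "iavf") == 0;
}
```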

[52625.981294] sysrq: SysRq : Show Blocked State
[52625.988377] task:reboot          state:D stack:    0 pid:17359 ppid:     1 f2
[52625.996732] Call Trace:
[52625.999187]  __schedule+0x2d1/0x830
[52626.007400]  schedule+0x35/0xa0
[52626.010545]  schedule_hrtimeout_range_clock+0x83/0x100
[52626.020046]  usleep_range+0x5b/0x80
[52626.023540]  iavf_remove+0x63/0x5b0 [iavf]
[52626.027645]  pci_device_remove+0x3b/0xc0
[52626.031572]  device_release_driver_internal+0x103/0x1f0
[52626.036805]  pci_stop_bus_device+0x72/0xa0
[52626.040904]  pci_stop_and_remove_bus_device+0xe/0x20
[52626.045870]  pci_iov_remove_virtfn+0xba/0x120
[52626.050232]  sriov_disable+0x2f/0xe0
[52626.053813]  ice_free_vfs+0x7c/0x340 [ice]
[52626.057946]  ice_remove+0x220/0x240 [ice]
[52626.061967]  ice_shutdown+0x16/0x50 [ice]
[52626.065987]  pci_device_shutdown+0x34/0x60
[52626.070086]  device_shutdown+0x165/0x1c5
[52626.074011]  kernel_restart+0xe/0x30
[52626.077593]  __do_sys_reboot+0x1d2/0x210
[52626.093815]  do_syscall_64+0x5b/0x1a0
[52626.097483]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 974578017fc1 ("iavf: Add waiting so the port is initialized in remove")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
---
 drivers/net/ethernet/intel/iavf/iavf_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_main.c b/drivers/net/ethernet/intel/iavf/iavf_main.c
index 45570e3f782e..0e178a0a59c5 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_main.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_main.c
@@ -4620,6 +4620,13 @@ static void iavf_remove(struct pci_dev *pdev)
 	struct iavf_hw *hw = &adapter->hw;
 	int err;
 
+	/* When reboot/shutdown is in progress there is no need to do
+	 * anything, as the adapter is already in the __IAVF_REMOVE state
+	 * that was set by the iavf_shutdown() callback.
+	 */
+	if (adapter->state == __IAVF_REMOVE)
+		return;
+
 	set_bit(__IAVF_IN_REMOVE_TASK, &adapter->crit_section);
 	/* Wait until port initialization is complete.
 	 * There are flows where register/unregister netdev may race.
-- 
2.34.1




* Re: [PATCH] iavf: Fix hang during reboot/shutdown
  2022-03-17 10:45 ` [Intel-wired-lan] " Ivan Vecera
@ 2022-03-17 16:11   ` Jakub Kicinski
  -1 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2022-03-17 16:11 UTC (permalink / raw)
  To: Jesse Brandeburg, Tony Nguyen
  Cc: Ivan Vecera, netdev, poros, David S. Miller, Paolo Abeni,
	Slawomir Laba, Mateusz Palczewski, Jacob Keller, Phani Burra,
	moderated list:INTEL ETHERNET DRIVERS, open list

On Thu, 17 Mar 2022 11:45:24 +0100 Ivan Vecera wrote:
> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
> initialized in remove") adds a wait loop at the beginning of
> iavf_remove() to ensure that port initialization is finished
> prior to unregistering the net device. This causes a regression
> in the reboot/shutdown scenario: in this case the iavf_shutdown()
> callback is called first; it detaches the device, brings it down
> if it is running, and sets its state to __IAVF_REMOVE. Later the
> shutdown callback of the associated PF driver (e.g. ice_shutdown())
> is called. Among other things, that callback calls sriov_disable(),
> which indirectly calls iavf_remove() (see the stack trace below).
> Because the adapter state is already __IAVF_REMOVE, the wait loop
> never terminates and the shutdown process hangs.

Tony, Jesse, looks like the regression is from 5.17-rc6, should 
I take this directly so it makes 5.17 final?



* Re: [PATCH] iavf: Fix hang during reboot/shutdown
  2022-03-17 16:11   ` [Intel-wired-lan] " Jakub Kicinski
@ 2022-03-17 17:03     ` Tony Nguyen
  -1 siblings, 0 replies; 9+ messages in thread
From: Tony Nguyen @ 2022-03-17 17:03 UTC (permalink / raw)
  To: Jakub Kicinski, Jesse Brandeburg
  Cc: Ivan Vecera, netdev, poros, David S. Miller, Paolo Abeni,
	Slawomir Laba, Mateusz Palczewski, Jacob Keller, Phani Burra,
	moderated list:INTEL ETHERNET DRIVERS, open list


On 3/17/2022 9:11 AM, Jakub Kicinski wrote:
> On Thu, 17 Mar 2022 11:45:24 +0100 Ivan Vecera wrote:
>> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
>> initialized in remove") adds a wait loop at the beginning of
>> iavf_remove() to ensure that port initialization is finished
>> prior to unregistering the net device. This causes a regression
>> in the reboot/shutdown scenario: in this case the iavf_shutdown()
>> callback is called first; it detaches the device, brings it down
>> if it is running, and sets its state to __IAVF_REMOVE. Later the
>> shutdown callback of the associated PF driver (e.g. ice_shutdown())
>> is called. Among other things, that callback calls sriov_disable(),
>> which indirectly calls iavf_remove() (see the stack trace below).
>> Because the adapter state is already __IAVF_REMOVE, the wait loop
>> never terminates and the shutdown process hangs.
> Tony, Jesse, looks like the regression is from 5.17-rc6, should
> I take this directly so it makes 5.17 final?

Hi Jakub,

There are some additional improvements that we think can be made, but we
need more time to analyze and test them. This patch is probably good for
you to take so that it makes it into this kernel, though. We will send
follow-on patches if needed.

Thanks,

Tony




* [Intel-wired-lan] [PATCH] iavf: Fix hang during reboot/shutdown
  2022-03-17 17:03     ` [Intel-wired-lan] " Tony Nguyen
  (?)
@ 2022-03-17 17:18     ` Jakub Kicinski
  -1 siblings, 0 replies; 9+ messages in thread
From: Jakub Kicinski @ 2022-03-17 17:18 UTC (permalink / raw)
  To: intel-wired-lan

On Thu, 17 Mar 2022 10:03:05 -0700 Tony Nguyen wrote:
> There are some additional improvements that we think can be made, but we
> need more time to analyze and test them. This patch is probably good for
> you to take so that it makes it into this kernel, though. We will send
> follow-on patches if needed.

Sounds good, thanks!


* Re: [PATCH] iavf: Fix hang during reboot/shutdown
  2022-03-17 10:45 ` [Intel-wired-lan] " Ivan Vecera
@ 2022-03-17 17:30   ` patchwork-bot+netdevbpf
  -1 siblings, 0 replies; 9+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-03-17 17:30 UTC (permalink / raw)
  To: Ivan Vecera
  Cc: netdev, poros, jesse.brandeburg, anthony.l.nguyen, davem, kuba,
	pabeni, slawomirx.laba, mateusz.palczewski, jacob.e.keller,
	phani.r.burra, intel-wired-lan, linux-kernel

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 17 Mar 2022 11:45:24 +0100 you wrote:
> Recent commit 974578017fc1 ("iavf: Add waiting so the port is
> initialized in remove") adds a wait loop at the beginning of
> iavf_remove() to ensure that port initialization is finished
> prior to unregistering the net device. This causes a regression
> in the reboot/shutdown scenario: in this case the iavf_shutdown()
> callback is called first; it detaches the device, brings it down
> if it is running, and sets its state to __IAVF_REMOVE. Later the
> shutdown callback of the associated PF driver (e.g. ice_shutdown())
> is called. Among other things, that callback calls sriov_disable(),
> which indirectly calls iavf_remove() (see the stack trace below).
> Because the adapter state is already __IAVF_REMOVE, the wait loop
> never terminates and the shutdown process hangs.
> 
> [...]

Here is the summary with links:
  - iavf: Fix hang during reboot/shutdown
    https://git.kernel.org/netdev/net/c/b04683ff8f08

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html





