All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] stmmac: Clear variable when destroying workqueue
       [not found] <CGME20240226154254eucas1p2bedde2c58f147809f83b23d455af9289@eucas1p2.samsung.com>
@ 2024-02-26 15:42 ` Jakub Raczynski
  2024-02-26 15:57   ` Jiri Pirko
  0 siblings, 1 reply; 5+ messages in thread
From: Jakub Raczynski @ 2024-02-26 15:42 UTC (permalink / raw)
  To: netdev; +Cc: kuba, alexandre.torgue, joabreu, Jakub Raczynski

Currently when suspending driver and stopping workqueue it is checked whether
workqueue is not NULL and if so, it is destroyed.
Function destroy_workqueue() does drain queue and does clear variable, but
it does not set workqueue variable to NULL. This can cause kernel/module
panic if code attempts to clear workqueue that was not initialized.

This scenario is possible when resuming suspended driver in stmmac_resume(),
because there is no handling for failed stmmac_hw_setup(),
which can fail and return if DMA engine has failed to initialize,
and workqueue is initialized after DMA engine.
Should DMA engine fail to initialize, resume will proceed normally,
but interface won't work and TX queue will eventually timeout,
causing 'Reset adapter' error.
This then does destroy workqueue during reset process.
And since workqueue is initialized after DMA engine and can be skipped,
it will cause kernel/module panic.

This commit sets workqueue variable to NULL when destroying workqueue,
which secures against that possible driver crash.

Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 75d029704503..0681029a2489 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4005,8 +4005,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
 {
 	set_bit(__FPE_REMOVING, &priv->fpe_task_state);
 
-	if (priv->fpe_wq)
+	if (priv->fpe_wq) {
 		destroy_workqueue(priv->fpe_wq);
+		priv->fpe_wq = NULL;
+	}
 
 	netdev_info(priv->dev, "FPE workqueue stop");
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] stmmac: Clear variable when destroying workqueue
  2024-02-26 15:42 ` [PATCH net v2] stmmac: Clear variable when destroying workqueue Jakub Raczynski
@ 2024-02-26 15:57   ` Jiri Pirko
       [not found]     ` <CGME20240226164337eucas1p196a1049cb7e766984910eee2f99bae4e@eucas1p1.samsung.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Jiri Pirko @ 2024-02-26 15:57 UTC (permalink / raw)
  To: Jakub Raczynski; +Cc: netdev, kuba, alexandre.torgue, joabreu

Mon, Feb 26, 2024 at 04:42:17PM CET, j.raczynski@samsung.com wrote:
>Currently when suspending driver and stopping workqueue it is checked whether
>workqueue is not NULL and if so, it is destroyed.
>Function destroy_workqueue() does drain queue and does clear variable, but
>it does not set workqueue variable to NULL. This can cause kernel/module
>panic if code attempts to clear workqueue that was not initialized.
>
>This scenario is possible when resuming suspended driver in stmmac_resume(),
>because there is no handling for failed stmmac_hw_setup(),
>which can fail and return if DMA engine has failed to initialize,
>and workqueue is initialized after DMA engine.
>Should DMA engine fail to initialize, resume will proceed normally,
>but interface won't work and TX queue will eventually timeout,
>causing 'Reset adapter' error.
>This then does destroy workqueue during reset process.
>And since workqueue is initialized after DMA engine and can be skipped,
>it will cause kernel/module panic.

If you have a trace, it is good to inline it here so the future
reader/backporter can immediately match it.

>
>This commit sets workqueue variable to NULL when destroying workqueue,

Don't talk about "this commit" in the patch description, just tell the
codebase what to do using imperative mood:
https://www.kernel.org/doc/html/v6.6/process/submitting-patches.html#describe-your-changes


>which secures against that possible driver crash.
>
>Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure")
>Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
>---
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>index 75d029704503..0681029a2489 100644
>--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>@@ -4005,8 +4005,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
> {
> 	set_bit(__FPE_REMOVING, &priv->fpe_task_state);
> 
>-	if (priv->fpe_wq)
>+	if (priv->fpe_wq) {
> 		destroy_workqueue(priv->fpe_wq);
>+		priv->fpe_wq = NULL;
>+	}
> 
> 	netdev_info(priv->dev, "FPE workqueue stop");
> }
>-- 
>2.34.1
>
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net v2] stmmac: Clear variable when destroying workqueue
       [not found]     ` <CGME20240226164337eucas1p196a1049cb7e766984910eee2f99bae4e@eucas1p1.samsung.com>
@ 2024-02-26 16:42       ` Jakub Raczynski
  2024-02-27  7:06         ` Jiri Pirko
  2024-02-28 11:30         ` patchwork-bot+netdevbpf
  0 siblings, 2 replies; 5+ messages in thread
From: Jakub Raczynski @ 2024-02-26 16:42 UTC (permalink / raw)
  To: netdev; +Cc: kuba, alexandre.torgue, joabreu, Jakub Raczynski

Currently when suspending driver and stopping workqueue it is checked whether
workqueue is not NULL and if so, it is destroyed.
Function destroy_workqueue() does drain queue and does clear variable, but
it does not set workqueue variable to NULL. This can cause kernel/module
panic if code attempts to clear workqueue that was not initialized.

This scenario is possible when resuming suspended driver in stmmac_resume(),
because there is no handling for failed stmmac_hw_setup(),
which can fail and return if DMA engine has failed to initialize,
and workqueue is initialized after DMA engine.
Should DMA engine fail to initialize, resume will proceed normally,
but interface won't work and TX queue will eventually timeout,
causing 'Reset adapter' error.
This then does destroy workqueue during reset process.
And since workqueue is initialized after DMA engine and can be skipped,
it will cause kernel/module panic.

To secure against this possible crash, set workqueue variable to NULL when
destroying workqueue.

Log/backtrace from crash goes as follows:
[88.031977]------------[ cut here ]------------
[88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
[88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
           <Skipping backtrace for watchdog timeout>
[88.032251]---[ end trace e70de432e4d5c2c0 ]---
[88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
[88.036359]------------[ cut here ]------------
[88.036519]Call trace:
[88.036523] flush_workqueue+0x3e4/0x430
[88.036528] drain_workqueue+0xc4/0x160
[88.036533] destroy_workqueue+0x40/0x270
[88.036537] stmmac_fpe_stop_wq+0x4c/0x70
[88.036541] stmmac_release+0x278/0x280
[88.036546] __dev_close_many+0xcc/0x158
[88.036551] dev_close_many+0xbc/0x190
[88.036555] dev_close.part.0+0x70/0xc0
[88.036560] dev_close+0x24/0x30
[88.036564] stmmac_service_task+0x110/0x140
[88.036569] process_one_work+0x1d8/0x4a0
[88.036573] worker_thread+0x54/0x408
[88.036578] kthread+0x164/0x170
[88.036583] ret_from_fork+0x10/0x20
[88.036588]---[ end trace e70de432e4d5c2c1 ]---
[88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004

Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure")
Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 75d029704503..0681029a2489 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -4005,8 +4005,10 @@ static void stmmac_fpe_stop_wq(struct stmmac_priv *priv)
 {
 	set_bit(__FPE_REMOVING, &priv->fpe_task_state);
 
-	if (priv->fpe_wq)
+	if (priv->fpe_wq) {
 		destroy_workqueue(priv->fpe_wq);
+		priv->fpe_wq = NULL;
+	}
 
 	netdev_info(priv->dev, "FPE workqueue stop");
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] stmmac: Clear variable when destroying workqueue
  2024-02-26 16:42       ` Jakub Raczynski
@ 2024-02-27  7:06         ` Jiri Pirko
  2024-02-28 11:30         ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: Jiri Pirko @ 2024-02-27  7:06 UTC (permalink / raw)
  To: Jakub Raczynski; +Cc: netdev, kuba, alexandre.torgue, joabreu

Mon, Feb 26, 2024 at 05:42:32PM CET, j.raczynski@samsung.com wrote:
>Currently when suspending driver and stopping workqueue it is checked whether
>workqueue is not NULL and if so, it is destroyed.
>Function destroy_workqueue() does drain queue and does clear variable, but
>it does not set workqueue variable to NULL. This can cause kernel/module
>panic if code attempts to clear workqueue that was not initialized.
>
>This scenario is possible when resuming suspended driver in stmmac_resume(),
>because there is no handling for failed stmmac_hw_setup(),
>which can fail and return if DMA engine has failed to initialize,
>and workqueue is initialized after DMA engine.
>Should DMA engine fail to initialize, resume will proceed normally,
>but interface won't work and TX queue will eventually timeout,
>causing 'Reset adapter' error.
>This then does destroy workqueue during reset process.
>And since workqueue is initialized after DMA engine and can be skipped,
>it will cause kernel/module panic.
>
>To secure against this possible crash, set workqueue variable to NULL when
>destroying workqueue.
>
>Log/backtrace from crash goes as follows:
>[88.031977]------------[ cut here ]------------
>[88.031985]NETDEV WATCHDOG: eth0 (sxgmac): transmit queue 1 timed out
>[88.032017]WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:477 dev_watchdog+0x390/0x398
>           <Skipping backtrace for watchdog timeout>
>[88.032251]---[ end trace e70de432e4d5c2c0 ]---
>[88.032282]sxgmac 16d88000.ethernet eth0: Reset adapter.
>[88.036359]------------[ cut here ]------------
>[88.036519]Call trace:
>[88.036523] flush_workqueue+0x3e4/0x430
>[88.036528] drain_workqueue+0xc4/0x160
>[88.036533] destroy_workqueue+0x40/0x270
>[88.036537] stmmac_fpe_stop_wq+0x4c/0x70
>[88.036541] stmmac_release+0x278/0x280
>[88.036546] __dev_close_many+0xcc/0x158
>[88.036551] dev_close_many+0xbc/0x190
>[88.036555] dev_close.part.0+0x70/0xc0
>[88.036560] dev_close+0x24/0x30
>[88.036564] stmmac_service_task+0x110/0x140
>[88.036569] process_one_work+0x1d8/0x4a0
>[88.036573] worker_thread+0x54/0x408
>[88.036578] kthread+0x164/0x170
>[88.036583] ret_from_fork+0x10/0x20
>[88.036588]---[ end trace e70de432e4d5c2c1 ]---
>[88.036597]Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
>
>Fixes: 5a5586112b929 ("net: stmmac: support FPE link partner hand-shaking procedure")
>Signed-off-by: Jakub Raczynski <j.raczynski@samsung.com>

Reviewed-by: Jiri Pirko <jiri@nvidia.com>

Next time, send v2 as a separate email starting new thread. Thanks!

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net v2] stmmac: Clear variable when destroying workqueue
  2024-02-26 16:42       ` Jakub Raczynski
  2024-02-27  7:06         ` Jiri Pirko
@ 2024-02-28 11:30         ` patchwork-bot+netdevbpf
  1 sibling, 0 replies; 5+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-02-28 11:30 UTC (permalink / raw)
  To: Jakub Raczynski; +Cc: netdev, kuba, alexandre.torgue, joabreu

Hello:

This patch was applied to netdev/net.git (main)
by David S. Miller <davem@davemloft.net>:

On Mon, 26 Feb 2024 17:42:32 +0100 you wrote:
> Currently when suspending driver and stopping workqueue it is checked whether
> workqueue is not NULL and if so, it is destroyed.
> Function destroy_workqueue() does drain queue and does clear variable, but
> it does not set workqueue variable to NULL. This can cause kernel/module
> panic if code attempts to clear workqueue that was not initialized.
> 
> This scenario is possible when resuming suspended driver in stmmac_resume(),
> because there is no handling for failed stmmac_hw_setup(),
> which can fail and return if DMA engine has failed to initialize,
> and workqueue is initialized after DMA engine.
> Should DMA engine fail to initialize, resume will proceed normally,
> but interface won't work and TX queue will eventually timeout,
> causing 'Reset adapter' error.
> This then does destroy workqueue during reset process.
> And since workqueue is initialized after DMA engine and can be skipped,
> it will cause kernel/module panic.
> 
> [...]

Here is the summary with links:
  - [net,v2] stmmac: Clear variable when destroying workqueue
    https://git.kernel.org/netdev/net/c/8af411bbba1f

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-02-28 11:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CGME20240226154254eucas1p2bedde2c58f147809f83b23d455af9289@eucas1p2.samsung.com>
2024-02-26 15:42 ` [PATCH net v2] stmmac: Clear variable when destroying workqueue Jakub Raczynski
2024-02-26 15:57   ` Jiri Pirko
     [not found]     ` <CGME20240226164337eucas1p196a1049cb7e766984910eee2f99bae4e@eucas1p1.samsung.com>
2024-02-26 16:42       ` Jakub Raczynski
2024-02-27  7:06         ` Jiri Pirko
2024-02-28 11:30         ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.