All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net v2] net: phy: Fix race condition on link status change
@ 2022-05-06  6:08 Francesco Dolcini
  2022-05-06  6:13 ` Francesco Dolcini
                   ` (3 more replies)
  0 siblings, 4 replies; 6+ messages in thread
From: Francesco Dolcini @ 2022-05-06  6:08 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, Russell King, Joakim Zhang
  Cc: Francesco Dolcini, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, stable

This fixes the following error caused by a race condition between
phydev->adjust_link() and a MDIO transaction in the phy interrupt
handler. The issue was reproduced with the ethernet FEC driver and a
micrel KSZ9031 phy.

[  146.195696] fec 2188000.ethernet eth0: MDIO read timeout
[  146.201779] ------------[ cut here ]------------
[  146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
[  146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
[  146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
[  146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  146.236257]  unwind_backtrace from show_stack+0x10/0x14
[  146.241640]  show_stack from dump_stack_lvl+0x58/0x70
[  146.246841]  dump_stack_lvl from __warn+0xb4/0x24c
[  146.251772]  __warn from warn_slowpath_fmt+0x5c/0xd4
[  146.256873]  warn_slowpath_fmt from phy_error+0x24/0x6c
[  146.262249]  phy_error from kszphy_handle_interrupt+0x40/0x48
[  146.268159]  kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
[  146.274417]  irq_thread_fn from irq_thread+0xf0/0x1dc
[  146.279605]  irq_thread from kthread+0xe4/0x104
[  146.284267]  kthread from ret_from_fork+0x14/0x28
[  146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
[  146.294448] 1fa0:                                     00000000 00000000 00000000 00000000
[  146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[  146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[  146.318262] irq event stamp: 12325
[  146.321780] hardirqs last  enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
[  146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
[  146.338259] softirqs last  enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
[  146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
[  146.354447] ---[ end trace 0000000000000000 ]---

With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link()
calls fec_stop()/fec_restart() and both these function reset and
temporary disable the FEC disrupting any MII transaction that
could be happening at the same time.

fec_enet_adjust_link() and phy_read() can be running at the same time
when we have one additional interrupt before the phy_state_machine() is
able to terminate.

Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
                           |
                           | phy_interrupt()            <-- PHY IRQ
                           |  handle_interrupt()
                           |   phy_read()
                           |   phy_trigger_machine()
                           |    --> schedule phylib WQ
                           |
                           |
phy_state_machine()        |
 phy_check_link_status()   |
  phy_link_change()        |
   phydev->adjust_link()   |
    fec_enet_adjust_link() |
     --> FEC reset         | phy_interrupt()            <-- PHY IRQ
                           |  phy_read()
                           |

Fix this by acquiring the phydev lock in phy_interrupt().

Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")
cc: <stable@vger.kernel.org>
Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
---
 drivers/net/phy/phy.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index beb2b66da132..f122026c4682 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
 {
 	struct phy_device *phydev = phy_dat;
 	struct phy_driver *drv = phydev->drv;
+	irqreturn_t ret;
 
-	return drv->handle_interrupt(phydev);
+	mutex_lock(&phydev->lock);
+	ret = drv->handle_interrupt(phydev);
+	mutex_unlock(&phydev->lock);
+
+	return ret;
 }
 
 /**
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: phy: Fix race condition on link status change
  2022-05-06  6:08 [PATCH net v2] net: phy: Fix race condition on link status change Francesco Dolcini
@ 2022-05-06  6:13 ` Francesco Dolcini
  2022-05-06 10:10 ` Francesco Dolcini
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 6+ messages in thread
From: Francesco Dolcini @ 2022-05-06  6:13 UTC (permalink / raw)
  To: Andrew Lunn, Heiner Kallweit, Russell King, Joakim Zhang
  Cc: Francesco Dolcini, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, stable

On Fri, May 06, 2022 at 08:08:15AM +0200, Francesco Dolcini wrote:
> This fixes the following error caused by a race condition between
> phydev->adjust_link() and a MDIO transaction in the phy interrupt
> handler. The issue was reproduced with the ethernet FEC driver and a
> micrel KSZ9031 phy.
> 
...
> Fix this by acquiring the phydev lock in phy_interrupt().
> 
> Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
> Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")
> cc: <stable@vger.kernel.org>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>

whoops, I forgot the changelog, sorry.

v2: Added fixes tag, corrected commit message formatting (tab vs space)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: phy: Fix race condition on link status change
  2022-05-06  6:08 [PATCH net v2] net: phy: Fix race condition on link status change Francesco Dolcini
  2022-05-06  6:13 ` Francesco Dolcini
@ 2022-05-06 10:10 ` Francesco Dolcini
  2022-05-06 14:51   ` Greg Kroah-Hartman
  2022-05-07 13:57 ` Andrew Lunn
  2022-05-10  1:00 ` patchwork-bot+netdevbpf
  3 siblings, 1 reply; 6+ messages in thread
From: Francesco Dolcini @ 2022-05-06 10:10 UTC (permalink / raw)
  To: stable, Sasha Levin, Greg Kroah-Hartman
  Cc: Francesco Dolcini, Andrew Lunn, Heiner Kallweit, Russell King,
	Joakim Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev

Hello,
I have one question about the process of patch/fixes backporting
to old stable kernel.

Assuming the following patch will be deemed correct and applied it will
have to be eventually backported.

This patch will apply cleanly on v5.15+, but some work is needed to
backport to v5.10 or older.

Should I send a patch for the older kernels once the fix is merged?
Reading Documentation/process/stable-kernel-rules.rst it was not clear
to me what to do in this "mixed" situations.

Thanks a lot,
Francesco

On Fri, May 06, 2022 at 08:08:15AM +0200, Francesco Dolcini wrote:
> This fixes the following error caused by a race condition between
> phydev->adjust_link() and a MDIO transaction in the phy interrupt
> handler. The issue was reproduced with the ethernet FEC driver and a
> micrel KSZ9031 phy.
> 
> [  146.195696] fec 2188000.ethernet eth0: MDIO read timeout
> [  146.201779] ------------[ cut here ]------------
> [  146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
> [  146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
> [  146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
> [  146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  146.236257]  unwind_backtrace from show_stack+0x10/0x14
> [  146.241640]  show_stack from dump_stack_lvl+0x58/0x70
> [  146.246841]  dump_stack_lvl from __warn+0xb4/0x24c
> [  146.251772]  __warn from warn_slowpath_fmt+0x5c/0xd4
> [  146.256873]  warn_slowpath_fmt from phy_error+0x24/0x6c
> [  146.262249]  phy_error from kszphy_handle_interrupt+0x40/0x48
> [  146.268159]  kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
> [  146.274417]  irq_thread_fn from irq_thread+0xf0/0x1dc
> [  146.279605]  irq_thread from kthread+0xe4/0x104
> [  146.284267]  kthread from ret_from_fork+0x14/0x28
> [  146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
> [  146.294448] 1fa0:                                     00000000 00000000 00000000 00000000
> [  146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [  146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  146.318262] irq event stamp: 12325
> [  146.321780] hardirqs last  enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
> [  146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
> [  146.338259] softirqs last  enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
> [  146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
> [  146.354447] ---[ end trace 0000000000000000 ]---
> 
> With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link()
> calls fec_stop()/fec_restart() and both these function reset and
> temporary disable the FEC disrupting any MII transaction that
> could be happening at the same time.
> 
> fec_enet_adjust_link() and phy_read() can be running at the same time
> when we have one additional interrupt before the phy_state_machine() is
> able to terminate.
> 
> Thread 1 (phylib WQ)       | Thread 2 (phy interrupt)
>                            |
>                            | phy_interrupt()            <-- PHY IRQ
>                            |  handle_interrupt()
>                            |   phy_read()
>                            |   phy_trigger_machine()
>                            |    --> schedule phylib WQ
>                            |
>                            |
> phy_state_machine()        |
>  phy_check_link_status()   |
>   phy_link_change()        |
>    phydev->adjust_link()   |
>     fec_enet_adjust_link() |
>      --> FEC reset         | phy_interrupt()            <-- PHY IRQ
>                            |  phy_read()
>                            |
> 
> Fix this by acquiring the phydev lock in phy_interrupt().
> 
> Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
> Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")
> cc: <stable@vger.kernel.org>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
> ---
>  drivers/net/phy/phy.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index beb2b66da132..f122026c4682 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat)
>  {
>  	struct phy_device *phydev = phy_dat;
>  	struct phy_driver *drv = phydev->drv;
> +	irqreturn_t ret;
>  
> -	return drv->handle_interrupt(phydev);
> +	mutex_lock(&phydev->lock);
> +	ret = drv->handle_interrupt(phydev);
> +	mutex_unlock(&phydev->lock);
> +
> +	return ret;
>  }
>  
>  /**
> -- 
> 2.25.1
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: phy: Fix race condition on link status change
  2022-05-06 10:10 ` Francesco Dolcini
@ 2022-05-06 14:51   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Kroah-Hartman @ 2022-05-06 14:51 UTC (permalink / raw)
  To: Francesco Dolcini
  Cc: stable, Sasha Levin, Andrew Lunn, Heiner Kallweit, Russell King,
	Joakim Zhang, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev

On Fri, May 06, 2022 at 12:10:31PM +0200, Francesco Dolcini wrote:
> Hello,
> I have one question about the process of patch/fixes backporting
> to old stable kernel.
> 
> Assuming the following patch will be deemed correct and applied it will
> have to be eventually backported.
> 
> This patch will apply cleanly on v5.15+, but some work is needed to
> backport to v5.10 or older.
> 
> Should I send a patch for the older kernels once the fix is merged?
> Reading Documentation/process/stable-kernel-rules.rst it was not clear
> to me what to do in this "mixed" situations.

You will get an email from me if it does not apply to older kernels and
you can respond with the backported fix then.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: phy: Fix race condition on link status change
  2022-05-06  6:08 [PATCH net v2] net: phy: Fix race condition on link status change Francesco Dolcini
  2022-05-06  6:13 ` Francesco Dolcini
  2022-05-06 10:10 ` Francesco Dolcini
@ 2022-05-07 13:57 ` Andrew Lunn
  2022-05-10  1:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 6+ messages in thread
From: Andrew Lunn @ 2022-05-07 13:57 UTC (permalink / raw)
  To: Francesco Dolcini
  Cc: Heiner Kallweit, Russell King, Joakim Zhang, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, stable

On Fri, May 06, 2022 at 08:08:15AM +0200, Francesco Dolcini wrote:
> This fixes the following error caused by a race condition between
> phydev->adjust_link() and a MDIO transaction in the phy interrupt
> handler. The issue was reproduced with the ethernet FEC driver and a
> micrel KSZ9031 phy.
> 
> Link: https://lore.kernel.org/all/20220422152612.GA510015@francesco-nb.int.toradex.com/
> Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices")
> cc: <stable@vger.kernel.org>
> Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH net v2] net: phy: Fix race condition on link status change
  2022-05-06  6:08 [PATCH net v2] net: phy: Fix race condition on link status change Francesco Dolcini
                   ` (2 preceding siblings ...)
  2022-05-07 13:57 ` Andrew Lunn
@ 2022-05-10  1:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2022-05-10  1:00 UTC (permalink / raw)
  To: Francesco Dolcini
  Cc: andrew, hkallweit1, linux, qiangqing.zhang, davem, edumazet,
	kuba, pabeni, netdev, stable

Hello:

This patch was applied to netdev/net.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Fri,  6 May 2022 08:08:15 +0200 you wrote:
> This fixes the following error caused by a race condition between
> phydev->adjust_link() and a MDIO transaction in the phy interrupt
> handler. The issue was reproduced with the ethernet FEC driver and a
> micrel KSZ9031 phy.
> 
> [  146.195696] fec 2188000.ethernet eth0: MDIO read timeout
> [  146.201779] ------------[ cut here ]------------
> [  146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c
> [  146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug
> [  146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9
> [  146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  146.236257]  unwind_backtrace from show_stack+0x10/0x14
> [  146.241640]  show_stack from dump_stack_lvl+0x58/0x70
> [  146.246841]  dump_stack_lvl from __warn+0xb4/0x24c
> [  146.251772]  __warn from warn_slowpath_fmt+0x5c/0xd4
> [  146.256873]  warn_slowpath_fmt from phy_error+0x24/0x6c
> [  146.262249]  phy_error from kszphy_handle_interrupt+0x40/0x48
> [  146.268159]  kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78
> [  146.274417]  irq_thread_fn from irq_thread+0xf0/0x1dc
> [  146.279605]  irq_thread from kthread+0xe4/0x104
> [  146.284267]  kthread from ret_from_fork+0x14/0x28
> [  146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8)
> [  146.294448] 1fa0:                                     00000000 00000000 00000000 00000000
> [  146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [  146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [  146.318262] irq event stamp: 12325
> [  146.321780] hardirqs last  enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60
> [  146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60
> [  146.338259] softirqs last  enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624
> [  146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178
> [  146.354447] ---[ end trace 0000000000000000 ]---
> 
> [...]

Here is the summary with links:
  - [net,v2] net: phy: Fix race condition on link status change
    https://git.kernel.org/netdev/net/c/91a7cda1f4b8

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2022-05-10  1:00 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-06  6:08 [PATCH net v2] net: phy: Fix race condition on link status change Francesco Dolcini
2022-05-06  6:13 ` Francesco Dolcini
2022-05-06 10:10 ` Francesco Dolcini
2022-05-06 14:51   ` Greg Kroah-Hartman
2022-05-07 13:57 ` Andrew Lunn
2022-05-10  1:00 ` patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.