All of lore.kernel.org
 help / color / mirror / Atom feed
* ordering of call to unbind() in usbnet_disconnect
@ 2022-03-10 11:25 Oliver Neukum
  2022-03-10 11:38 ` Oleksij Rempel
  0 siblings, 1 reply; 26+ messages in thread
From: Oliver Neukum @ 2022-03-10 11:25 UTC (permalink / raw)
  To: Oleksij Rempel; +Cc: Oleksij Rempel, netdev

Hi,

I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
unregister_netdev()")
is causing regressions. Rather than simply reverting it,
it seems to me that the call needs to be split. One in the old place
and one in the place you moved it to.

Could you tell me which drivers you moved the call for?

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-10 11:25 ordering of call to unbind() in usbnet_disconnect Oliver Neukum
@ 2022-03-10 11:38 ` Oleksij Rempel
  2022-03-14 18:42   ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Oleksij Rempel @ 2022-03-10 11:38 UTC (permalink / raw)
  To: Oliver Neukum; +Cc: Oleksij Rempel, netdev

Hi,

On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> Hi,
> 
> I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> unregister_netdev()")
> is causing regressions. Rather than simply reverting it,
> it seems to me that the call needs to be split. One in the old place
> and one in the place you moved it to.
> 
> Could you tell me which drivers you moved the call for?

I moved it for asix_devices.c:ax88772_unbind()

Probably smsc95xx.c:smsc95xx_unbind() has same issue.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-10 11:38 ` Oleksij Rempel
@ 2022-03-14 18:42   ` Lukas Wunner
  2022-03-14 19:14     ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-14 18:42 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit, Andrew Lunn

[cc += Heiner Kallweit, Andrew Lunn]

On Thu, Mar 10, 2022 at 12:38:20PM +0100, Oleksij Rempel wrote:
> On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > unregister_netdev()")
> > is causing regressions.

I would like to see this reverted as well.  For obvious reasons,
the order in usbnet_disconnect() should be the inverse of
usbnet_probe().  Since 2c9d6c2b871d, that's no longer the case.


> > Rather than simply reverting it,
> > it seems to me that the call needs to be split. One in the old place
> > and one in the place you moved it to.

I disagree.  The commit message claims that the change is necessary
because phy_disconnect() fails if called with phydev->attached_dev == NULL.

I've just gone through the code to check that and the only thing that
caught my eye is an unconditional call to netif_testing_off(dev) in
phy_stop().  It seems to me that making that call conditional on
"if (dev)" would solve the issue.

That's a bug introduced by
   4a459bdc7472 ("net: phy: Put interface into oper testing during cable
                  test")
or 472115d9834c ("net: phy: stop PHY if needed when entering
                  phy_disconnect")
depending on how you look at it.

Thanks,

Lukas

> > Could you tell me which drivers you moved the call for?
> 
> I moved it for asix_devices.c:ax88772_unbind()
> 
> Probably smsc95xx.c:smsc95xx_unbind() has same issue.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-14 18:42   ` Lukas Wunner
@ 2022-03-14 19:14     ` Andrew Lunn
  2022-03-15  5:44       ` Oleksij Rempel
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2022-03-14 19:14 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Mon, Mar 14, 2022 at 07:42:34PM +0100, Lukas Wunner wrote:
> [cc += Heiner Kallweit, Andrew Lunn]
> 
> On Thu, Mar 10, 2022 at 12:38:20PM +0100, Oleksij Rempel wrote:
> > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > unregister_netdev()")
> > > is causing regressions.
> 
> I would like to see this reverted as well.  For obvious reasons,
> the order in usbnet_disconnect() should be the inverse of
> usbnet_probe().  Since 2c9d6c2b871d, that's no longer the case.
> 
> 
> > > Rather than simply reverting it,
> > > it seems to me that the call needs to be split. One in the old place
> > > and one in the place you moved it to.
> 
> I disagree.  The commit message claims that the change is necessary
> because phy_disconnect() fails if called with phydev->attached_dev == NULL.

The only place i see which sets phydev->attached_dev is
phy_attach_direct(). So if phydev->attached_dev is NULL, the PHY has
not been attached, and hence there is no need to call
phy_disconnect().

	Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-14 19:14     ` Andrew Lunn
@ 2022-03-15  5:44       ` Oleksij Rempel
  2022-03-15  8:32         ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Oleksij Rempel @ 2022-03-15  5:44 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Lukas Wunner, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Mon, Mar 14, 2022 at 08:14:36PM +0100, Andrew Lunn wrote:
> On Mon, Mar 14, 2022 at 07:42:34PM +0100, Lukas Wunner wrote:
> > [cc += Heiner Kallweit, Andrew Lunn]
> > 
> > On Thu, Mar 10, 2022 at 12:38:20PM +0100, Oleksij Rempel wrote:
> > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > unregister_netdev()")
> > > > is causing regressions.
> > 
> > I would like to see this reverted as well.  For obvious reasons,
> > the order in usbnet_disconnect() should be the inverse of
> > usbnet_probe().  Since 2c9d6c2b871d, that's no longer the case.
> > 
> > 
> > > > Rather than simply reverting it,
> > > > it seems to me that the call needs to be split. One in the old place
> > > > and one in the place you moved it to.
> > 
> > I disagree.  The commit message claims that the change is necessary
> > because phy_disconnect() fails if called with phydev->attached_dev == NULL.
> 
> The only place i see which sets phydev->attached_dev is
> phy_attach_direct(). So if phydev->attached_dev is NULL, the PHY has
> not been attached, and hence there is no need to call
> phy_disconnect().

phydev->attached_dev is not NULL. It was linked to unregistered/freed
netdev. This is why my patch changing the order to call phy_disconnect()
first and then unregister_netdev().

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-15  5:44       ` Oleksij Rempel
@ 2022-03-15  8:32         ` Lukas Wunner
  2022-03-15 11:38           ` Oleksij Rempel
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-15  8:32 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Tue, Mar 15, 2022 at 06:44:03AM +0100, Oleksij Rempel wrote:
> On Mon, Mar 14, 2022 at 08:14:36PM +0100, Andrew Lunn wrote:
> > On Mon, Mar 14, 2022 at 07:42:34PM +0100, Lukas Wunner wrote:
> > > On Thu, Mar 10, 2022 at 12:38:20PM +0100, Oleksij Rempel wrote:
> > > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > > unregister_netdev()") is causing regressions.
> > > > > Rather than simply reverting it,
> > > > > it seems to me that the call needs to be split. One in the old place
> > > > > and one in the place you moved it to.
> > > 
> > > I disagree.  The commit message claims that the change is necessary
> > > because phy_disconnect() fails if called with
> > > phydev->attached_dev == NULL.
> > 
> > The only place i see which sets phydev->attached_dev is
> > phy_attach_direct(). So if phydev->attached_dev is NULL, the PHY has
> > not been attached, and hence there is no need to call
> > phy_disconnect().
> 
> phydev->attached_dev is not NULL.

Right, I was mistaken, sorry.


> It was linked to unregistered/freed
> netdev. This is why my patch changing the order to call phy_disconnect()
> first and then unregister_netdev().

Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:

  usbnet_disconnect()
    unregister_netdev()
    ax88772_unbind()
      phy_disconnect()
    free_netdev()

Is it illegal to disconnect a PHY from an unregistered, but not yet freed
net_device?

Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
PHY "fails" in that situation.  Please elaborate what the failure looked
like.  Did you get a stacktrace?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-15  8:32         ` Lukas Wunner
@ 2022-03-15 11:38           ` Oleksij Rempel
  2022-03-15 13:28             ` Andrew Lunn
  2022-03-26 12:25             ` Lukas Wunner
  0 siblings, 2 replies; 26+ messages in thread
From: Oleksij Rempel @ 2022-03-15 11:38 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andrew Lunn, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Tue, Mar 15, 2022 at 09:32:34AM +0100, Lukas Wunner wrote:
> On Tue, Mar 15, 2022 at 06:44:03AM +0100, Oleksij Rempel wrote:
> > On Mon, Mar 14, 2022 at 08:14:36PM +0100, Andrew Lunn wrote:
> > > On Mon, Mar 14, 2022 at 07:42:34PM +0100, Lukas Wunner wrote:
> > > > On Thu, Mar 10, 2022 at 12:38:20PM +0100, Oleksij Rempel wrote:
> > > > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > > > unregister_netdev()") is causing regressions.
> > > > > > Rather than simply reverting it,
> > > > > > it seems to me that the call needs to be split. One in the old place
> > > > > > and one in the place you moved it to.
> > > > 
> > > > I disagree.  The commit message claims that the change is necessary
> > > > because phy_disconnect() fails if called with
> > > > phydev->attached_dev == NULL.
> > > 
> > > The only place i see which sets phydev->attached_dev is
> > > phy_attach_direct(). So if phydev->attached_dev is NULL, the PHY has
> > > not been attached, and hence there is no need to call
> > > phy_disconnect().
> > 
> > phydev->attached_dev is not NULL.
> 
> Right, I was mistaken, sorry.
> 
> 
> > It was linked to unregistered/freed
> > netdev. This is why my patch changing the order to call phy_disconnect()
> > first and then unregister_netdev().
> 
> Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> 
>   usbnet_disconnect()
>     unregister_netdev()
>     ax88772_unbind()
>       phy_disconnect()
>     free_netdev()
> 
> Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> net_device?
> 
> Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
> PHY "fails" in that situation.  Please elaborate what the failure looked
> like.  Did you get a stacktrace?

[   15.459655] asix 2-1.2:1.0 eth1: Link is Up - 100Mbps/Full - flow control off
[   30.600242] usb 2-1.2: USB disconnect, device number 3
[   30.611962] asix 2-1.2:1.0 eth1: unregister 'asix' usb-ci_hdrc.1-1.2, ASIX AX88772B USB 2.0 Ethernet
[   30.649173] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
[   30.657027] asix 2-1.2:1.0 eth1 (unregistered): Failed to write Medium Mode mode to 0x0000: ffffffed
[   30.683006] asix 2-1.2:1.0 eth1 (unregistered): Link is Down
[   30.689512] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
[   30.697359] asix 2-1.2:1.0 eth1 (unregistered): Failed to enable software MII access
[   30.706009] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
[   30.714277] asix 2-1.2:1.0 eth1 (unregistered): Failed to enable software MII access
[   30.732689] 8<--- cut here ---
[   30.735757] Unable to handle kernel paging request at virtual address 2e839000
[   30.742984] pgd = af824ad7
[   30.745695] [2e839000] *pgd=00000000
[   30.749282] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   30.754602] Modules linked in:
[   30.757663] CPU: 0 PID: 77 Comm: kworker/0:2 Not tainted 5.13.0-rc3-00818-g06edf1a940be #2
[   30.765934] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[   30.772466] Workqueue: events linkwatch_event
[   30.776841] PC is at linkwatch_do_dev+0x6c/0x88
[   30.781380] LR is at __local_bh_enable_ip+0x6c/0x100
[   30.786356] pc : [<c08637f4>]    lr : [<c013e5c4>]    psr: 60030093
[   30.792625] sp : c1d2bed8  ip : 00000001  fp : c28d2000
[   30.797852] r10: c0fcf19c  r9 : c0fcf170  r8 : 00000000
[   30.803080] r7 : c102f044  r6 : c1d2beec  r5 : 00000063  r4 : c28d2000
[   30.809611] r3 : 00000000  r2 : 00000000  r1 : 2e839000  r0 : 60030013
[   30.816140] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[   30.823367] Control: 10c5387d  Table: 12b54059  DAC: 00000051
[   30.829114] Register r0 information: non-paged memory
[   30.834174] Register r1 information: non-paged memory
[   30.839231] Register r2 information: NULL pointer
[   30.843941] Register r3 information: NULL pointer
[   30.848649] Register r4 information: slab kmalloc-2k start c28d2000 pointer offset 0 size 2048
[   30.857287] Register r5 information: non-paged memory
[   30.862344] Register r6 information: non-slab/vmalloc memory
[   30.868008] Register r7 information: non-slab/vmalloc memory
[   30.873673] Register r8 information: NULL pointer
[   30.878381] Register r9 information: non-slab/vmalloc memory
[   30.884045] Register r10 information: non-slab/vmalloc memory
[   30.889797] Register r11 information: slab kmalloc-2k start c28d2000 pointer offset 0 size 2048
[   30.898516] Register r12 information: non-paged memory
[   30.903662] Process kworker/0:2 (pid: 77, stack limit = 0xded42e9b)
[   30.909935] Stack: (0xc1d2bed8 to 0xc1d2c000)
[   30.914298] bec0:                                                       c28d22cc c0863a48
[   30.922481] bee0: 00000000 c1d2a000 00000008 c1d2beec c1d2beec 73675768 c1d4da84 c0fcf170
[   30.930663] bf00: c1ce0d80 ef6d28c0 ef6d5c00 00000000 00000000 c0fed000 ef6d28c0 c0863b8c
[   30.938844] bf20: c0fcf170 c0155550 c1d2a000 c0a13fd4 ef6d28d8 c1ce0d80 ef6d28c0 c1ce0d94
[   30.947026] bf40: ef6d28d8 c0f03d00 00000008 c1d2a000 ef6d28c0 c0155dc0 c1003e48 c0fec754
[   30.955208] bf60: c1ce75e4 c1ce75c0 c1ce7fc0 c1d2a000 00000000 c1931eb4 c0155d5c c1ce0d80
[   30.963389] bf80: c1ce75e4 c015bacc 00000000 c1ce7fc0 c015b954 00000000 00000000 00000000
[   30.971570] bfa0: 00000000 00000000 00000000 c0100150 00000000 00000000 00000000 00000000
[   30.979750] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   30.987931] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[   30.996114] [<c08637f4>] (linkwatch_do_dev) from [<c0863a48>] (__linkwatch_run_queue+0xe0/0x1f0)
[   31.004917] [<c0863a48>] (__linkwatch_run_queue) from [<c0863b8c>] (linkwatch_event+0x34/0x3c)
[   31.013540] [<c0863b8c>] (linkwatch_event) from [<c0155550>] (process_one_work+0x20c/0x5d0)
[   31.021911] [<c0155550>] (process_one_work) from [<c0155dc0>] (worker_thread+0x64/0x570)
[   31.030010] [<c0155dc0>] (worker_thread) from [<c015bacc>] (kthread+0x178/0x190)
[   31.037421] [<c015bacc>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
[   31.044654] Exception stack(0xc1d2bfb0 to 0xc1d2bff8)
[   31.049710] bfa0:                                     00000000 00000000 00000000 00000000
[   31.057891] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   31.066071] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   31.072692] Code: e10f0000 f10c0080 e59432c8 ee1d1f90 (e7932001) 
[   31.078788] ---[ end trace f80581862631ce84 ]---

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-15 11:38           ` Oleksij Rempel
@ 2022-03-15 13:28             ` Andrew Lunn
  2022-03-17 15:53               ` Oliver Neukum
  2022-03-21 10:02               ` Lukas Wunner
  2022-03-26 12:25             ` Lukas Wunner
  1 sibling, 2 replies; 26+ messages in thread
From: Andrew Lunn @ 2022-03-15 13:28 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Lukas Wunner, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

> > > It was linked to unregistered/freed
> > > netdev. This is why my patch changing the order to call phy_disconnect()
> > > first and then unregister_netdev().
> > 
> > Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> > 
> >   usbnet_disconnect()
> >     unregister_netdev()
> >     ax88772_unbind()
> >       phy_disconnect()
> >     free_netdev()
> > 
> > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > net_device?

There are drivers which unregistering and then calling
phy_disconnect. In general that should be a valid pattern. But more
MAC drivers actually connect the PHY on open and disconnect it on
close. So it is less well used.

> > Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
> > PHY "fails" in that situation.  Please elaborate what the failure looked
> > like.  Did you get a stacktrace?
> 
> [   15.459655] asix 2-1.2:1.0 eth1: Link is Up - 100Mbps/Full - flow control off
> [   30.600242] usb 2-1.2: USB disconnect, device number 3
> [   30.611962] asix 2-1.2:1.0 eth1: unregister 'asix' usb-ci_hdrc.1-1.2, ASIX AX88772B USB 2.0 Ethernet
> [   30.649173] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
> [   30.657027] asix 2-1.2:1.0 eth1 (unregistered): Failed to write Medium Mode mode to 0x0000: ffffffed
> [   30.683006] asix 2-1.2:1.0 eth1 (unregistered): Link is Down
> [   30.689512] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
> [   30.697359] asix 2-1.2:1.0 eth1 (unregistered): Failed to enable software MII access
> [   30.706009] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
> [   30.714277] asix 2-1.2:1.0 eth1 (unregistered): Failed to enable software MII access
> [   30.732689] 8<--- cut here ---
> [   30.735757] Unable to handle kernel paging request at virtual address 2e839000
> [   30.996114] [<c08637f4>] (linkwatch_do_dev) from [<c0863a48>] (__linkwatch_run_queue+0xe0/0x1f0)
> [   31.004917] [<c0863a48>] (__linkwatch_run_queue) from [<c0863b8c>] (linkwatch_event+0x34/0x3c)
> [   31.013540] [<c0863b8c>] (linkwatch_event) from [<c0155550>] (process_one_work+0x20c/0x5d0)
> [   31.021911] [<c0155550>] (process_one_work) from [<c0155dc0>] (worker_thread+0x64/0x570)
> [   31.030010] [<c0155dc0>] (worker_thread) from [<c015bacc>] (kthread+0x178/0x190)
> [   31.037421] [<c015bacc>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)

This is not directly PHY related, although it could be indirect. This
is to do with sending notifications after the link changed etc. It is
async, so i wounder if by the time it runs the netdev has been freed?
That would indicate some sort of bug, missing lock etc.

     Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-15 13:28             ` Andrew Lunn
@ 2022-03-17 15:53               ` Oliver Neukum
  2022-03-17 21:03                 ` Lukas Wunner
  2022-03-21 10:17                 ` Lukas Wunner
  2022-03-21 10:02               ` Lukas Wunner
  1 sibling, 2 replies; 26+ messages in thread
From: Oliver Neukum @ 2022-03-17 15:53 UTC (permalink / raw)
  To: Andrew Lunn, Oleksij Rempel
  Cc: Lukas Wunner, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit



On 15.03.22 14:28, Andrew Lunn wrote:
>>>> It was linked to unregistered/freed
>>>> netdev. This is why my patch changing the order to call phy_disconnect()
>>>> first and then unregister_netdev().
>>> Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
>>>
>>>   usbnet_disconnect()
>>>     unregister_netdev()
>>>     ax88772_unbind()
>>>       phy_disconnect()
>>>     free_netdev()
>>>
>>> Is it illegal to disconnect a PHY from an unregistered, but not yet freed
>>> net_device?
> There are drivers which unregistering and then calling
> phy_disconnect. In general that should be a valid pattern. But more
> MAC drivers actually connect the PHY on open and disconnect it on
> close. So it is less well used.
Hi,

this is an interesting discussion, but what practical conclusion
do we draw from it? Is it necessary to provide both orders
of notifying the subdriver, or isn't it?

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-17 15:53               ` Oliver Neukum
@ 2022-03-17 21:03                 ` Lukas Wunner
  2022-03-21 10:17                 ` Lukas Wunner
  1 sibling, 0 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-03-17 21:03 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Andrew Lunn, Oleksij Rempel, Oleksij Rempel, netdev, Heiner Kallweit

On Thu, Mar 17, 2022 at 04:53:34PM +0100, Oliver Neukum wrote:
> On 15.03.22 14:28, Andrew Lunn wrote:
> >>>> It was linked to unregistered/freed
> >>>> netdev. This is why my patch changing the order to call phy_disconnect()
> >>>> first and then unregister_netdev().
> >>> Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> >>>
> >>>   usbnet_disconnect()
> >>>     unregister_netdev()
> >>>     ax88772_unbind()
> >>>       phy_disconnect()
> >>>     free_netdev()
> >>>
> >>> Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> >>> net_device?
> > There are drivers which unregistering and then calling
> > phy_disconnect. In general that should be a valid pattern. But more
> > MAC drivers actually connect the PHY on open and disconnect it on
> > close. So it is less well used.
> 
> this is an interesting discussion, but what practical conclusion
> do we draw from it? Is it necessary to provide both orders
> of notifying the subdriver, or isn't it?

As far as I'm concerned, more time for analysis is needed
to understand what the issues really are and how to solve them.

Commit a049a30fc27c (for a different USB Ethernet driver -- smsc95xx.c)
seems to imply that unregistering the netdev before phy_disconnect()
doesn't work.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-15 13:28             ` Andrew Lunn
  2022-03-17 15:53               ` Oliver Neukum
@ 2022-03-21 10:02               ` Lukas Wunner
  2022-03-21 13:10                 ` Andrew Lunn
  1 sibling, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-21 10:02 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Tue, Mar 15, 2022 at 02:28:32PM +0100, Andrew Lunn wrote:
> > > > It was linked to unregistered/freed netdev. This is why my patch
> > > > changing the order to call phy_disconnect() first and then
> > > > unregister_netdev().
> > > 
> > > Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> > > 
> > >   usbnet_disconnect()
> > >     unregister_netdev()
> > >     ax88772_unbind()
> > >       phy_disconnect()
> > >     free_netdev()
> > > 
> > > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > > net_device?
> 
> There are drivers which unregistering and then calling
> phy_disconnect. In general that should be a valid pattern. But more
> MAC drivers actually connect the PHY on open and disconnect it on
> close. So it is less well used.

It turns out that unregistering a net_device and then calling
phy_disconnect() may indeed crash and is thus not permitted
right now:

Oleksij added the following code comment with commit e532a096be0e
("net: usb: asix: ax88772: add phylib support"):

  /* On unplugged USB, we will get MDIO communication errors and the
   * PHY will be set in to PHY_HALTED state.

So the USB adapter gets unplugged, access to MII registers fails with
-ENODEV, phy_error() is called, phy_state_machine() transitions to
PHY_HALTED and performs the following call:

  phy_state_machine()
    phy_link_down()
      phy_link_change()
        netif_carrier_off()
          linkwatch_fire_event()

Asynchronously, usbnet_disconnect() calls phy_detach() and then
free_netdev().

A bit later, linkwatch_event() runs and tries to access the freed
net_device, leading to the crash that Oleksij posted upthread.

The fact that linkwatch_fire_event() increments the refcount doesn't
help because unregister_netdevice() has already run (it waits for
the refcount to drop to 1).

My suggestion would be to amend unregister_netdevice() to set
dev->phydev->attached_dev = NULL.  It may also be a good idea
to WARN_ON() in free_netdev() if the refcount is not 1.

Andrew, please clarify whether you really think that the
"unregister netdev, then detach phy" order should be supported.
If you do think that it should be supported, we'll have to litter
phylib with NULL pointer checks for attached_dev.  If you don't
want that, we should at least document that it's an illegal pattern.

Even if you decide that we should rather declare this pattern
illegal rather than littering phylib with NULL pointer checks,
I strongly recommend that at least unregister_netdevice() sets
dev->phydev->attached_dev = NULL.  That will cause oopses which
are easier to debug than complex races like the one witnessed
by Oleksij.

Side note:  Since e532a096be0e, ax88772_stop() directly accesses
phydev->state.  I wonder whether that's legal.  I'm under the
impression that the state is internal to phylib.

Another side note:  The commit message of 2c9d6c2b871d is poor.
It should have contained a stack trace and a clear explanation.
It took me days of staring at code to reverse-engineer what
went wrong here.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-17 15:53               ` Oliver Neukum
  2022-03-17 21:03                 ` Lukas Wunner
@ 2022-03-21 10:17                 ` Lukas Wunner
  2022-03-21 10:43                   ` Oleksij Rempel
  2022-03-31  9:35                   ` Oliver Neukum
  1 sibling, 2 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-03-21 10:17 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Andrew Lunn, Oleksij Rempel, Oleksij Rempel, netdev, Heiner Kallweit

On Thu, Mar 17, 2022 at 04:53:34PM +0100, Oliver Neukum wrote:
> On 15.03.22 14:28, Andrew Lunn wrote:
> >>>> It was linked to unregistered/freed
> >>>> netdev. This is why my patch changing the order to call phy_disconnect()
> >>>> first and then unregister_netdev().
> >>> Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> >>>
> >>>   usbnet_disconnect()
> >>>     unregister_netdev()
> >>>     ax88772_unbind()
> >>>       phy_disconnect()
> >>>     free_netdev()
> >>>
> >>> Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> >>> net_device?
> > There are drivers which unregistering and then calling
> > phy_disconnect. In general that should be a valid pattern. But more
> > MAC drivers actually connect the PHY on open and disconnect it on
> > close. So it is less well used.
> 
> this is an interesting discussion, but what practical conclusion
> do we draw from it? Is it necessary to provide both orders
> of notifying the subdriver, or isn't it?

I see two possible solutions:

(1) The pattern of unregistering a net_device and then detaching the PHY
    is made legal by setting attached_dev = NULL on unregister and adding
    NULL pointer checks to phylib.  I'll wait for the phylib maintainers'
    comments whether they find that acceptable.

(2) Affected drivers (asix_devices.c, smsc95xx.c, possibly others) are
    amended with ->ndo_uninit() callbacks, which call phy_disconnect().
    That's basically your (Oliver's) idea to split usbnet ->unbind,
    but without actually having to split it.  (Just use the existing
    ->ndo_uninit.)

By the way: 2c9d6c2b871d caused breakage in smsc95xx.c which was
subsequently fixed by a049a30fc27c.  That in turn required another
fix, 0bf3885324a8.  Some of these code changes will have to be
rolled back or adjusted after reverting 2c9d6c2b871d.  It's a giant mess.
It's possible that more drivers saw fixes due to 2c9d6c2b871d,
I haven't checked that yet.

Oliver, why did you want to revert 2c9d6c2b871d, i.e. in which drivers
have your users reported breakage?  Do you have bugzilla links?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-21 10:17                 ` Lukas Wunner
@ 2022-03-21 10:43                   ` Oleksij Rempel
  2022-03-31  9:35                   ` Oliver Neukum
  1 sibling, 0 replies; 26+ messages in thread
From: Oleksij Rempel @ 2022-03-21 10:43 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oliver Neukum, Andrew Lunn, Oleksij Rempel, netdev, Heiner Kallweit

On Mon, Mar 21, 2022 at 11:17:51AM +0100, Lukas Wunner wrote:
> On Thu, Mar 17, 2022 at 04:53:34PM +0100, Oliver Neukum wrote:
> > On 15.03.22 14:28, Andrew Lunn wrote:
> > >>>> It was linked to unregistered/freed
> > >>>> netdev. This is why my patch changing the order to call phy_disconnect()
> > >>>> first and then unregister_netdev().
> > >>> Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> > >>>
> > >>>   usbnet_disconnect()
> > >>>     unregister_netdev()
> > >>>     ax88772_unbind()
> > >>>       phy_disconnect()
> > >>>     free_netdev()
> > >>>
> > >>> Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > >>> net_device?
> > > There are drivers which unregistering and then calling
> > > phy_disconnect. In general that should be a valid pattern. But more
> > > MAC drivers actually connect the PHY on open and disconnect it on
> > > close. So it is less well used.
> > 
> > this is an interesting discussion, but what practical conclusion
> > do we draw from it? Is it necessary to provide both orders
> > of notifying the subdriver, or isn't it?
> 
> I see two possible solutions:
> 
> (1) The pattern of unregistering a net_device and then detaching the PHY
>     is made legal by setting attached_dev = NULL on unregister and adding
>     NULL pointer checks to phylib.  I'll wait for the phylib maintainers'
>     comments whether they find that acceptable.
> 
> (2) Affected drivers (asix_devices.c, smsc95xx.c, possibly others) are
>     amended with ->ndo_uninit() callbacks, which call phy_disconnect().
>     That's basically your (Oliver's) idea to split usbnet ->unbind,
>     but without actually having to split it.  (Just use the existing
>     ->ndo_uninit.)
> 
> By the way: 2c9d6c2b871d caused breakage in smsc95xx.c which was
> subsequently fixed by a049a30fc27c.  That in turn required another
> fix, 0bf3885324a8.  Some of these code changes will have to be
> rolled back or adjusted after reverting 2c9d6c2b871d.  It's a giant mess.
> It's possible that more drivers saw fixes due to 2c9d6c2b871d,
> I haven't checked that yet.

The smsc95xx.c needed fixes after porting it to PHYlib and MDIO. Since
you can just remove the USB adapter, everything depending and still
running on it will just explode in different colors. The is no way, you
can run stop sequence on physically disconnected device.

> Oliver, why did you want to revert 2c9d6c2b871d, i.e. in which drivers
> have your users reported breakage?  Do you have bugzilla links?
> 
> Thanks,
> 
> Lukas
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-21 10:02               ` Lukas Wunner
@ 2022-03-21 13:10                 ` Andrew Lunn
  2022-03-26 12:39                   ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2022-03-21 13:10 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

> It turns out that unregistering a net_device and then calling
> phy_disconnect() may indeed crash and is thus not permitted
> right now:
> 
> Oleksij added the following code comment with commit e532a096be0e
> ("net: usb: asix: ax88772: add phylib support"):
> 
>   /* On unplugged USB, we will get MDIO communication errors and the
>    * PHY will be set in to PHY_HALTED state.
> 
> So the USB adapter gets unplugged, access to MII registers fails with
> -ENODEV, phy_error() is called, phy_state_machine() transitions to
> PHY_HALTED and performs the following call:
> 
>   phy_state_machine()
>     phy_link_down()
>       phy_link_change()
>         netif_carrier_off()
>           linkwatch_fire_event()
> 
> Asynchronously, usbnet_disconnect() calls phy_detach() and then
> free_netdev().
> 
> A bit later, linkwatch_event() runs and tries to access the freed
> net_device, leading to the crash that Oleksij posted upthread.
> 
> The fact that linkwatch_fire_event() increments the refcount doesn't
> help because unregister_netdevice() has already run (it waits for
> the refcount to drop to 1).
> 
> My suggestion would be to amend unregister_netdevice() to set
> dev->phydev->attached_dev = NULL.  It may also be a good idea
> to WARN_ON() in free_netdev() if the refcount is not 1.
> 
> Andrew, please clarify whether you really think that the
> "unregister netdev, then detach phy" order should be supported.

There are two patterns in use at the moment:

1) The phy is attached in open() and detached in close(). There is no
   danger of the netdev disappearing at this time.

2) The PHY is attached during probe, and detached during release.

This second case is what is being used here in the USB code. This is
also a common pattern for complex devices. In probe, you get all the
components of a complex devices, stitch them together and then
register the composite device. During release, you unregister the
composite device, and then release all the components. Since this is a
natural model, i think it should work.

A WARN_ON() for refcount not 1 makes a lot of sense.

dev->phydev->attached_dev = NULL im less sure about. That is what
detach is all about. I think we need to review the code and see if the
normal path is making use of netdev, or is this problem limited to
phy_error(), which the USB code is invoking? It maybe we need to make
phy_error() more paranoid, or not even used in detach.

	    Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-15 11:38           ` Oleksij Rempel
  2022-03-15 13:28             ` Andrew Lunn
@ 2022-03-26 12:25             ` Lukas Wunner
  2022-03-26 12:44               ` Andrew Lunn
  1 sibling, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-26 12:25 UTC (permalink / raw)
  To: Oleksij Rempel
  Cc: Andrew Lunn, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Tue, Mar 15, 2022 at 12:38:41PM +0100, Oleksij Rempel wrote:
> On Tue, Mar 15, 2022 at 09:32:34AM +0100, Lukas Wunner wrote:
> > > > > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > > > > unregister_netdev()") is causing regressions.
> > 
> > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > net_device?
> > 
> > Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
> > PHY "fails" in that situation.  Please elaborate what the failure looked
> > like.  Did you get a stacktrace?

Oleksij, I cannot reproduce your stacktrace (included in full length below).
I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
with all your (and other people's) asix patches applied on top,
except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
never got a stacktrace.

I've also walked down the code paths from usbnet_disconnect() and cannot
see how the stacktrace could occur.

Normally an unregistering netdev is removed from the linkwatch event list
(lweventlist) via this call stack:

          usbnet_disconnect()
            unregister_netdev()
              rtnl_unlock()
                netdev_run_todo()
                  netdev_wait_allrefs()
                    linkwatch_forget_dev()
                      linkwatch_do_dev()

For the stacktrace to occur, the netdev would have to be subsequently
re-added to the linkwatch event list via linkwatch_fire_event().

That is called, among other places, from netif_carrier_off().  However,
netif_carrier_off() is already called *before* linkwatch_forget_dev()
when unregister_netdev() stops the netdev before unregistering it:

          usbnet_disconnect()
            unregister_netdev()
              unregister_netdevice()
                unregister_netdevice_queue(dev, NULL)
                  unregister_netdevice_many()
                    dev_close_many()
                      __dev_close_many()
                        usbnet_stop()
                          ax88772_stop()
                            phy_stop() # state = PHY_HALTED
                              phy_state_machine()
                                phy_link_down()
                                  phy_link_change()
                                    netif_carrier_off()
                                      linkwatch_fire_event()

Again, this is *before* the subsequent call to linkwatch_forget_dev().
I don't see how netif_carrier_off() could be called once more after that.

Perhaps the stacktrace occurred with an earlier version of your patches
which lacked the call to phy_stop() in ax88772_stop()?

I'm going to submit a revert of 2c9d6c2b871d.  If you *are* able to
reproduce the stacktrace after the revert, please add a dump_stack()
to linkwatch_fire_event() so that we can find the root cause.

Thanks,

Lukas

> 
> [   15.459655] asix 2-1.2:1.0 eth1: Link is Up - 100Mbps/Full - flow control off
> [   30.600242] usb 2-1.2: USB disconnect, device number 3
> [   30.611962] asix 2-1.2:1.0 eth1: unregister 'asix' usb-ci_hdrc.1-1.2, ASIX AX88772B USB 2.0 Ethernet
> [   30.649173] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
> [   30.657027] asix 2-1.2:1.0 eth1 (unregistered): Failed to write Medium Mode mode to 0x0000: ffffffed
> [   30.683006] asix 2-1.2:1.0 eth1 (unregistered): Link is Down
> [   30.689512] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
> [   30.697359] asix 2-1.2:1.0 eth1 (unregistered): Failed to enable software MII access
> [   30.706009] asix 2-1.2:1.0 eth1 (unregistered): Failed to write reg index 0x0000: -19
> [   30.714277] asix 2-1.2:1.0 eth1 (unregistered): Failed to enable software MII access
> [   30.732689] 8<--- cut here ---
> [   30.735757] Unable to handle kernel paging request at virtual address 2e839000
> [   30.742984] pgd = af824ad7
> [   30.745695] [2e839000] *pgd=00000000
> [   30.749282] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
> [   30.754602] Modules linked in:
> [   30.757663] CPU: 0 PID: 77 Comm: kworker/0:2 Not tainted 5.13.0-rc3-00818-g06edf1a940be #2
> [   30.765934] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [   30.772466] Workqueue: events linkwatch_event
> [   30.776841] PC is at linkwatch_do_dev+0x6c/0x88
> [   30.781380] LR is at __local_bh_enable_ip+0x6c/0x100
> [   30.786356] pc : [<c08637f4>]    lr : [<c013e5c4>]    psr: 60030093
> [   30.792625] sp : c1d2bed8  ip : 00000001  fp : c28d2000
> [   30.797852] r10: c0fcf19c  r9 : c0fcf170  r8 : 00000000
> [   30.803080] r7 : c102f044  r6 : c1d2beec  r5 : 00000063  r4 : c28d2000
> [   30.809611] r3 : 00000000  r2 : 00000000  r1 : 2e839000  r0 : 60030013
> [   30.816140] Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> [   30.823367] Control: 10c5387d  Table: 12b54059  DAC: 00000051
> [   30.829114] Register r0 information: non-paged memory
> [   30.834174] Register r1 information: non-paged memory
> [   30.839231] Register r2 information: NULL pointer
> [   30.843941] Register r3 information: NULL pointer
> [   30.848649] Register r4 information: slab kmalloc-2k start c28d2000 pointer offset 0 size 2048
> [   30.857287] Register r5 information: non-paged memory
> [   30.862344] Register r6 information: non-slab/vmalloc memory
> [   30.868008] Register r7 information: non-slab/vmalloc memory
> [   30.873673] Register r8 information: NULL pointer
> [   30.878381] Register r9 information: non-slab/vmalloc memory
> [   30.884045] Register r10 information: non-slab/vmalloc memory
> [   30.889797] Register r11 information: slab kmalloc-2k start c28d2000 pointer offset 0 size 2048
> [   30.898516] Register r12 information: non-paged memory
> [   30.903662] Process kworker/0:2 (pid: 77, stack limit = 0xded42e9b)
> [   30.909935] Stack: (0xc1d2bed8 to 0xc1d2c000)
> [   30.914298] bec0:                                                       c28d22cc c0863a48
> [   30.922481] bee0: 00000000 c1d2a000 00000008 c1d2beec c1d2beec 73675768 c1d4da84 c0fcf170
> [   30.930663] bf00: c1ce0d80 ef6d28c0 ef6d5c00 00000000 00000000 c0fed000 ef6d28c0 c0863b8c
> [   30.938844] bf20: c0fcf170 c0155550 c1d2a000 c0a13fd4 ef6d28d8 c1ce0d80 ef6d28c0 c1ce0d94
> [   30.947026] bf40: ef6d28d8 c0f03d00 00000008 c1d2a000 ef6d28c0 c0155dc0 c1003e48 c0fec754
> [   30.955208] bf60: c1ce75e4 c1ce75c0 c1ce7fc0 c1d2a000 00000000 c1931eb4 c0155d5c c1ce0d80
> [   30.963389] bf80: c1ce75e4 c015bacc 00000000 c1ce7fc0 c015b954 00000000 00000000 00000000
> [   30.971570] bfa0: 00000000 00000000 00000000 c0100150 00000000 00000000 00000000 00000000
> [   30.979750] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [   30.987931] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
> [   30.996114] [<c08637f4>] (linkwatch_do_dev) from [<c0863a48>] (__linkwatch_run_queue+0xe0/0x1f0)
> [   31.004917] [<c0863a48>] (__linkwatch_run_queue) from [<c0863b8c>] (linkwatch_event+0x34/0x3c)
> [   31.013540] [<c0863b8c>] (linkwatch_event) from [<c0155550>] (process_one_work+0x20c/0x5d0)
> [   31.021911] [<c0155550>] (process_one_work) from [<c0155dc0>] (worker_thread+0x64/0x570)
> [   31.030010] [<c0155dc0>] (worker_thread) from [<c015bacc>] (kthread+0x178/0x190)
> [   31.037421] [<c015bacc>] (kthread) from [<c0100150>] (ret_from_fork+0x14/0x24)
> [   31.044654] Exception stack(0xc1d2bfb0 to 0xc1d2bff8)
> [   31.049710] bfa0:                                     00000000 00000000 00000000 00000000
> [   31.057891] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [   31.066071] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
> [   31.072692] Code: e10f0000 f10c0080 e59432c8 ee1d1f90 (e7932001) 
> [   31.078788] ---[ end trace f80581862631ce84 ]---

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-21 13:10                 ` Andrew Lunn
@ 2022-03-26 12:39                   ` Lukas Wunner
  2022-03-26 12:49                     ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-26 12:39 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Mon, Mar 21, 2022 at 02:10:27PM +0100, Andrew Lunn wrote:
> There are two patterns in use at the moment:
> 
> 1) The phy is attached in open() and detached in close(). There is no
>    danger of the netdev disappearing at this time.
> 
> 2) The PHY is attached during probe, and detached during release.
> 
> This second case is what is being used here in the USB code. This is
> also a common pattern for complex devices. In probe, you get all the
> components of a complex devices, stitch them together and then
> register the composite device. During release, you unregister the
> composite device, and then release all the components. Since this is a
> natural model, i think it should work.

I've gone through all drivers and noticed that some of them use a variation
of pattern 2 which looks fishy:

On probe, they first attach the PHY, then register the netdev.
On remove, they detach the PHY, then unregister the netdev.

Is it legal to detach the PHY from a registered (potentially running)
netdev? It looks wrong to me.

Affected drivers:

          drivers/net/ethernet/actions/owl-emac.c
          drivers/net/ethernet/altera/altera_tse_main.c
          drivers/net/ethernet/apm/xgene-v2/mdio.c
          drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
          drivers/net/ethernet/arc/emac_main.c
          drivers/net/ethernet/asix/ax88796c_main.c
          drivers/net/ethernet/broadcom/tg3.c
          drivers/net/ethernet/cortina/gemini.c
          drivers/net/ethernet/dnet.c
          drivers/net/ethernet/ethoc.c
          drivers/net/ethernet/hisilicon/hip04_eth.c
          drivers/net/ethernet/lantiq_etop.c
          drivers/net/ethernet/marvell/pxa168_eth.c
          drivers/net/ethernet/toshiba/tc35815.c

Some of these use devm_register_netdev() on probe and disconnect the
PHY in their ->remove hook.  They're missing the fact that
__device_release_driver() calls the ->remove hook before
devres_release_all().

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-26 12:25             ` Lukas Wunner
@ 2022-03-26 12:44               ` Andrew Lunn
  2022-03-26 13:01                 ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2022-03-26 12:44 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
> On Tue, Mar 15, 2022 at 12:38:41PM +0100, Oleksij Rempel wrote:
> > On Tue, Mar 15, 2022 at 09:32:34AM +0100, Lukas Wunner wrote:
> > > > > > > On Thu, Mar 10, 2022 at 12:25:08PM +0100, Oliver Neukum wrote:
> > > > > > > > I got bug reports that 2c9d6c2b871d ("usbnet: run unbind() before
> > > > > > > > unregister_netdev()") is causing regressions.
> > > 
> > > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > > net_device?
> > > 
> > > Oleksij, the commit message of 2c9d6c2b871d says that disconnecting the
> > > PHY "fails" in that situation.  Please elaborate what the failure looked
> > > like.  Did you get a stacktrace?
> 
> Oleksij, I cannot reproduce your stacktrace (included in full length below).
> I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
> with all your (and other people's) asix patches applied on top,
> except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
> never got a stacktrace.
> 
> I've also walked down the code paths from usbnet_disconnect() and cannot
> see how the stacktrace could occur.
> 
> Normally an unregistering netdev is removed from the linkwatch event list
> (lweventlist) via this call stack:
> 
>           usbnet_disconnect()
>             unregister_netdev()
>               rtnl_unlock()
>                 netdev_run_todo()
>                   netdev_wait_allrefs()
>                     linkwatch_forget_dev()
>                       linkwatch_do_dev()
> 
> For the stacktrace to occur, the netdev would have to be subsequently
> re-added to the linkwatch event list via linkwatch_fire_event().

Hi Lukas

What you might be missing is a call to phy_error()
 
> That is called, among other places, from netif_carrier_off().  However,
> netif_carrier_off() is already called *before* linkwatch_forget_dev()
> when unregister_netdev() stops the netdev before unregistering it:
> 
>           usbnet_disconnect()
>             unregister_netdev()
>               unregister_netdevice()
>                 unregister_netdevice_queue(dev, NULL)
>                   unregister_netdevice_many()
>                     dev_close_many()
>                       __dev_close_many()
>                         usbnet_stop()
>                           ax88772_stop()
>                             phy_stop() # state = PHY_HALTED
>                               phy_state_machine()

I'm guessing somewhere around here:

If it calls into the PHY driver, and the PHY calls for an MDIO bus
transaction, and that returns an error, -ENODEV or -EIO for example,
because the USB device has gone away, and that results in a call to
phy_error().

void phy_error(struct phy_device *phydev)
{
        WARN_ON(1);

        mutex_lock(&phydev->lock);
        phydev->state = PHY_HALTED;
        mutex_unlock(&phydev->lock);

        phy_trigger_machine(phydev);
}

That will trigger the PHY state machine to run again, asynchronously.

The end of phy_stop() says:

        /* Cannot call flush_scheduled_work() here as desired because
         * of rtnl_lock(), but PHY_HALTED shall guarantee irq handler
         * will not reenable interrupts.
         */

so it looks like the state machine will run again, and potentially use
netdev.

If the MDIO bus driver is no longer returning ENODEV, than we should
avoid this.

      Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-26 12:39                   ` Lukas Wunner
@ 2022-03-26 12:49                     ` Andrew Lunn
  2022-03-26 13:04                       ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2022-03-26 12:49 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Sat, Mar 26, 2022 at 01:39:29PM +0100, Lukas Wunner wrote:
> On Mon, Mar 21, 2022 at 02:10:27PM +0100, Andrew Lunn wrote:
> > There are two patterns in use at the moment:
> > 
> > 1) The phy is attached in open() and detached in close(). There is no
> >    danger of the netdev disappearing at this time.
> > 
> > 2) The PHY is attached during probe, and detached during release.
> > 
> > This second case is what is being used here in the USB code. This is
> > also a common pattern for complex devices. In probe, you get all the
> > components of a complex devices, stitch them together and then
> > register the composite device. During release, you unregister the
> > composite device, and then release all the components. Since this is a
> > natural model, i think it should work.
> 
> I've gone through all drivers and noticed that some of them use a variation
> of pattern 2 which looks fishy:
> 
> On probe, they first attach the PHY, then register the netdev.
> On remove, they detach the PHY, then unregister the netdev.
> 
> Is it legal to detach the PHY from a registered (potentially running)
> netdev? It looks wrong to me.

I think the network stack guarantee that the close() method is called
before unregister completes. It is a common pattern to attach the PHY
in open() and detach it in close(). The stack itself should not be
using the PHY when it is down, the exception being IOCTL handlers
which people often get wrong.

So i don't think this is an issue.

   Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-26 12:44               ` Andrew Lunn
@ 2022-03-26 13:01                 ` Lukas Wunner
  0 siblings, 0 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-03-26 13:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Sat, Mar 26, 2022 at 01:44:12PM +0100, Andrew Lunn wrote:
> On Sat, Mar 26, 2022 at 01:25:52PM +0100, Lukas Wunner wrote:
> > Oleksij, I cannot reproduce your stacktrace (included in full length below).
> > I've tested with kernel 5.13 (since the stacktrace was with 5.13-rc3)
> > with all your (and other people's) asix patches applied on top,
> > except for 2c9d6c2b871d.  Tried unplugging an AX88772A multiple times,
> > never got a stacktrace.
> > 
> > I've also walked down the code paths from usbnet_disconnect() and cannot
> > see how the stacktrace could occur.
> > 
> > Normally an unregistering netdev is removed from the linkwatch event list
> > (lweventlist) via this call stack:
> > 
> >           usbnet_disconnect()
> >             unregister_netdev()
> >               rtnl_unlock()
> >                 netdev_run_todo()
> >                   netdev_wait_allrefs()
> >                     linkwatch_forget_dev()
> >                       linkwatch_do_dev()
> > 
> > For the stacktrace to occur, the netdev would have to be subsequently
> > re-added to the linkwatch event list via linkwatch_fire_event().
> 
> What you might be missing is a call to phy_error()

But phy_error() has a WARN_ON(1) right at its top.  So it produces
a stacktrace itself.  That stacktrace is nowhere to be seen in the
dmesg output Oleksij posted.  Hence it can't be caused by phy_error().

Also, recall that unregister_netdev() stops the netdev before
unregistering it.  That in turn causes an invocation of phy_stop()
via ax88772_stop().  phy_stop() already puts the PHY into PHY_HALTED
state and resets phydev->link = 0.  So a subsequent phy_error() cannot
result in a call to phy_link_down() (which would indeed trigger a
dangerous linkwatch_fire_event()).


> > That is called, among other places, from netif_carrier_off().  However,
> > netif_carrier_off() is already called *before* linkwatch_forget_dev()
> > when unregister_netdev() stops the netdev before unregistering it:
> > 
> >           usbnet_disconnect()
> >             unregister_netdev()
> >               unregister_netdevice()
> >                 unregister_netdevice_queue(dev, NULL)
> >                   unregister_netdevice_many()
> >                     dev_close_many()
> >                       __dev_close_many()
> >                         usbnet_stop()
> >                           ax88772_stop()
> >                             phy_stop() # state = PHY_HALTED
> >                               phy_state_machine()
> 
> I'm guessing somewhere around here:
> 
> If it calls into the PHY driver, and the PHY calls for an MDIO bus
> transaction, and that returns an error, -ENODEV or -EIO for example,
> because the USB device has gone away, and that results in a call to
> phy_error().

Oleksij amended phy_state_machine() to bail out if err == -ENODEV
with commit 06edf1a940be ("net: phy: do not print dump stack if device
was removed").  The commit skips the phy_error() on -ENODEV, which
makes a lot of sense.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-26 12:49                     ` Andrew Lunn
@ 2022-03-26 13:04                       ` Lukas Wunner
  2022-03-27  8:37                         ` Oleksij Rempel
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-26 13:04 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Oleksij Rempel, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Sat, Mar 26, 2022 at 01:49:28PM +0100, Andrew Lunn wrote:
> On Sat, Mar 26, 2022 at 01:39:29PM +0100, Lukas Wunner wrote:
> > On Mon, Mar 21, 2022 at 02:10:27PM +0100, Andrew Lunn wrote:
> > > There are two patterns in use at the moment:
> > > 
> > > 1) The phy is attached in open() and detached in close(). There is no
> > >    danger of the netdev disappearing at this time.
> > > 
> > > 2) The PHY is attached during probe, and detached during release.
> > > 
> > > This second case is what is being used here in the USB code. This is
> > > also a common pattern for complex devices. In probe, you get all the
> > > components of a complex devices, stitch them together and then
> > > register the composite device. During release, you unregister the
> > > composite device, and then release all the components. Since this is a
> > > natural model, i think it should work.
> > 
> > I've gone through all drivers and noticed that some of them use a variation
> > of pattern 2 which looks fishy:
> > 
> > On probe, they first attach the PHY, then register the netdev.
> > On remove, they detach the PHY, then unregister the netdev.
> > 
> > Is it legal to detach the PHY from a registered (potentially running)
> > netdev? It looks wrong to me.
> 
> I think the network stack guarantee that the close() method is called
> before unregister completes. It is a common pattern to attach the PHY
> in open() and detach it in close(). The stack itself should not be
> using the PHY when it is down, the exception being IOCTL handlers
> which people often get wrong.

But the PHY is detached from a *running* netdev *before* that netdev
is unregistered (and closed).  Is that really legal?

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-26 13:04                       ` Lukas Wunner
@ 2022-03-27  8:37                         ` Oleksij Rempel
  2022-03-31  9:20                           ` Oliver Neukum
  0 siblings, 1 reply; 26+ messages in thread
From: Oleksij Rempel @ 2022-03-27  8:37 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Andrew Lunn, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit

On Sat, Mar 26, 2022 at 02:04:30PM +0100, Lukas Wunner wrote:
> On Sat, Mar 26, 2022 at 01:49:28PM +0100, Andrew Lunn wrote:
> > On Sat, Mar 26, 2022 at 01:39:29PM +0100, Lukas Wunner wrote:
> > > On Mon, Mar 21, 2022 at 02:10:27PM +0100, Andrew Lunn wrote:
> > > > There are two patterns in use at the moment:
> > > > 
> > > > 1) The phy is attached in open() and detached in close(). There is no
> > > >    danger of the netdev disappearing at this time.
> > > > 
> > > > 2) The PHY is attached during probe, and detached during release.
> > > > 
> > > > This second case is what is being used here in the USB code. This is
> > > > also a common pattern for complex devices. In probe, you get all the
> > > > components of a complex devices, stitch them together and then
> > > > register the composite device. During release, you unregister the
> > > > composite device, and then release all the components. Since this is a
> > > > natural model, i think it should work.
> > > 
> > > I've gone through all drivers and noticed that some of them use a variation
> > > of pattern 2 which looks fishy:
> > > 
> > > On probe, they first attach the PHY, then register the netdev.
> > > On remove, they detach the PHY, then unregister the netdev.
> > > 
> > > Is it legal to detach the PHY from a registered (potentially running)
> > > netdev? It looks wrong to me.
> > 
> > I think the network stack guarantee that the close() method is called
> > before unregister completes. It is a common pattern to attach the PHY
> > in open() and detach it in close(). The stack itself should not be
> > using the PHY when it is down, the exception being IOCTL handlers
> > which people often get wrong.
> 
> But the PHY is detached from a *running* netdev *before* that netdev
> is unregistered (and closed).  Is that really legal?

IMO, it reflects, more or less, the reality of devices with SFP modules.
The PHY can be physically removed from running netdev. At same time,
netdev should be registered and visible for the user, even if PHY is not
physically attached.

Regards,
Oleksij
-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-27  8:37                         ` Oleksij Rempel
@ 2022-03-31  9:20                           ` Oliver Neukum
  2022-03-31  9:30                             ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Oliver Neukum @ 2022-03-31  9:20 UTC (permalink / raw)
  To: Oleksij Rempel, Lukas Wunner
  Cc: Andrew Lunn, Oliver Neukum, Oleksij Rempel, netdev, Heiner Kallweit



On 27.03.22 10:37, Oleksij Rempel wrote:
> On Sat, Mar 26, 2022 at 02:04:30PM +0100, Lukas Wunner wrote:
>> On Sat, Mar 26, 2022 at 01:49:28PM +0100, Andrew Lunn wrote:
>>> On Sat, Mar 26, 2022 at 01:39:29PM +0100, Lukas Wunner wrote:
>>>
>>>> On probe, they first attach the PHY, then register the netdev.
>>>> On remove, they detach the PHY, then unregister the netdev.
>>>>
>>>> Is it legal to detach the PHY from a registered (potentially running)
>>>> netdev? It looks wrong to me.
>>> I think the network stack guarantee that the close() method is called
>>> before unregister completes. It is a common pattern to attach the PHY
>>> in open() and detach it in close(). The stack itself should not be
>>> using the PHY when it is down, the exception being IOCTL handlers
>>> which people often get wrong.
>> But the PHY is detached from a *running* netdev *before* that netdev
>> is unregistered (and closed).  Is that really legal?
> IMO, it reflects, more or less, the reality of devices with SFP modules.
> The PHY can be physically removed from running netdev. At same time,
> netdev should be registered and visible for the user, even if PHY is not
> physically attached.
>
>
Hi,

this makes sense, but the relevance to the question of how to do an
unplug of the whole device is indirect, isn't it? I am afraid, putting my
maintainer's hat on, I have to point on that we have a stable tree for
which we will need some solution.

Nor can usbnet exclusively cater to device that expose their PHY
over MDIO. (or at all really). Intuitively I must say that exactly reversing
the order of probe() in disconnect() is kind of the default.
If there is a need to deviate from that, of course we will acomodate that,
but making this the exclusive order is another matter.

I really get that you want to discuss this matter exhaustively, but we
need to
come to some kind of conclusion.

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-31  9:20                           ` Oliver Neukum
@ 2022-03-31  9:30                             ` Lukas Wunner
  2022-03-31  9:59                               ` Oliver Neukum
  0 siblings, 1 reply; 26+ messages in thread
From: Lukas Wunner @ 2022-03-31  9:30 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Oleksij Rempel, Andrew Lunn, Oleksij Rempel, netdev, Heiner Kallweit

On Thu, Mar 31, 2022 at 11:20:50AM +0200, Oliver Neukum wrote:
> I am afraid, putting my
> maintainer's hat on, I have to point on that we have a stable tree for
> which we will need some solution.
> 
> Nor can usbnet exclusively cater to device that expose their PHY
> over MDIO. (or at all really). Intuitively I must say that exactly reversing
> the order of probe() in disconnect() is kind of the default.
> If there is a need to deviate from that, of course we will acomodate that,
> but making this the exclusive order is another matter.
> 
> I really get that you want to discuss this matter exhaustively, but we
> need to
> come to some kind of conclusion.

I propose the below patch.  If you could provide more details on the
regressions you've reported (+ ideally bugzilla links), I'll be happy
to include them in the commit message.  Thanks!

-- >8 --
Subject: [PATCH] usbnet: Run unregister_netdev() before unbind() again

Oliver says he got bug reports that commit 2c9d6c2b871d ("usbnet: run
unbind() before unregister_netdev()") is causing regressions.

The commit made binding and unbinding of USB Ethernet asymmetrical:
Before, usbnet_probe() first invoked the ->bind() callback and then
register_netdev().  usbnet_disconnect() mirrored that by first invoking
unregister_netdev() and then ->unbind().

Since the commit, the order in usbnet_disconnect() is reversed and no
longer mirrors usbnet_probe().

One consequence is that a PHY disconnected (and stopped) in ->unbind()
is afterwards stopped once more by unregister_netdev() as it closes the
netdev before unregistering.  That necessitates a contortion in ->stop()
because the PHY may only be stopped if it hasn't already been
disconnected.

Reverting the commit allows making the call to phy_stop() unconditional
in ->stop() and also fixes the issues reported by Oliver.

Fixes: 2c9d6c2b871d ("usbnet: run unbind() before unregister_netdev()")
Link: https://lore.kernel.org/netdev/62b944a1-0df2-6e81-397c-6bf9dea266ef@suse.com/
Reported-by: Oliver Neukum <oneukum@suse.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v5.14+
Cc: Oleksij Rempel <o.rempel@pengutronix.de>
Cc: Martyn Welch <martyn.welch@collabora.com>
Cc: Andrew Lunn <andrew@lunn.ch>
---
 drivers/net/usb/asix_devices.c | 6 +-----
 drivers/net/usb/smsc95xx.c     | 3 +--
 drivers/net/usb/usbnet.c       | 6 +++---
 3 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c
index 4514d35..7d569f0 100644
--- a/drivers/net/usb/asix_devices.c
+++ b/drivers/net/usb/asix_devices.c
@@ -794,11 +794,7 @@ static int ax88772_stop(struct usbnet *dev)
 {
 	struct asix_common_private *priv = dev->driver_priv;
 
-	/* On unplugged USB, we will get MDIO communication errors and the
-	 * PHY will be set in to PHY_HALTED state.
-	 */
-	if (priv->phydev->state != PHY_HALTED)
-		phy_stop(priv->phydev);
+	phy_stop(priv->phydev);
 
 	return 0;
 }
diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index 193635e..3dc6df6 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -1280,8 +1280,7 @@ static int smsc95xx_start_phy(struct usbnet *dev)
 
 static int smsc95xx_stop(struct usbnet *dev)
 {
-	if (dev->net->phydev)
-		phy_stop(dev->net->phydev);
+	phy_stop(dev->net->phydev);
 
 	return 0;
 }
diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 9a6450f..e8b4a93 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -1616,9 +1616,6 @@ void usbnet_disconnect (struct usb_interface *intf)
 		   xdev->bus->bus_name, xdev->devpath,
 		   dev->driver_info->description);
 
-	if (dev->driver_info->unbind)
-		dev->driver_info->unbind(dev, intf);
-
 	net = dev->net;
 	unregister_netdev (net);
 
@@ -1626,6 +1623,9 @@ void usbnet_disconnect (struct usb_interface *intf)
 
 	usb_scuttle_anchored_urbs(&dev->deferred);
 
+	if (dev->driver_info->unbind)
+		dev->driver_info->unbind (dev, intf);
+
 	usb_kill_urb(dev->interrupt);
 	usb_free_urb(dev->interrupt);
 	kfree(dev->padding_pkt);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-21 10:17                 ` Lukas Wunner
  2022-03-21 10:43                   ` Oleksij Rempel
@ 2022-03-31  9:35                   ` Oliver Neukum
  1 sibling, 0 replies; 26+ messages in thread
From: Oliver Neukum @ 2022-03-31  9:35 UTC (permalink / raw)
  To: Lukas Wunner, Oliver Neukum
  Cc: Andrew Lunn, Oleksij Rempel, Oleksij Rempel, netdev, Heiner Kallweit



On 21.03.22 11:17, Lukas Wunner wrote:
> On Thu, Mar 17, 2022 at 04:53:34PM +0100, Oliver Neukum wrote:
>
> By the way: 2c9d6c2b871d caused breakage in smsc95xx.c which was
> subsequently fixed by a049a30fc27c.  That in turn required another
> fix, 0bf3885324a8.  Some of these code changes will have to be
> rolled back or adjusted after reverting 2c9d6c2b871d.  It's a giant mess.
> It's possible that more drivers saw fixes due to 2c9d6c2b871d,
> I haven't checked that yet.
Very well, but in order to do so, I guess we need a fix first.
>
> Oliver, why did you want to revert 2c9d6c2b871d, i.e. in which drivers
> have your users reported breakage?  Do you have bugzilla links?
I made the patch in response to a security report about a DOS attack

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-31  9:30                             ` Lukas Wunner
@ 2022-03-31  9:59                               ` Oliver Neukum
  2022-03-31 11:22                                 ` Lukas Wunner
  0 siblings, 1 reply; 26+ messages in thread
From: Oliver Neukum @ 2022-03-31  9:59 UTC (permalink / raw)
  To: Lukas Wunner, Oliver Neukum
  Cc: Oleksij Rempel, Andrew Lunn, Oleksij Rempel, netdev, Heiner Kallweit



On 31.03.22 11:30, Lukas Wunner wrote:
> I propose the below patch.  If you could provide more details on the
> regressions you've reported (+ ideally bugzilla links), I'll be happy
> to include them in the commit message.  Thanks!
There is no bugzilla, but the report can be found:
https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/
>
> The commit made binding and unbinding of USB Ethernet asymmetrical:
> Before, usbnet_probe() first invoked the ->bind() callback and then
> register_netdev().  usbnet_disconnect() mirrored that by first invoking
> unregister_netdev() and then ->unbind().
>
> Since the commit, the order in usbnet_disconnect() is reversed and no
> longer mirrors usbnet_probe().
>
> One consequence is that a PHY disconnected (and stopped) in ->unbind()
> is afterwards stopped once more by unregister_netdev() as it closes the
> netdev before unregistering.  That necessitates a contortion in ->stop()
> because the PHY may only be stopped if it hasn't already been
> disconnected.
>
> Reverting the commit allows making the call to phy_stop() unconditional
> in ->stop() and also fixes the issues reported by Oliver.
Very well, but what prevents reintroducing the isssue this revert's
target was to fix?

    Regards
        Oliver


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: ordering of call to unbind() in usbnet_disconnect
  2022-03-31  9:59                               ` Oliver Neukum
@ 2022-03-31 11:22                                 ` Lukas Wunner
  0 siblings, 0 replies; 26+ messages in thread
From: Lukas Wunner @ 2022-03-31 11:22 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Oleksij Rempel, Andrew Lunn, Oleksij Rempel, netdev, Heiner Kallweit

On Thu, Mar 31, 2022 at 11:59:44AM +0200, Oliver Neukum wrote:
> On 31.03.22 11:30, Lukas Wunner wrote:
> > I propose the below patch.  If you could provide more details on the
> > regressions you've reported (+ ideally bugzilla links), I'll be happy
> > to include them in the commit message.  Thanks!
> 
> There is no bugzilla, but the report can be found:
> https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/

Excellent, thanks.

That's a different driver (ax88179_178a.c) than the one Oleksij saw the
linkwatch splat on (ax88172a.c).  I need to analyze Jann's findings
before I can say more about this.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-03-31 11:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-10 11:25 ordering of call to unbind() in usbnet_disconnect Oliver Neukum
2022-03-10 11:38 ` Oleksij Rempel
2022-03-14 18:42   ` Lukas Wunner
2022-03-14 19:14     ` Andrew Lunn
2022-03-15  5:44       ` Oleksij Rempel
2022-03-15  8:32         ` Lukas Wunner
2022-03-15 11:38           ` Oleksij Rempel
2022-03-15 13:28             ` Andrew Lunn
2022-03-17 15:53               ` Oliver Neukum
2022-03-17 21:03                 ` Lukas Wunner
2022-03-21 10:17                 ` Lukas Wunner
2022-03-21 10:43                   ` Oleksij Rempel
2022-03-31  9:35                   ` Oliver Neukum
2022-03-21 10:02               ` Lukas Wunner
2022-03-21 13:10                 ` Andrew Lunn
2022-03-26 12:39                   ` Lukas Wunner
2022-03-26 12:49                     ` Andrew Lunn
2022-03-26 13:04                       ` Lukas Wunner
2022-03-27  8:37                         ` Oleksij Rempel
2022-03-31  9:20                           ` Oliver Neukum
2022-03-31  9:30                             ` Lukas Wunner
2022-03-31  9:59                               ` Oliver Neukum
2022-03-31 11:22                                 ` Lukas Wunner
2022-03-26 12:25             ` Lukas Wunner
2022-03-26 12:44               ` Andrew Lunn
2022-03-26 13:01                 ` Lukas Wunner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.