All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Andrew Lunn <andrew@lunn.ch>
Cc: Oleksij Rempel <o.rempel@pengutronix.de>,
	Oliver Neukum <oneukum@suse.com>,
	Oleksij Rempel <linux@rempel-privat.de>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Heiner Kallweit <hkallweit1@gmail.com>
Subject: Re: ordering of call to unbind() in usbnet_disconnect
Date: Mon, 21 Mar 2022 11:02:26 +0100	[thread overview]
Message-ID: <20220321100226.GA19177@wunner.de> (raw)
In-Reply-To: <YjCUgCNHw6BUqJxr@lunn.ch>

On Tue, Mar 15, 2022 at 02:28:32PM +0100, Andrew Lunn wrote:
> > > > It was linked to unregistered/freed netdev. This is why my patch
> > > > changing the order to call phy_disconnect() first and then
> > > > unregister_netdev().
> > > 
> > > Unregistered yes, but freed no.  Here's the order before 2c9d6c2b871d:
> > > 
> > >   usbnet_disconnect()
> > >     unregister_netdev()
> > >     ax88772_unbind()
> > >       phy_disconnect()
> > >     free_netdev()
> > > 
> > > Is it illegal to disconnect a PHY from an unregistered, but not yet freed
> > > net_device?
> 
> There are drivers which unregistering and then calling
> phy_disconnect. In general that should be a valid pattern. But more
> MAC drivers actually connect the PHY on open and disconnect it on
> close. So it is less well used.

It turns out that unregistering a net_device and then calling
phy_disconnect() may indeed crash and is thus not permitted
right now:

Oleksij added the following code comment with commit e532a096be0e
("net: usb: asix: ax88772: add phylib support"):

  /* On unplugged USB, we will get MDIO communication errors and the
   * PHY will be set in to PHY_HALTED state.

So the USB adapter gets unplugged, access to MII registers fails with
-ENODEV, phy_error() is called, phy_state_machine() transitions to
PHY_HALTED and performs the following call:

  phy_state_machine()
    phy_link_down()
      phy_link_change()
        netif_carrier_off()
          linkwatch_fire_event()

Asynchronously, usbnet_disconnect() calls phy_detach() and then
free_netdev().

A bit later, linkwatch_event() runs and tries to access the freed
net_device, leading to the crash that Oleksij posted upthread.

The fact that linkwatch_fire_event() increments the refcount doesn't
help because unregister_netdevice() has already run (it waits for
the refcount to drop to 1).

My suggestion would be to amend unregister_netdevice() to set
dev->phydev->attached_dev = NULL.  It may also be a good idea
to WARN_ON() in free_netdev() if the refcount is not 1.

Andrew, please clarify whether you really think that the
"unregister netdev, then detach phy" order should be supported.
If you do think that it should be supported, we'll have to litter
phylib with NULL pointer checks for attached_dev.  If you don't
want that, we should at least document that it's an illegal pattern.

Even if you decide that we should rather declare this pattern
illegal rather than littering phylib with NULL pointer checks,
I strongly recommend that at least unregister_netdevice() sets
dev->phydev->attached_dev = NULL.  That will cause oopses which
are easier to debug than complex races like the one witnessed
by Oleksij.

Side note:  Since e532a096be0e, ax88772_stop() directly accesses
phydev->state.  I wonder whether that's legal.  I'm under the
impression that the state is internal to phylib.

Another side note:  The commit message of 2c9d6c2b871d is poor.
It should have contained a stack trace and a clear explanation.
It took me days of staring at code to reverse-engineer what
went wrong here.

Thanks,

Lukas

  parent reply	other threads:[~2022-03-21 10:09 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-10 11:25 ordering of call to unbind() in usbnet_disconnect Oliver Neukum
2022-03-10 11:38 ` Oleksij Rempel
2022-03-14 18:42   ` Lukas Wunner
2022-03-14 19:14     ` Andrew Lunn
2022-03-15  5:44       ` Oleksij Rempel
2022-03-15  8:32         ` Lukas Wunner
2022-03-15 11:38           ` Oleksij Rempel
2022-03-15 13:28             ` Andrew Lunn
2022-03-17 15:53               ` Oliver Neukum
2022-03-17 21:03                 ` Lukas Wunner
2022-03-21 10:17                 ` Lukas Wunner
2022-03-21 10:43                   ` Oleksij Rempel
2022-03-31  9:35                   ` Oliver Neukum
2022-03-21 10:02               ` Lukas Wunner [this message]
2022-03-21 13:10                 ` Andrew Lunn
2022-03-26 12:39                   ` Lukas Wunner
2022-03-26 12:49                     ` Andrew Lunn
2022-03-26 13:04                       ` Lukas Wunner
2022-03-27  8:37                         ` Oleksij Rempel
2022-03-31  9:20                           ` Oliver Neukum
2022-03-31  9:30                             ` Lukas Wunner
2022-03-31  9:59                               ` Oliver Neukum
2022-03-31 11:22                                 ` Lukas Wunner
2022-03-26 12:25             ` Lukas Wunner
2022-03-26 12:44               ` Andrew Lunn
2022-03-26 13:01                 ` Lukas Wunner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220321100226.GA19177@wunner.de \
    --to=lukas@wunner.de \
    --cc=andrew@lunn.ch \
    --cc=hkallweit1@gmail.com \
    --cc=linux@rempel-privat.de \
    --cc=netdev@vger.kernel.org \
    --cc=o.rempel@pengutronix.de \
    --cc=oneukum@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.