linux-i2c.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net 0/2] Fix mdiobus users with devres
@ 2021-09-20 21:42 Vladimir Oltean
  2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Vladimir Oltean @ 2021-09-20 21:42 UTC (permalink / raw)
  To: netdev
  Cc: Linus Walleij, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga, Lino Sanfilippo

Commit ac3a68d56651 ("net: phy: don't abuse devres in
devm_mdiobus_register()") by Bartosz Golaszewski has introduced two
classes of potential bugs by making the devres callback of
devm_mdiobus_alloc stop calling mdiobus_unregister.

The exact buggy circumstances are presented in the individual commit
messages. I have searched the tree for other occurrences, but at the
moment:

- for issue (a) I have no concrete proof that other buses except SPI and
  I2C suffer from it, and the only SPI or I2C device drivers that call
  of_mdiobus_alloc are the DSA drivers that leave a NULL
  ds->slave_mii_bus and a non-NULL ds->ops->phy_read, aka ksz9477,
  ksz8795, lan9303_i2c, vsc73xx-spi.

- for issue (b), all drivers which call of_mdiobus_alloc either use
  of_mdiobus_register too, or call mdiobus_unregister sometime within
  the ->remove path.

Although at this point I've seen enough strangeness caused by this
"device_del during ->shutdown" that I'm just going to copy the SPI and
I2C subsystem maintainers to this patch series, to get their feedback
whether they've had reports about things like this before. I don't think
other buses behave in this way, it forces SPI and I2C devices to have to
protect themselves from a really strange set of issues.

Vladimir Oltean (2):
  net: dsa: don't allocate the slave_mii_bus using devres
  net: dsa: realtek: register the MDIO bus under devres

 drivers/net/dsa/realtek-smi-core.c |  2 +-
 net/dsa/dsa2.c                     | 12 +++++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres
  2021-09-20 21:42 [PATCH net 0/2] Fix mdiobus users with devres Vladimir Oltean
@ 2021-09-20 21:42 ` Vladimir Oltean
  2021-09-21  2:16   ` Florian Fainelli
                     ` (2 more replies)
  2021-09-20 21:42 ` [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres Vladimir Oltean
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 10+ messages in thread
From: Vladimir Oltean @ 2021-09-20 21:42 UTC (permalink / raw)
  To: netdev
  Cc: Linus Walleij, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga, Lino Sanfilippo

The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:

spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
   -> spi_unregister_device(to_spi_device(dev));
      -> device_del(&spi->dev);

So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:

i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
   -> i2c_unregister_device(client);
      -> device_unregister(&client->dev);

The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).

So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.

This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:

void mdiobus_free(struct mii_bus *bus)
{
	/* For compatibility with error handling in drivers. */
	if (bus->state == MDIOBUS_ALLOCATED) {
		kfree(bus);
		return;
	}

	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
	bus->state = MDIOBUS_RELEASED;

	put_device(&bus->dev);
}

In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.

I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:

(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
    located on a bus that unregisters its children on shutdown. After
    the blamed patch, either both the alloc and the register should use
    devres, or none should.

(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
    mdiobus_unregister at all in the remove path. After the blamed
    patch, nobody unregisters the MDIO bus anymore, so this is even more
    buggy than the previous case which needs a specific bus
    configuration to be seen, this one is an unconditional bug.

In this case, DSA falls into category (a), it tries to be helpful and
registers an MDIO bus on behalf of the switch, which might be on such a
bus. I've no idea why it does it under devres.

It does this on probe:

	if (!ds->slave_mii_bus && ds->ops->phy_read)
		alloc and register mdio bus

and this on remove:

	if (ds->slave_mii_bus && ds->ops->phy_read)
		unregister mdio bus

I _could_ imagine using devres because the condition used on remove is
different than the condition used on probe. So strictly speaking, DSA
cannot determine whether the ds->slave_mii_bus it sees on remove is the
ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
have solved that problem. But nonetheless, the existing code already
proceeds to unregister the MDIO bus, even though it might be
unregistering an MDIO bus it has never registered. So I can only guess
that no driver that implements ds->ops->phy_read also allocates and
registers ds->slave_mii_bus itself.

So in that case, if unregistering is fine, freeing must be fine too.

Stop using devres and free the MDIO bus manually. This will make devres
stop attempting to free a still registered MDIO bus on ->shutdown.

Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa2.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index f14897d9b31d..274018e9171c 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -880,7 +880,7 @@ static int dsa_switch_setup(struct dsa_switch *ds)
 	devlink_params_publish(ds->devlink);
 
 	if (!ds->slave_mii_bus && ds->ops->phy_read) {
-		ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
+		ds->slave_mii_bus = mdiobus_alloc();
 		if (!ds->slave_mii_bus) {
 			err = -ENOMEM;
 			goto teardown;
@@ -890,13 +890,16 @@ static int dsa_switch_setup(struct dsa_switch *ds)
 
 		err = mdiobus_register(ds->slave_mii_bus);
 		if (err < 0)
-			goto teardown;
+			goto free_slave_mii_bus;
 	}
 
 	ds->setup = true;
 
 	return 0;
 
+free_slave_mii_bus:
+	if (ds->slave_mii_bus && ds->ops->phy_read)
+		mdiobus_free(ds->slave_mii_bus);
 teardown:
 	if (ds->ops->teardown)
 		ds->ops->teardown(ds);
@@ -921,8 +924,11 @@ static void dsa_switch_teardown(struct dsa_switch *ds)
 	if (!ds->setup)
 		return;
 
-	if (ds->slave_mii_bus && ds->ops->phy_read)
+	if (ds->slave_mii_bus && ds->ops->phy_read) {
 		mdiobus_unregister(ds->slave_mii_bus);
+		mdiobus_free(ds->slave_mii_bus);
+		ds->slave_mii_bus = NULL;
+	}
 
 	dsa_switch_unregister_notifier(ds);
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres
  2021-09-20 21:42 [PATCH net 0/2] Fix mdiobus users with devres Vladimir Oltean
  2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
@ 2021-09-20 21:42 ` Vladimir Oltean
  2021-09-21 12:00   ` Andrew Lunn
  2021-09-21 16:22   ` Linus Walleij
  2021-09-21  7:32 ` [PATCH net 0/2] Fix mdiobus users with devres Bartosz Golaszewski
  2021-09-21 13:00 ` patchwork-bot+netdevbpf
  3 siblings, 2 replies; 10+ messages in thread
From: Vladimir Oltean @ 2021-09-20 21:42 UTC (permalink / raw)
  To: netdev
  Cc: Linus Walleij, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga, Lino Sanfilippo

The Linux device model permits both the ->shutdown and ->remove driver
methods to get called during a shutdown procedure. Example: a DSA switch
which sits on an SPI bus, and the SPI bus driver calls this on its
->shutdown method:

spi_unregister_controller
-> device_for_each_child(&ctlr->dev, NULL, __unregister);
   -> spi_unregister_device(to_spi_device(dev));
      -> device_del(&spi->dev);

So this is a simple pattern which can theoretically appear on any bus,
although the only other buses on which I've been able to find it are
I2C:

i2c_del_adapter
-> device_for_each_child(&adap->dev, NULL, __unregister_client);
   -> i2c_unregister_device(client);
      -> device_unregister(&client->dev);

The implication of this pattern is that devices on these buses can be
unregistered after having been shut down. The drivers for these devices
might choose to return early either from ->remove or ->shutdown if the
other callback has already run once, and they might choose that the
->shutdown method should only perform a subset of the teardown done by
->remove (to avoid unnecessary delays when rebooting).

So in other words, the device driver may choose on ->remove to not
do anything (therefore to not unregister an MDIO bus it has registered
on ->probe), because this ->remove is actually triggered by the
device_shutdown path, and its ->shutdown method has already run and done
the minimally required cleanup.

This used to be fine until the blamed commit, but now, the following
BUG_ON triggers:

void mdiobus_free(struct mii_bus *bus)
{
	/* For compatibility with error handling in drivers. */
	if (bus->state == MDIOBUS_ALLOCATED) {
		kfree(bus);
		return;
	}

	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
	bus->state = MDIOBUS_RELEASED;

	put_device(&bus->dev);
}

In other words, there is an attempt to free an MDIO bus which was not
unregistered. The attempt to free it comes from the devres release
callbacks of the SPI device, which are executed after the device is
unregistered.

I'm not saying that the fact that MDIO buses allocated using devres
would automatically get unregistered wasn't strange. I'm just saying
that the commit didn't care about auditing existing call paths in the
kernel, and now, the following code sequences are potentially buggy:

(a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
    located on a bus that unregisters its children on shutdown. After
    the blamed patch, either both the alloc and the register should use
    devres, or none should.

(b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
    mdiobus_unregister at all in the remove path. After the blamed
    patch, nobody unregisters the MDIO bus anymore, so this is even more
    buggy than the previous case which needs a specific bus
    configuration to be seen, this one is an unconditional bug.

In this case, the Realtek drivers fall under category (b). To solve it,
we can register the MDIO bus under devres too, which restores the
previous behavior.

Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/net/dsa/realtek-smi-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/realtek-smi-core.c b/drivers/net/dsa/realtek-smi-core.c
index dd2f0d6208b3..2fcfd917b876 100644
--- a/drivers/net/dsa/realtek-smi-core.c
+++ b/drivers/net/dsa/realtek-smi-core.c
@@ -368,7 +368,7 @@ int realtek_smi_setup_mdio(struct realtek_smi *smi)
 	smi->slave_mii_bus->parent = smi->dev;
 	smi->ds->slave_mii_bus = smi->slave_mii_bus;
 
-	ret = of_mdiobus_register(smi->slave_mii_bus, mdio_np);
+	ret = devm_of_mdiobus_register(smi->dev, smi->slave_mii_bus, mdio_np);
 	if (ret) {
 		dev_err(smi->dev, "unable to register MDIO bus %s\n",
 			smi->slave_mii_bus->id);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres
  2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
@ 2021-09-21  2:16   ` Florian Fainelli
  2021-09-21 10:07   ` Lino Sanfilippo
  2021-09-21 11:58   ` Andrew Lunn
  2 siblings, 0 replies; 10+ messages in thread
From: Florian Fainelli @ 2021-09-21  2:16 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Linus Walleij, Andrew Lunn, Vivien Didelot, Vladimir Oltean,
	David S. Miller, Jakub Kicinski, Bartosz Golaszewski,
	Wolfram Sang, linux-i2c, Mark Brown, linux-spi,
	Alvin Šipraga, Lino Sanfilippo



On 9/20/2021 2:42 PM, Vladimir Oltean wrote:
> The Linux device model permits both the ->shutdown and ->remove driver
> methods to get called during a shutdown procedure. Example: a DSA switch
> which sits on an SPI bus, and the SPI bus driver calls this on its
> ->shutdown method:
> 
> spi_unregister_controller
> -> device_for_each_child(&ctlr->dev, NULL, __unregister);
>     -> spi_unregister_device(to_spi_device(dev));
>        -> device_del(&spi->dev);
> 
> So this is a simple pattern which can theoretically appear on any bus,
> although the only other buses on which I've been able to find it are
> I2C:
> 
> i2c_del_adapter
> -> device_for_each_child(&adap->dev, NULL, __unregister_client);
>     -> i2c_unregister_device(client);
>        -> device_unregister(&client->dev);
> 
> The implication of this pattern is that devices on these buses can be
> unregistered after having been shut down. The drivers for these devices
> might choose to return early either from ->remove or ->shutdown if the
> other callback has already run once, and they might choose that the
> ->shutdown method should only perform a subset of the teardown done by
> ->remove (to avoid unnecessary delays when rebooting).
> 
> So in other words, the device driver may choose on ->remove to not
> do anything (therefore to not unregister an MDIO bus it has registered
> on ->probe), because this ->remove is actually triggered by the
> device_shutdown path, and its ->shutdown method has already run and done
> the minimally required cleanup.
> 
> This used to be fine until the blamed commit, but now, the following
> BUG_ON triggers:
> 
> void mdiobus_free(struct mii_bus *bus)
> {
> 	/* For compatibility with error handling in drivers. */
> 	if (bus->state == MDIOBUS_ALLOCATED) {
> 		kfree(bus);
> 		return;
> 	}
> 
> 	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
> 	bus->state = MDIOBUS_RELEASED;
> 
> 	put_device(&bus->dev);
> }
> 
> In other words, there is an attempt to free an MDIO bus which was not
> unregistered. The attempt to free it comes from the devres release
> callbacks of the SPI device, which are executed after the device is
> unregistered.
> 
> I'm not saying that the fact that MDIO buses allocated using devres
> would automatically get unregistered wasn't strange. I'm just saying
> that the commit didn't care about auditing existing call paths in the
> kernel, and now, the following code sequences are potentially buggy:
> 
> (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
>      located on a bus that unregisters its children on shutdown. After
>      the blamed patch, either both the alloc and the register should use
>      devres, or none should.
> 
> (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
>      mdiobus_unregister at all in the remove path. After the blamed
>      patch, nobody unregisters the MDIO bus anymore, so this is even more
>      buggy than the previous case which needs a specific bus
>      configuration to be seen, this one is an unconditional bug.
> 
> In this case, DSA falls into category (a), it tries to be helpful and
> registers an MDIO bus on behalf of the switch, which might be on such a
> bus. I've no idea why it does it under devres.
> 
> It does this on probe:
> 
> 	if (!ds->slave_mii_bus && ds->ops->phy_read)
> 		alloc and register mdio bus
> 
> and this on remove:
> 
> 	if (ds->slave_mii_bus && ds->ops->phy_read)
> 		unregister mdio bus
> 
> I _could_ imagine using devres because the condition used on remove is
> different than the condition used on probe. So strictly speaking, DSA
> cannot determine whether the ds->slave_mii_bus it sees on remove is the
> ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
> have solved that problem. But nonetheless, the existing code already
> proceeds to unregister the MDIO bus, even though it might be
> unregistering an MDIO bus it has never registered. So I can only guess
> that no driver that implements ds->ops->phy_read also allocates and
> registers ds->slave_mii_bus itself.
> 
> So in that case, if unregistering is fine, freeing must be fine too.
> 
> Stop using devres and free the MDIO bus manually. This will make devres
> stop attempting to free a still registered MDIO bus on ->shutdown.
> 
> Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
> Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 0/2] Fix mdiobus users with devres
  2021-09-20 21:42 [PATCH net 0/2] Fix mdiobus users with devres Vladimir Oltean
  2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
  2021-09-20 21:42 ` [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres Vladimir Oltean
@ 2021-09-21  7:32 ` Bartosz Golaszewski
  2021-09-21 13:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 10+ messages in thread
From: Bartosz Golaszewski @ 2021-09-21  7:32 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Linus Walleij, Andrew Lunn, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, David S. Miller,
	Jakub Kicinski, Wolfram Sang, linux-i2c, Mark Brown, linux-spi,
	Alvin Šipraga, Lino Sanfilippo

On Mon, Sep 20, 2021 at 11:42 PM Vladimir Oltean
<vladimir.oltean@nxp.com> wrote:
>
> Commit ac3a68d56651 ("net: phy: don't abuse devres in
> devm_mdiobus_register()") by Bartosz Golaszewski has introduced two
> classes of potential bugs by making the devres callback of
> devm_mdiobus_alloc stop calling mdiobus_unregister.
>
> The exact buggy circumstances are presented in the individual commit
> messages. I have searched the tree for other occurrences, but at the
> moment:
>
> - for issue (a) I have no concrete proof that other buses except SPI and
>   I2C suffer from it, and the only SPI or I2C device drivers that call
>   of_mdiobus_alloc are the DSA drivers that leave a NULL
>   ds->slave_mii_bus and a non-NULL ds->ops->phy_read, aka ksz9477,
>   ksz8795, lan9303_i2c, vsc73xx-spi.
>
> - for issue (b), all drivers which call of_mdiobus_alloc either use
>   of_mdiobus_register too, or call mdiobus_unregister sometime within
>   the ->remove path.
>
> Although at this point I've seen enough strangeness caused by this
> "device_del during ->shutdown" that I'm just going to copy the SPI and
> I2C subsystem maintainers to this patch series, to get their feedback
> whether they've had reports about things like this before. I don't think
> other buses behave in this way, it forces SPI and I2C devices to have to
> protect themselves from a really strange set of issues.
>
> Vladimir Oltean (2):
>   net: dsa: don't allocate the slave_mii_bus using devres
>   net: dsa: realtek: register the MDIO bus under devres
>
>  drivers/net/dsa/realtek-smi-core.c |  2 +-
>  net/dsa/dsa2.c                     | 12 +++++++++---
>  2 files changed, 10 insertions(+), 4 deletions(-)
>
> --
> 2.25.1
>

Hi Vladimir,

Thanks for the detailed description and sorry for the trouble this
caused. I will revisit this and go through the drivers using those
functions again and possibly come up with some improvement.

Acked-by: Bartosz Golaszewski <brgl@bgdev.pl>

Bartosz

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres
  2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
  2021-09-21  2:16   ` Florian Fainelli
@ 2021-09-21 10:07   ` Lino Sanfilippo
  2021-09-21 11:58   ` Andrew Lunn
  2 siblings, 0 replies; 10+ messages in thread
From: Lino Sanfilippo @ 2021-09-21 10:07 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Linus Walleij, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga


Hi Vladimir,

On 20.09.21 at 23:42, Vladimir Oltean wrote:
> The Linux device model permits both the ->shutdown and ->remove driver
> methods to get called during a shutdown procedure. Example: a DSA switch
> which sits on an SPI bus, and the SPI bus driver calls this on its
> ->shutdown method:
>
> spi_unregister_controller
> -> device_for_each_child(&ctlr->dev, NULL, __unregister);
>    -> spi_unregister_device(to_spi_device(dev));
>       -> device_del(&spi->dev);
>
> So this is a simple pattern which can theoretically appear on any bus,
> although the only other buses on which I've been able to find it are
> I2C:
>
> i2c_del_adapter
> -> device_for_each_child(&adap->dev, NULL, __unregister_client);
>    -> i2c_unregister_device(client);
>       -> device_unregister(&client->dev);
>
> The implication of this pattern is that devices on these buses can be
> unregistered after having been shut down. The drivers for these devices
> might choose to return early either from ->remove or ->shutdown if the
> other callback has already run once, and they might choose that the
> ->shutdown method should only perform a subset of the teardown done by
> ->remove (to avoid unnecessary delays when rebooting).
>
> So in other words, the device driver may choose on ->remove to not
> do anything (therefore to not unregister an MDIO bus it has registered
> on ->probe), because this ->remove is actually triggered by the
> device_shutdown path, and its ->shutdown method has already run and done
> the minimally required cleanup.
>
> This used to be fine until the blamed commit, but now, the following
> BUG_ON triggers:
>
> void mdiobus_free(struct mii_bus *bus)
> {
> 	/* For compatibility with error handling in drivers. */
> 	if (bus->state == MDIOBUS_ALLOCATED) {
> 		kfree(bus);
> 		return;
> 	}
>
> 	BUG_ON(bus->state != MDIOBUS_UNREGISTERED);
> 	bus->state = MDIOBUS_RELEASED;
>
> 	put_device(&bus->dev);
> }
>
> In other words, there is an attempt to free an MDIO bus which was not
> unregistered. The attempt to free it comes from the devres release
> callbacks of the SPI device, which are executed after the device is
> unregistered.
>
> I'm not saying that the fact that MDIO buses allocated using devres
> would automatically get unregistered wasn't strange. I'm just saying
> that the commit didn't care about auditing existing call paths in the
> kernel, and now, the following code sequences are potentially buggy:
>
> (a) devm_mdiobus_alloc followed by plain mdiobus_register, for a device
>     located on a bus that unregisters its children on shutdown. After
>     the blamed patch, either both the alloc and the register should use
>     devres, or none should.
>
> (b) devm_mdiobus_alloc followed by plain mdiobus_register, and then no
>     mdiobus_unregister at all in the remove path. After the blamed
>     patch, nobody unregisters the MDIO bus anymore, so this is even more
>     buggy than the previous case which needs a specific bus
>     configuration to be seen, this one is an unconditional bug.
>
> In this case, DSA falls into category (a), it tries to be helpful and
> registers an MDIO bus on behalf of the switch, which might be on such a
> bus. I've no idea why it does it under devres.
>
> It does this on probe:
>
> 	if (!ds->slave_mii_bus && ds->ops->phy_read)
> 		alloc and register mdio bus
>
> and this on remove:
>
> 	if (ds->slave_mii_bus && ds->ops->phy_read)
> 		unregister mdio bus
>
> I _could_ imagine using devres because the condition used on remove is
> different than the condition used on probe. So strictly speaking, DSA
> cannot determine whether the ds->slave_mii_bus it sees on remove is the
> ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
> have solved that problem. But nonetheless, the existing code already
> proceeds to unregister the MDIO bus, even though it might be
> unregistering an MDIO bus it has never registered. So I can only guess
> that no driver that implements ds->ops->phy_read also allocates and
> registers ds->slave_mii_bus itself.
>
> So in that case, if unregistering is fine, freeing must be fine too.
>
> Stop using devres and free the MDIO bus manually. This will make devres
> stop attempting to free a still registered MDIO bus on ->shutdown.
>
> Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
> Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  net/dsa/dsa2.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
> index f14897d9b31d..274018e9171c 100644
> --- a/net/dsa/dsa2.c
> +++ b/net/dsa/dsa2.c
> @@ -880,7 +880,7 @@ static int dsa_switch_setup(struct dsa_switch *ds)
>  	devlink_params_publish(ds->devlink);
>
>  	if (!ds->slave_mii_bus && ds->ops->phy_read) {
> -		ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
> +		ds->slave_mii_bus = mdiobus_alloc();
>  		if (!ds->slave_mii_bus) {
>  			err = -ENOMEM;
>  			goto teardown;
> @@ -890,13 +890,16 @@ static int dsa_switch_setup(struct dsa_switch *ds)
>
>  		err = mdiobus_register(ds->slave_mii_bus);
>  		if (err < 0)
> -			goto teardown;
> +			goto free_slave_mii_bus;
>  	}
>
>  	ds->setup = true;
>
>  	return 0;
>
> +free_slave_mii_bus:
> +	if (ds->slave_mii_bus && ds->ops->phy_read)
> +		mdiobus_free(ds->slave_mii_bus);
>  teardown:
>  	if (ds->ops->teardown)
>  		ds->ops->teardown(ds);
> @@ -921,8 +924,11 @@ static void dsa_switch_teardown(struct dsa_switch *ds)
>  	if (!ds->setup)
>  		return;
>
> -	if (ds->slave_mii_bus && ds->ops->phy_read)
> +	if (ds->slave_mii_bus && ds->ops->phy_read) {
>  		mdiobus_unregister(ds->slave_mii_bus);
> +		mdiobus_free(ds->slave_mii_bus);
> +		ds->slave_mii_bus = NULL;
> +	}
>
>  	dsa_switch_unregister_notifier(ds);
>
>

I applied this patch on top of your series "Make DSA switch drivers compatible with
masters which unregister on shutdown" and now the shutdown works as expected (i.e.
no hang as without your patches and no kernel BUG as with only the above mentioned
series applied).

Great job, thanks a lot!

FWIW:
Tested-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>

Best Regards,
Lino



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres
  2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
  2021-09-21  2:16   ` Florian Fainelli
  2021-09-21 10:07   ` Lino Sanfilippo
@ 2021-09-21 11:58   ` Andrew Lunn
  2 siblings, 0 replies; 10+ messages in thread
From: Andrew Lunn @ 2021-09-21 11:58 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Linus Walleij, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga, Lino Sanfilippo

> I _could_ imagine using devres because the condition used on remove is
> different than the condition used on probe. So strictly speaking, DSA
> cannot determine whether the ds->slave_mii_bus it sees on remove is the
> ds->slave_mii_bus that _it_ has allocated on probe. Using devres would
> have solved that problem. But nonetheless, the existing code already
> proceeds to unregister the MDIO bus, even though it might be
> unregistering an MDIO bus it has never registered. So I can only guess
> that no driver that implements ds->ops->phy_read also allocates and
> registers ds->slave_mii_bus itself.

That should not happen. It should be either/or.

But there is no enforcement of that.

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres
  2021-09-20 21:42 ` [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres Vladimir Oltean
@ 2021-09-21 12:00   ` Andrew Lunn
  2021-09-21 16:22   ` Linus Walleij
  1 sibling, 0 replies; 10+ messages in thread
From: Andrew Lunn @ 2021-09-21 12:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Linus Walleij, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga, Lino Sanfilippo

> Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
> Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 0/2] Fix mdiobus users with devres
  2021-09-20 21:42 [PATCH net 0/2] Fix mdiobus users with devres Vladimir Oltean
                   ` (2 preceding siblings ...)
  2021-09-21  7:32 ` [PATCH net 0/2] Fix mdiobus users with devres Bartosz Golaszewski
@ 2021-09-21 13:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 10+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-09-21 13:00 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, linus.walleij, andrew, vivien.didelot, f.fainelli,
	olteanv, davem, kuba, bgolaszewski, wsa, linux-i2c, broonie,
	linux-spi, alsi, LinoSanfilippo

Hello:

This series was applied to netdev/net.git (refs/heads/master):

On Tue, 21 Sep 2021 00:42:07 +0300 you wrote:
> Commit ac3a68d56651 ("net: phy: don't abuse devres in
> devm_mdiobus_register()") by Bartosz Golaszewski has introduced two
> classes of potential bugs by making the devres callback of
> devm_mdiobus_alloc stop calling mdiobus_unregister.
> 
> The exact buggy circumstances are presented in the individual commit
> messages. I have searched the tree for other occurrences, but at the
> moment:
> 
> [...]

Here is the summary with links:
  - [net,1/2] net: dsa: don't allocate the slave_mii_bus using devres
    https://git.kernel.org/netdev/net/c/5135e96a3dd2
  - [net,2/2] net: dsa: realtek: register the MDIO bus under devres
    https://git.kernel.org/netdev/net/c/74b6d7d13307

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres
  2021-09-20 21:42 ` [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres Vladimir Oltean
  2021-09-21 12:00   ` Andrew Lunn
@ 2021-09-21 16:22   ` Linus Walleij
  1 sibling, 0 replies; 10+ messages in thread
From: Linus Walleij @ 2021-09-21 16:22 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Andrew Lunn, Vivien Didelot, Florian Fainelli,
	Vladimir Oltean, David S. Miller, Jakub Kicinski,
	Bartosz Golaszewski, Wolfram Sang, linux-i2c, Mark Brown,
	linux-spi, Alvin Šipraga, Lino Sanfilippo

On Mon, Sep 20, 2021 at 11:42 PM Vladimir Oltean
<vladimir.oltean@nxp.com> wrote:

> Fixes: ac3a68d56651 ("net: phy: don't abuse devres in devm_mdiobus_register()")
> Reported-by: Lino Sanfilippo <LinoSanfilippo@gmx.de>
> Reported-by: Alvin Šipraga <alsi@bang-olufsen.dk>
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-21 16:23 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-20 21:42 [PATCH net 0/2] Fix mdiobus users with devres Vladimir Oltean
2021-09-20 21:42 ` [PATCH net 1/2] net: dsa: don't allocate the slave_mii_bus using devres Vladimir Oltean
2021-09-21  2:16   ` Florian Fainelli
2021-09-21 10:07   ` Lino Sanfilippo
2021-09-21 11:58   ` Andrew Lunn
2021-09-20 21:42 ` [PATCH net 2/2] net: dsa: realtek: register the MDIO bus under devres Vladimir Oltean
2021-09-21 12:00   ` Andrew Lunn
2021-09-21 16:22   ` Linus Walleij
2021-09-21  7:32 ` [PATCH net 0/2] Fix mdiobus users with devres Bartosz Golaszewski
2021-09-21 13:00 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).