linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
@ 2021-09-01 22:50 Vladimir Oltean
  2021-09-01 22:50 ` [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe Vladimir Oltean
                   ` (4 more replies)
  0 siblings, 5 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-01 22:50 UTC (permalink / raw)
  To: netdev
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

This is a continuation of the discussion on patch "[v1,1/2] driver core:
fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT" from here:
https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/

Summary: in a complex combination of device dependencies which is not
really relevant to what is being proposed here, DSA ends up calling
phylink_of_phy_connect during a period in which the PHY driver goes
through a series of probe deferral events.

The central point of that discussion is that DSA seems "broken" for
expecting the PHY driver to probe immediately on PHYs belonging to the
internal MDIO buses of switches. A few suggestions were made about what
to do, but some were not satisfactory and some did not solve the problem.

In fact, fw_devlink, the mechanism that causes the PHY driver to defer
probing in this particular case, has some significant "issues" too, but
its "issues" are only in quotes "because at worst it'd allow a few
unnecessary deferred probes":
https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/#24418895

So if that's the criterion by which an issue is an issue, maybe we
should take a step back and look at the bigger picture.

There is nothing about the idea that a PHY might defer probing, or about
the changes proposed here, that has anything with DSA. Furthermore, the
changes done by this series solve the problem in the same way: "they
allow a few unnecessary deferred probes" <- in this case they provoke
this to the caller of phy_attach_direct.

If we look at commit 16983507742c ("net: phy: probe PHY drivers
synchronously"), we see that the PHY library expectation is for the PHY
device to have a PHY driver bound to it as soon as device_add() finishes.

Well, as it turns out, in case the PHY device has any supplier which is
not ready, this is not possible, but that patch still seems to ensure
that the process of binding a driver to the device has at least started.
That process will continue for a while, and will race with
phy_attach_direct calls, so we need to make the latter observe the fact
that a driver is struggling to probe, and wait for it a bit more.

What I've not tested is loading the PHY module at runtime, and seeing
how phy_attach_direct behaves then. I expect that this change set will
not alter the behavior in that case: the genphy will still bind to a
device with no driver, and phy_attach_direct will not return -EPROBE_DEFER
in that case.

I might not be very versed in the device core/internals, but the patches
make sense to me, and worked as intended from the first try on my system
(Turris MOX with mv88e6xxx), which was modified to make the same "sins"
as those called out in the thread above:

- use PHY interrupts provided by the switch itself as an interrupt-controller
- call of_mdiobus_register from setup() and not from probe(), so as to
  not circumvent fw_devlink's limitations, and still get to hit the PHY
  probe deferral conditions.

So feedback and testing on other platforms is very appreciated.

Vladimir Oltean (3):
  net: phy: don't bind genphy in phy_attach_direct if the specific
    driver defers probe
  net: dsa: destroy the phylink instance on any error in
    dsa_slave_phy_setup
  net: dsa: allow the phy_connect() call to return -EPROBE_DEFER

 drivers/base/dd.c            | 21 +++++++++++++++++++--
 drivers/net/phy/phy_device.c |  8 ++++++++
 include/linux/device.h       |  1 +
 net/dsa/dsa2.c               |  2 ++
 net/dsa/slave.c              | 12 +++++-------
 5 files changed, 35 insertions(+), 9 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-01 22:50 [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Vladimir Oltean
@ 2021-09-01 22:50 ` Vladimir Oltean
  2021-09-02  5:43   ` Greg Kroah-Hartman
  2021-09-02 18:50   ` Russell King (Oracle)
  2021-09-01 22:50 ` [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup Vladimir Oltean
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-01 22:50 UTC (permalink / raw)
  To: netdev
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

There are systems where the PHY driver might get its probe deferred due
to a missing supplier, like an interrupt-parent, gpio, clock or whatever.

If the phy_attach_direct call happens right in between probe attempts,
the PHY library is greedy and assumes that a specific driver will never
appear, so it just binds the generic PHY driver.

In certain cases this is the wrong choice, because some PHYs simply need
the specific driver. The specific PHY driver was going to probe, given
enough time, but this doesn't seem to matter to phy_attach_direct.

To solve this, make phy_attach_direct check whether a specific PHY
driver is pending or not, and if it is, just defer the probing of the
MAC that's connecting to us a bit more too.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 drivers/base/dd.c            | 21 +++++++++++++++++++--
 drivers/net/phy/phy_device.c |  8 ++++++++
 include/linux/device.h       |  1 +
 3 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 1c379d20812a..b22073b0acd2 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -128,13 +128,30 @@ static void deferred_probe_work_func(struct work_struct *work)
 }
 static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
 
+static bool __device_pending_probe(struct device *dev)
+{
+	return !list_empty(&dev->p->deferred_probe);
+}
+
+bool device_pending_probe(struct device *dev)
+{
+	bool pending;
+
+	mutex_lock(&deferred_probe_mutex);
+	pending = __device_pending_probe(dev);
+	mutex_unlock(&deferred_probe_mutex);
+
+	return pending;
+}
+EXPORT_SYMBOL_GPL(device_pending_probe);
+
 void driver_deferred_probe_add(struct device *dev)
 {
 	if (!dev->can_match)
 		return;
 
 	mutex_lock(&deferred_probe_mutex);
-	if (list_empty(&dev->p->deferred_probe)) {
+	if (!__device_pending_probe(dev)) {
 		dev_dbg(dev, "Added to deferred list\n");
 		list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
 	}
@@ -144,7 +161,7 @@ void driver_deferred_probe_add(struct device *dev)
 void driver_deferred_probe_del(struct device *dev)
 {
 	mutex_lock(&deferred_probe_mutex);
-	if (!list_empty(&dev->p->deferred_probe)) {
+	if (__device_pending_probe(dev)) {
 		dev_dbg(dev, "Removed from deferred list\n");
 		list_del_init(&dev->p->deferred_probe);
 		__device_set_deferred_probe_reason(dev, NULL);
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 52310df121de..2c22a32f0a1c 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
 
 	/* Assume that if there is no driver, that it doesn't
 	 * exist, and we should use the genphy driver.
+	 * The exception is during probing, when the PHY driver might have
+	 * attempted a probe but has requested deferral. Since there might be
+	 * MAC drivers which also attach to the PHY during probe time, try
+	 * harder to bind the specific PHY driver, and defer the MAC driver's
+	 * probing until then.
 	 */
 	if (!d->driver) {
+		if (device_pending_probe(d))
+			return -EPROBE_DEFER;
+
 		if (phydev->is_c45)
 			d->driver = &genphy_c45_driver.mdiodrv.driver;
 		else
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..505e77715789 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -889,6 +889,7 @@ int __must_check driver_attach(struct device_driver *drv);
 void device_initial_probe(struct device *dev);
 int __must_check device_reprobe(struct device *dev);
 
+bool device_pending_probe(struct device *dev);
 bool device_is_bound(struct device *dev);
 
 /*
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
  2021-09-01 22:50 [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Vladimir Oltean
  2021-09-01 22:50 ` [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe Vladimir Oltean
@ 2021-09-01 22:50 ` Vladimir Oltean
  2021-09-02 12:25   ` Russell King (Oracle)
  2021-09-02 23:21   ` Florian Fainelli
  2021-09-01 22:50 ` [RFC PATCH net-next 3/3] net: dsa: allow the phy_connect() call to return -EPROBE_DEFER Vladimir Oltean
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-01 22:50 UTC (permalink / raw)
  To: netdev
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

DSA supports connecting to a phy-handle, and has a fallback to a non-OF
based method of connecting to an internal PHY on the switch's own MDIO
bus, if no phy-handle and no fixed-link nodes were present.

The -ENODEV error code from the first attempt (phylink_of_phy_connect)
is what triggers the second attempt (phylink_connect_phy).

However, when the first attempt returns a different error code than
-ENODEV, this results in an unbalance of calls to phylink_create and
phylink_destroy by the time we exit the function. The phylink instance
has leaked.

There are many other error codes that can be returned by
phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
So this is a practical issue too.

Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
I know, I will send this bug fix to "net" too, this is provided just for
testing purposes, and for the completeness of the patch set.

 net/dsa/slave.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index a7a114b9cb77..8a395290267c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1856,13 +1856,11 @@ static int dsa_slave_phy_setup(struct net_device *slave_dev)
 		 * use the switch internal MDIO bus instead
 		 */
 		ret = dsa_slave_phy_connect(slave_dev, dp->index, phy_flags);
-		if (ret) {
-			netdev_err(slave_dev,
-				   "failed to connect to port %d: %d\n",
-				   dp->index, ret);
-			phylink_destroy(dp->pl);
-			return ret;
-		}
+	}
+	if (ret) {
+		netdev_err(slave_dev, "failed to connect to PHY: %pe\n",
+			   ERR_PTR(ret));
+		phylink_destroy(dp->pl);
 	}
 
 	return ret;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [RFC PATCH net-next 3/3] net: dsa: allow the phy_connect() call to return -EPROBE_DEFER
  2021-09-01 22:50 [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Vladimir Oltean
  2021-09-01 22:50 ` [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe Vladimir Oltean
  2021-09-01 22:50 ` [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup Vladimir Oltean
@ 2021-09-01 22:50 ` Vladimir Oltean
  2021-09-02 12:19 ` [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Russell King (Oracle)
  2021-09-02 22:05 ` Vladimir Oltean
  4 siblings, 0 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-01 22:50 UTC (permalink / raw)
  To: netdev
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

Currently DSA ignores any errors coming from dsa_port_setup(), and this
includes:

dsa_port_setup
-> dsa_slave_create
   -> dsa_slave_phy_setup
      -> phylink_of_phy_connect
         -> ...
            -> phy_attach_direct

This is done such that PHYs present on optional riser cards which are
missing do not cause the entire switch probing to fail.

Now that phy_attach_direct tries harder to probe the specific PHY driver
instead of genphy, it can actually return -EPROBE_DEFER. It makes sense
to treat this error separately, and not just give up. Trigger the normal
error path, unwind the setup done so far, and come back later.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
---
 net/dsa/dsa2.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index e78901d33a10..282bdebac835 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -912,6 +912,8 @@ static int dsa_tree_setup_switches(struct dsa_switch_tree *dst)
 
 	list_for_each_entry(dp, &dst->ports, list) {
 		err = dsa_port_setup(dp);
+		if (err == -EPROBE_DEFER)
+			goto teardown;
 		if (err) {
 			dsa_port_devlink_teardown(dp);
 			dp->type = DSA_PORT_TYPE_UNUSED;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-01 22:50 ` [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe Vladimir Oltean
@ 2021-09-02  5:43   ` Greg Kroah-Hartman
  2021-09-02 10:11     ` Vladimir Oltean
  2021-09-02 14:37     ` Rafael J. Wysocki
  2021-09-02 18:50   ` Russell King (Oracle)
  1 sibling, 2 replies; 42+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-02  5:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Rafael J. Wysocki, Andrew Lunn, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> There are systems where the PHY driver might get its probe deferred due
> to a missing supplier, like an interrupt-parent, gpio, clock or whatever.
> 
> If the phy_attach_direct call happens right in between probe attempts,
> the PHY library is greedy and assumes that a specific driver will never
> appear, so it just binds the generic PHY driver.
> 
> In certain cases this is the wrong choice, because some PHYs simply need
> the specific driver. The specific PHY driver was going to probe, given
> enough time, but this doesn't seem to matter to phy_attach_direct.
> 
> To solve this, make phy_attach_direct check whether a specific PHY
> driver is pending or not, and if it is, just defer the probing of the
> MAC that's connecting to us a bit more too.
> 
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
>  drivers/base/dd.c            | 21 +++++++++++++++++++--
>  drivers/net/phy/phy_device.c |  8 ++++++++
>  include/linux/device.h       |  1 +
>  3 files changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 1c379d20812a..b22073b0acd2 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -128,13 +128,30 @@ static void deferred_probe_work_func(struct work_struct *work)
>  }
>  static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
>  
> +static bool __device_pending_probe(struct device *dev)
> +{
> +	return !list_empty(&dev->p->deferred_probe);
> +}
> +
> +bool device_pending_probe(struct device *dev)
> +{
> +	bool pending;
> +
> +	mutex_lock(&deferred_probe_mutex);
> +	pending = __device_pending_probe(dev);
> +	mutex_unlock(&deferred_probe_mutex);
> +
> +	return pending;
> +}
> +EXPORT_SYMBOL_GPL(device_pending_probe);
> +
>  void driver_deferred_probe_add(struct device *dev)
>  {
>  	if (!dev->can_match)
>  		return;
>  
>  	mutex_lock(&deferred_probe_mutex);
> -	if (list_empty(&dev->p->deferred_probe)) {
> +	if (!__device_pending_probe(dev)) {
>  		dev_dbg(dev, "Added to deferred list\n");
>  		list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
>  	}
> @@ -144,7 +161,7 @@ void driver_deferred_probe_add(struct device *dev)
>  void driver_deferred_probe_del(struct device *dev)
>  {
>  	mutex_lock(&deferred_probe_mutex);
> -	if (!list_empty(&dev->p->deferred_probe)) {
> +	if (__device_pending_probe(dev)) {
>  		dev_dbg(dev, "Removed from deferred list\n");
>  		list_del_init(&dev->p->deferred_probe);
>  		__device_set_deferred_probe_reason(dev, NULL);
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 52310df121de..2c22a32f0a1c 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  
>  	/* Assume that if there is no driver, that it doesn't
>  	 * exist, and we should use the genphy driver.
> +	 * The exception is during probing, when the PHY driver might have
> +	 * attempted a probe but has requested deferral. Since there might be
> +	 * MAC drivers which also attach to the PHY during probe time, try
> +	 * harder to bind the specific PHY driver, and defer the MAC driver's
> +	 * probing until then.

Wait, no, this should not be a "special" thing, and why would the list
of deferred probe show this?

If a bus wants to have this type of "generic vs. specific" logic, then
it needs to handle it in the bus logic itself as that does NOT fit into
the normal driver model at all.  Don't try to get a "hint" of this by
messing with the probe function list.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02  5:43   ` Greg Kroah-Hartman
@ 2021-09-02 10:11     ` Vladimir Oltean
  2021-09-02 10:37       ` Greg Kroah-Hartman
  2021-09-02 14:37     ` Rafael J. Wysocki
  1 sibling, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 10:11 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Vladimir Oltean, netdev, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 07:43:10AM +0200, Greg Kroah-Hartman wrote:
> Wait, no, this should not be a "special" thing, and why would the list
> of deferred probe show this?

Why as in why would it work/do what I want, or as in why would you want to do that?

> If a bus wants to have this type of "generic vs. specific" logic, then
> it needs to handle it in the bus logic itself as that does NOT fit into
> the normal driver model at all.  Don't try to get a "hint" of this by
> messing with the probe function list.

Where and how? Do you have an example?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 10:11     ` Vladimir Oltean
@ 2021-09-02 10:37       ` Greg Kroah-Hartman
  2021-09-02 11:17         ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-02 10:37 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:11:50PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 07:43:10AM +0200, Greg Kroah-Hartman wrote:
> > Wait, no, this should not be a "special" thing, and why would the list
> > of deferred probe show this?
> 
> Why as in why would it work/do what I want, or as in why would you want to do that?

Both!  :)

> > If a bus wants to have this type of "generic vs. specific" logic, then
> > it needs to handle it in the bus logic itself as that does NOT fit into
> > the normal driver model at all.  Don't try to get a "hint" of this by
> > messing with the probe function list.
> 
> Where and how? Do you have an example?

No I do not, sorry, most busses do not do this for obvious ordering /
loading / we are not that crazy reasons.

What is causing this all to suddenly break?  The devlink stuff?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 10:37       ` Greg Kroah-Hartman
@ 2021-09-02 11:17         ` Vladimir Oltean
  0 siblings, 0 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 11:17 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Vladimir Oltean, netdev, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 12:37:34PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Sep 02, 2021 at 01:11:50PM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 07:43:10AM +0200, Greg Kroah-Hartman wrote:
> > > Wait, no, this should not be a "special" thing, and why would the list
> > > of deferred probe show this?
> >
> > Why as in why would it work/do what I want, or as in why would you want to do that?
>
> Both!  :)

So first: why would it work.
You seem to have a misconception that I am "messing with the probe
function list".
I am not, I am just exporting the information whether the device had a
driver which returned -EPROBE_DEFER during probe, or not. For that I am
looking at the presence of this device on the deferred_probe_pending_list.

driver_probe_device
-> if (ret == -EPROBE_DEFER || ret == EPROBE_DEFER) driver_deferred_probe_add(dev);
   -> list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);

driver_bound
-> driver_deferred_probe_del
   -> list_del_init(&dev->p->deferred_probe);

So the presence of "dev" inside deferred_probe_pending_list means
precisely that a driver is pending to be bound.

Second: why would I want to do that.
In the case of PHY devices, the driver binding process starts here:

phy_device_register
-> device_add

It begins synchronously, but may not finish due to probe deferral.
So after device_add finishes, phydev->drv might be NULL due to 2 reasons:

1. -EPROBE_DEFER triggered by "somebody", either by the PHY driver probe
   function itself, or by third parties (like device_links_check_suppliers
   happening to notice that before even calling the driver's probe fn).
   Anyway, the distinction between these 2 is pretty much irrelevant.

2. There genuinely was no driver loaded in the system for this PHY. Note
   that the way things are written, the Generic PHY driver will not
   match on any device in phy_bus_match(). It is bound manually, separately.

The PHY library is absolutely happy to work with a headless chicken, a
phydev with a NULL phydev->drv. Just search for "if (!phydev->drv)"
inside drivers/net/phy/phy.c and drivers/net/phy/phy_device.c.

However, the phydev walking with a NULL drv can only last for so long.
An Ethernet port will soon need that PHY device, and will attach to it.
There are many code paths, all ending in phy_attach_direct.
However, when an Ethernet port decides to attach to a PHY device is
completely asynchronous to the lifetime of the PHY device itself.
This moment is where a driver is really needed, and if none is present,
the generic one is force-bound.

My patch only distinguishes between case 1 and 2 for which phydev->drv
might be NULL. It avoids force-binding the generic PHY when a specific
PHY driver was found, but did not finish binding due to probe deferral.

> > > If a bus wants to have this type of "generic vs. specific" logic, then
> > > it needs to handle it in the bus logic itself as that does NOT fit into
> > > the normal driver model at all.  Don't try to get a "hint" of this by
> > > messing with the probe function list.
> >
> > Where and how? Do you have an example?
>
> No I do not, sorry, most busses do not do this for obvious ordering /
> loading / we are not that crazy reasons.
>
> What is causing this all to suddenly break?  The devlink stuff?

There was a report related to fw_devlink indeed, however strictly
speaking, I wouldn't say it is the cause of all this. It is pretty
uncommon for a PHY device to defer probing I think, hence the bad
assumptions made around it.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-01 22:50 [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Vladimir Oltean
                   ` (2 preceding siblings ...)
  2021-09-01 22:50 ` [RFC PATCH net-next 3/3] net: dsa: allow the phy_connect() call to return -EPROBE_DEFER Vladimir Oltean
@ 2021-09-02 12:19 ` Russell King (Oracle)
  2021-09-02 12:35   ` Vladimir Oltean
  2021-09-02 22:05 ` Vladimir Oltean
  4 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 12:19 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> The central point of that discussion is that DSA seems "broken" for
> expecting the PHY driver to probe immediately on PHYs belonging to the
> internal MDIO buses of switches. A few suggestions were made about what
> to do, but some were not satisfactory and some did not solve the problem.

I think you need to describe the mechanism here. Why wouldn't a PHY
belonging to an internal MDIO bus of a switch not probe immediately?
What resources may not be available?

If we have a DSA driver that tries to probe the PHYs before e.g. the
interrupt controller inside the DSA switch has been configured, aren't
we just making completely unnecessary problems for ourselves? Wouldn't
it be saner to ensure that the interrupt controller has been setup
and become available prior to attempting to setup anything that
relies upon that interrupt controller?

From what I see of Marvell switches, the internal PHYs only ever rely
on internal resources of the switch they are embedded in.

External PHYs to the switch are a different matter - these can rely on
external clocks, and in that scenario, it would make sense for a
deferred probe to cause the entire switch to defer, since we don't
have all the resources for the switch to be functional (and, because we
want the PHYs to be present at switch probe time, not when we try to
bring up the interface, I don't see there's much other choice.)

Trying to move that to interface-up time /will/ break userspace - for
example, Debian's interfaces(8) bridge support will become unreliable,
and probably a whole host of other userspace. It will cause regressions
and instability to userspace. So that's a big no.

Maybe I'm missing exactly what the problem is...

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
  2021-09-01 22:50 ` [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup Vladimir Oltean
@ 2021-09-02 12:25   ` Russell King (Oracle)
  2021-09-02 23:21   ` Florian Fainelli
  1 sibling, 0 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 12:25 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:50:52AM +0300, Vladimir Oltean wrote:
> DSA supports connecting to a phy-handle, and has a fallback to a non-OF
> based method of connecting to an internal PHY on the switch's own MDIO
> bus, if no phy-handle and no fixed-link nodes were present.
> 
> The -ENODEV error code from the first attempt (phylink_of_phy_connect)
> is what triggers the second attempt (phylink_connect_phy).
> 
> However, when the first attempt returns a different error code than
> -ENODEV, this results in an unbalance of calls to phylink_create and
> phylink_destroy by the time we exit the function. The phylink instance
> has leaked.
> 
> There are many other error codes that can be returned by
> phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
> So this is a practical issue too.
> 
> Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> ---
> I know, I will send this bug fix to "net" too, this is provided just for
> testing purposes, and for the completeness of the patch set.

Probably should have been the first patch of the set. This looks
absolutely correct to me. Please send for the net tree.

Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Thanks.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 12:19 ` [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Russell King (Oracle)
@ 2021-09-02 12:35   ` Vladimir Oltean
  2021-09-02 12:59     ` Vladimir Oltean
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 12:35 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:19:27PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> > The central point of that discussion is that DSA seems "broken" for
> > expecting the PHY driver to probe immediately on PHYs belonging to the
> > internal MDIO buses of switches. A few suggestions were made about what
> > to do, but some were not satisfactory and some did not solve the problem.
> 
> I think you need to describe the mechanism here. Why wouldn't a PHY
> belonging to an internal MDIO bus of a switch not probe immediately?
> What resources may not be available?

As you point out below, the interrupt-controller is what is not available.
There is a mechanism called fw_devlink which infers links from one OF
node to another based on phandles. When you have an interrupt-parent,
that OF node becomes a supplier to you. Those OF node links are then
transferred to device links once the devices having those OF nodes are
created.

> If we have a DSA driver that tries to probe the PHYs before e.g. the
> interrupt controller inside the DSA switch has been configured, aren't
> we just making completely unnecessary problems for ourselves?

This is not what happens, if that were the case, of course I would fix
_that_ and not in this way.

> Wouldn't it be saner to ensure that the interrupt controller has been
> setup and become available prior to attempting to setup anything that
> relies upon that interrupt controller?

The interrupt controller _has_ been set up. The trouble is that the
interrupt controller has the same OF node as the switch itself, and the
same OF node. Therefore, fw_devlink waits for the _entire_ switch to
finish probing, it doesn't have insight into the fact that the
dependency is just on the interrupt controller.

> From what I see of Marvell switches, the internal PHYs only ever rely
> on internal resources of the switch they are embedded in.
> 
> External PHYs to the switch are a different matter - these can rely on
> external clocks, and in that scenario, it would make sense for a
> deferred probe to cause the entire switch to defer, since we don't
> have all the resources for the switch to be functional (and, because we
> want the PHYs to be present at switch probe time, not when we try to
> bring up the interface, I don't see there's much other choice.)
> 
> Trying to move that to interface-up time /will/ break userspace - for
> example, Debian's interfaces(8) bridge support will become unreliable,
> and probably a whole host of other userspace. It will cause regressions
> and instability to userspace. So that's a big no.

Why a big no? I expect there to be 2 call paths of phy_attach_direct:
- At probe time. Both the MAC driver and the PHY driver are probing.
  This is what has this patch addresses. There is no issue to return
  -EPROBE_DEFER at that time, since drivers connect to the PHY before
  they register their netdev. So if connecting defers, there is no
  netdev to unregister, and user space knows nothing of this.
- At .ndo_open time. This is where it maybe gets interesting, but not to
  user space. If you open a netdev and it connects to the PHY then, I
  wouldn't expect the PHY to be undergoing a probing process, all of
  that should have been settled by then, should it not? Where it might
  get interesting is with NFS root, and I admit I haven't tested that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 12:35   ` Vladimir Oltean
@ 2021-09-02 12:59     ` Vladimir Oltean
  2021-09-02 13:26     ` Russell King (Oracle)
  2021-09-02 20:07     ` Andrew Lunn
  2 siblings, 0 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 12:59 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 03:35:32PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 01:19:27PM +0100, Russell King (Oracle) wrote:
> > On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> > > The central point of that discussion is that DSA seems "broken" for
> > > expecting the PHY driver to probe immediately on PHYs belonging to the
> > > internal MDIO buses of switches. A few suggestions were made about what
> > > to do, but some were not satisfactory and some did not solve the problem.
> > 
> > I think you need to describe the mechanism here. Why wouldn't a PHY
> > belonging to an internal MDIO bus of a switch not probe immediately?
> > What resources may not be available?
> 
> As you point out below, the interrupt-controller is what is not available.
> There is a mechanism called fw_devlink which infers links from one OF
> node to another based on phandles. When you have an interrupt-parent,
> that OF node becomes a supplier to you. Those OF node links are then
> transferred to device links once the devices having those OF nodes are
> created.
> 
> > If we have a DSA driver that tries to probe the PHYs before e.g. the
> > interrupt controller inside the DSA switch has been configured, aren't
> > we just making completely unnecessary problems for ourselves?
> 
> This is not what happens, if that were the case, of course I would fix
> _that_ and not in this way.
> 
> > Wouldn't it be saner to ensure that the interrupt controller has been
> > setup and become available prior to attempting to setup anything that
> > relies upon that interrupt controller?
> 
> The interrupt controller _has_ been set up. The trouble is that the
> interrupt controller has the same OF node as the switch itself, and the
> same OF node. Therefore, fw_devlink waits for the _entire_ switch to

...and the same struct device, not "OF node" repeated twice, silly me.

> finish probing, it doesn't have insight into the fact that the
> dependency is just on the interrupt controller.
> 
> > From what I see of Marvell switches, the internal PHYs only ever rely
> > on internal resources of the switch they are embedded in.
> > 
> > External PHYs to the switch are a different matter - these can rely on
> > external clocks, and in that scenario, it would make sense for a
> > deferred probe to cause the entire switch to defer, since we don't
> > have all the resources for the switch to be functional (and, because we
> > want the PHYs to be present at switch probe time, not when we try to
> > bring up the interface, I don't see there's much other choice.)
> > 
> > Trying to move that to interface-up time /will/ break userspace - for
> > example, Debian's interfaces(8) bridge support will become unreliable,
> > and probably a whole host of other userspace. It will cause regressions
> > and instability to userspace. So that's a big no.
> 
> Why a big no? I expect there to be 2 call paths of phy_attach_direct:
> - At probe time. Both the MAC driver and the PHY driver are probing.
>   This is what has this patch addresses. There is no issue to return
>   -EPROBE_DEFER at that time, since drivers connect to the PHY before
>   they register their netdev. So if connecting defers, there is no
>   netdev to unregister, and user space knows nothing of this.
> - At .ndo_open time. This is where it maybe gets interesting, but not to
>   user space. If you open a netdev and it connects to the PHY then, I
>   wouldn't expect the PHY to be undergoing a probing process, all of
>   that should have been settled by then, should it not? Where it might
>   get interesting is with NFS root, and I admit I haven't tested that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 12:35   ` Vladimir Oltean
  2021-09-02 12:59     ` Vladimir Oltean
@ 2021-09-02 13:26     ` Russell King (Oracle)
  2021-09-02 15:23       ` Vladimir Oltean
  2021-09-02 20:07     ` Andrew Lunn
  2 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 13:26 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 03:35:32PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 01:19:27PM +0100, Russell King (Oracle) wrote:
> > On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> > > The central point of that discussion is that DSA seems "broken" for
> > > expecting the PHY driver to probe immediately on PHYs belonging to the
> > > internal MDIO buses of switches. A few suggestions were made about what
> > > to do, but some were not satisfactory and some did not solve the problem.
> > 
> > I think you need to describe the mechanism here. Why wouldn't a PHY
> > belonging to an internal MDIO bus of a switch not probe immediately?
> > What resources may not be available?
> 
> As you point out below, the interrupt-controller is what is not available.
> There is a mechanism called fw_devlink which infers links from one OF
> node to another based on phandles. When you have an interrupt-parent,
> that OF node becomes a supplier to you. Those OF node links are then
> transferred to device links once the devices having those OF nodes are
> created.
> 
> > If we have a DSA driver that tries to probe the PHYs before e.g. the
> > interrupt controller inside the DSA switch has been configured, aren't
> > we just making completely unnecessary problems for ourselves?
> 
> This is not what happens, if that were the case, of course I would fix
> _that_ and not in this way.
> 
> > Wouldn't it be saner to ensure that the interrupt controller has been
> > setup and become available prior to attempting to setup anything that
> > relies upon that interrupt controller?
> 
> The interrupt controller _has_ been set up. The trouble is that the
> interrupt controller has the same OF node as the switch itself, and the
> same OF node. Therefore, fw_devlink waits for the _entire_ switch to
> finish probing, it doesn't have insight into the fact that the
> dependency is just on the interrupt controller.
> 
> > From what I see of Marvell switches, the internal PHYs only ever rely
> > on internal resources of the switch they are embedded in.
> > 
> > External PHYs to the switch are a different matter - these can rely on
> > external clocks, and in that scenario, it would make sense for a
> > deferred probe to cause the entire switch to defer, since we don't
> > have all the resources for the switch to be functional (and, because we
> > want the PHYs to be present at switch probe time, not when we try to
> > bring up the interface, I don't see there's much other choice.)
> > 
> > Trying to move that to interface-up time /will/ break userspace - for
> > example, Debian's interfaces(8) bridge support will become unreliable,
> > and probably a whole host of other userspace. It will cause regressions
> > and instability to userspace. So that's a big no.
> 
> Why a big no?

Fundamental rule of kernel programming: we do not break existing
userspace.

Debian has had support for configuring bridges at boot time via
the interfaces file for years. Breaking that is going to upset a
lot of people (me included) resulting in busted networks. It
would be a sure way to make oneself unpopular.

> I expect there to be 2 call paths of phy_attach_direct:
> - At probe time. Both the MAC driver and the PHY driver are probing.
>   This is what has this patch addresses. There is no issue to return
>   -EPROBE_DEFER at that time, since drivers connect to the PHY before
>   they register their netdev. So if connecting defers, there is no
>   netdev to unregister, and user space knows nothing of this.
> - At .ndo_open time. This is where it maybe gets interesting, but not to
>   user space. If you open a netdev and it connects to the PHY then, I
>   wouldn't expect the PHY to be undergoing a probing process, all of
>   that should have been settled by then, should it not? Where it might
>   get interesting is with NFS root, and I admit I haven't tested that.

I don't think you can make that assumption. Consider the case where
systemd is being used, DSA stuff is modular, and we're trying to
setup a bridge device on DSA. DSA could be probing while the bridge
is being setup.

Sadly, this isn't theoretical. I've ended up needing:

	pre-up sleep 1

in my bridge configuration to allow time for DSA to finish probing.
It's not a pleasant solution, nor a particularly reliable one at
that, but it currently works around the problem. We don't need more
cases of this kind of thing leading to boot time unreliability...

Or if we do, then we're turning Linux into Windows, where you can
end up with different behaviours each time the system is boot
depending on the exact order that various stuff comes up.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02  5:43   ` Greg Kroah-Hartman
  2021-09-02 10:11     ` Vladimir Oltean
@ 2021-09-02 14:37     ` Rafael J. Wysocki
  1 sibling, 0 replies; 42+ messages in thread
From: Rafael J. Wysocki @ 2021-09-02 14:37 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Vladimir Oltean, netdev, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, Vladimir Oltean,
	Linux Kernel Mailing List, Linus Walleij, Alvin Šipraga,
	ACPI Devel Maling List, kernel-team, Len Brown

On Thu, Sep 2, 2021 at 7:43 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> > There are systems where the PHY driver might get its probe deferred due
> > to a missing supplier, like an interrupt-parent, gpio, clock or whatever.
> >
> > If the phy_attach_direct call happens right in between probe attempts,
> > the PHY library is greedy and assumes that a specific driver will never
> > appear, so it just binds the generic PHY driver.
> >
> > In certain cases this is the wrong choice, because some PHYs simply need
> > the specific driver. The specific PHY driver was going to probe, given
> > enough time, but this doesn't seem to matter to phy_attach_direct.
> >
> > To solve this, make phy_attach_direct check whether a specific PHY
> > driver is pending or not, and if it is, just defer the probing of the
> > MAC that's connecting to us a bit more too.
> >
> > Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> > ---
> >  drivers/base/dd.c            | 21 +++++++++++++++++++--
> >  drivers/net/phy/phy_device.c |  8 ++++++++
> >  include/linux/device.h       |  1 +
> >  3 files changed, 28 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 1c379d20812a..b22073b0acd2 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -128,13 +128,30 @@ static void deferred_probe_work_func(struct work_struct *work)
> >  }
> >  static DECLARE_WORK(deferred_probe_work, deferred_probe_work_func);
> >
> > +static bool __device_pending_probe(struct device *dev)
> > +{
> > +     return !list_empty(&dev->p->deferred_probe);
> > +}
> > +
> > +bool device_pending_probe(struct device *dev)
> > +{
> > +     bool pending;
> > +
> > +     mutex_lock(&deferred_probe_mutex);
> > +     pending = __device_pending_probe(dev);
> > +     mutex_unlock(&deferred_probe_mutex);
> > +
> > +     return pending;
> > +}
> > +EXPORT_SYMBOL_GPL(device_pending_probe);
> > +
> >  void driver_deferred_probe_add(struct device *dev)
> >  {
> >       if (!dev->can_match)
> >               return;
> >
> >       mutex_lock(&deferred_probe_mutex);
> > -     if (list_empty(&dev->p->deferred_probe)) {
> > +     if (!__device_pending_probe(dev)) {
> >               dev_dbg(dev, "Added to deferred list\n");
> >               list_add_tail(&dev->p->deferred_probe, &deferred_probe_pending_list);
> >       }
> > @@ -144,7 +161,7 @@ void driver_deferred_probe_add(struct device *dev)
> >  void driver_deferred_probe_del(struct device *dev)
> >  {
> >       mutex_lock(&deferred_probe_mutex);
> > -     if (!list_empty(&dev->p->deferred_probe)) {
> > +     if (__device_pending_probe(dev)) {
> >               dev_dbg(dev, "Removed from deferred list\n");
> >               list_del_init(&dev->p->deferred_probe);
> >               __device_set_deferred_probe_reason(dev, NULL);
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 52310df121de..2c22a32f0a1c 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> >
> >       /* Assume that if there is no driver, that it doesn't
> >        * exist, and we should use the genphy driver.
> > +      * The exception is during probing, when the PHY driver might have
> > +      * attempted a probe but has requested deferral. Since there might be
> > +      * MAC drivers which also attach to the PHY during probe time, try
> > +      * harder to bind the specific PHY driver, and defer the MAC driver's
> > +      * probing until then.
>
> Wait, no, this should not be a "special" thing, and why would the list
> of deferred probe show this?
>
> If a bus wants to have this type of "generic vs. specific" logic, then
> it needs to handle it in the bus logic itself as that does NOT fit into
> the normal driver model at all.

Well, I think that this is a general issue and it appears to me to be
present in the driver core too, at least to some extent.

Namely, if there are two drivers matching the same device and the
first one's ->probe() returns -EPROBE_DEFER, that will be converted to
EPROBE_DEFER by really_probe(), so driver_probe_device() will pass it
to __device_attach_driver() which then will return 0.  This
bus_for_each_drv() will call __device_attach_driver()  for the second
matching driver even though the first one may still probe successfully
later.

To me, this really is a variant of "if a driver has failed to probe,
try another one" which phy_attach_direct() appears to be doing and in
both cases the probing of the "alternative" is premature if the
probing of the original driver has been deferred.

> Don't try to get a "hint" of this by messing with the probe function list.

I agree that this doesn't look particularly clean, but then I'm
wondering how to address this cleanly.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 13:26     ` Russell King (Oracle)
@ 2021-09-02 15:23       ` Vladimir Oltean
  2021-09-02 16:31         ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 15:23 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 03:35:32PM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 01:19:27PM +0100, Russell King (Oracle) wrote:
> > > On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> > > > The central point of that discussion is that DSA seems "broken" for
> > > > expecting the PHY driver to probe immediately on PHYs belonging to the
> > > > internal MDIO buses of switches. A few suggestions were made about what
> > > > to do, but some were not satisfactory and some did not solve the problem.
> > >
> > > I think you need to describe the mechanism here. Why wouldn't a PHY
> > > belonging to an internal MDIO bus of a switch not probe immediately?
> > > What resources may not be available?
> >
> > As you point out below, the interrupt-controller is what is not available.
> > There is a mechanism called fw_devlink which infers links from one OF
> > node to another based on phandles. When you have an interrupt-parent,
> > that OF node becomes a supplier to you. Those OF node links are then
> > transferred to device links once the devices having those OF nodes are
> > created.
> >
> > > If we have a DSA driver that tries to probe the PHYs before e.g. the
> > > interrupt controller inside the DSA switch has been configured, aren't
> > > we just making completely unnecessary problems for ourselves?
> >
> > This is not what happens, if that were the case, of course I would fix
> > _that_ and not in this way.
> >
> > > Wouldn't it be saner to ensure that the interrupt controller has been
> > > setup and become available prior to attempting to setup anything that
> > > relies upon that interrupt controller?
> >
> > The interrupt controller _has_ been set up. The trouble is that the
> > interrupt controller has the same OF node as the switch itself, and the
> > same OF node. Therefore, fw_devlink waits for the _entire_ switch to
> > finish probing, it doesn't have insight into the fact that the
> > dependency is just on the interrupt controller.
> >
> > > From what I see of Marvell switches, the internal PHYs only ever rely
> > > on internal resources of the switch they are embedded in.
> > >
> > > External PHYs to the switch are a different matter - these can rely on
> > > external clocks, and in that scenario, it would make sense for a
> > > deferred probe to cause the entire switch to defer, since we don't
> > > have all the resources for the switch to be functional (and, because we
> > > want the PHYs to be present at switch probe time, not when we try to
> > > bring up the interface, I don't see there's much other choice.)
> > >
> > > Trying to move that to interface-up time /will/ break userspace - for
> > > example, Debian's interfaces(8) bridge support will become unreliable,
> > > and probably a whole host of other userspace. It will cause regressions
> > > and instability to userspace. So that's a big no.
> >
> > Why a big no?
>
> Fundamental rule of kernel programming: we do not break existing
> userspace.

Of course, I wasn't asking why we shouldn't be breaking user space, but
about the specifics of why this change would do that.

> Debian has had support for configuring bridges at boot time via
> the interfaces file for years. Breaking that is going to upset a
> lot of people (me included) resulting in busted networks. It
> would be a sure way to make oneself unpopular.
>
> > I expect there to be 2 call paths of phy_attach_direct:
> > - At probe time. Both the MAC driver and the PHY driver are probing.
> >   This is what has this patch addresses. There is no issue to return
> >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> >   they register their netdev. So if connecting defers, there is no
> >   netdev to unregister, and user space knows nothing of this.
> > - At .ndo_open time. This is where it maybe gets interesting, but not to
> >   user space. If you open a netdev and it connects to the PHY then, I
> >   wouldn't expect the PHY to be undergoing a probing process, all of
> >   that should have been settled by then, should it not? Where it might
> >   get interesting is with NFS root, and I admit I haven't tested that.
>
> I don't think you can make that assumption. Consider the case where
> systemd is being used, DSA stuff is modular, and we're trying to
> setup a bridge device on DSA. DSA could be probing while the bridge
> is being setup.
>
> Sadly, this isn't theoretical. I've ended up needing:
>
> 	pre-up sleep 1
>
> in my bridge configuration to allow time for DSA to finish probing.
> It's not a pleasant solution, nor a particularly reliable one at
> that, but it currently works around the problem.

What problem? This is the first time I've heard of this report, and you
should definitely not need that.

I do have a system set up to use systemd-networkd, and I did want to try
this out:

$ for file in /etc/systemd/network/*; do echo ${file}; cat ${file}; done
/etc/systemd/network/br0.netdev
[NetDev]
Name=br0
Kind=bridge

[Bridge]
VLANFiltering=no
DefaultPVID=1
STP=no

[VLAN]
MVRP=no
/etc/systemd/network/br0.network
[Match]
Name=br0

[Network]
DHCP=ipv4
/etc/systemd/network/eth0.network
[Match]
Name=eth0

[Network]
Bridge=br0
/etc/systemd/network/eth1.network
[Match]
Name=eth1

[Network]
Bridge=br0
/etc/systemd/network/eth2.network
[Match]
Name=eth2

[Network]
LinkLocalAddressing=yes
/etc/systemd/network/swp.network
[Match]
Name=swp*

[Network]
BindCarrier=eth2
Bridge=br0
# Before
# bridge link
7: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
8: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100
# Kick off the probing
$ insmod sja1105.ko
[   34.922908] sja1105 spi0.1: Probed switch chip: SJA1105T
$ insmod tag_sja1105.ko
$ echo spi0.1 > /sys/bus/spi/drivers/sja1105/bind
[   51.345993] sja1105 spi0.1: Probed switch chip: SJA1105T
[   51.378063] sja1105 spi0.1 swp5 (uninitialized): PHY [mdio@2d24000:06] driver [Broadcom BCM5464] (irq=POLL)
[   51.389880] sja1105 spi0.1 swp2 (uninitialized): PHY [mdio@2d24000:03] driver [Broadcom BCM5464] (irq=POLL)
[   51.401806] sja1105 spi0.1 swp3 (uninitialized): PHY [mdio@2d24000:04] driver [Broadcom BCM5464] (irq=POLL)
[   51.413710] sja1105 spi0.1 swp4 (uninitialized): PHY [mdio@2d24000:05] driver [Broadcom BCM5464] (irq=POLL)
[   51.424859] fsl-gianfar soc:ethernet@2d90000 eth2: Link is Up - 1Gbps/Full - flow control off
[   51.453768] sja1105 spi0.1: configuring for fixed/rgmii link mode
[   51.460094] device eth2 entered promiscuous mode
[   51.464856] DSA: tree 0 setup
[   51.477105] br0: port 3(swp2) entered blocking state
[   51.478394] sja1105 spi0.1: Link is Up - 1Gbps/Full - flow control off
[   51.482080] br0: port 3(swp2) entered disabled state
[   51.531585] device swp2 entered promiscuous mode
[   51.550365] sja1105 spi0.1 swp2: configuring for phy/rgmii-id link mode
[   51.559631] br0: port 4(swp5) entered blocking state
[   51.564597] br0: port 4(swp5) entered disabled state
[   51.586224] device swp5 entered promiscuous mode
[   51.647483] sja1105 spi0.1 swp5: configuring for phy/rgmii-id link mode
[   51.665995] br0: port 5(swp4) entered blocking state
[   51.671004] br0: port 5(swp4) entered disabled state
[   51.677991] device swp4 entered promiscuous mode
[   51.685967] br0: port 6(swp3) entered blocking state
[   51.690935] br0: port 6(swp3) entered disabled state
[   51.698246] device swp3 entered promiscuous mode
[   51.746640] sja1105 spi0.1 swp4: configuring for phy/rgmii-id link mode
[   51.754986] sja1105 spi0.1 swp3: configuring for phy/rgmii-id link mode
[   54.716225] sja1105 spi0.1 swp2: Link is Up - 1Gbps/Full - flow control off
[   54.723208] IPv6: ADDRCONF(NETDEV_CHANGE): swp2: link becomes ready
[   54.729620] br0: port 3(swp2) entered blocking state
[   54.734576] br0: port 3(swp2) entered forwarding state
[   54.796136] sja1105 spi0.1 swp5: Link is Up - 1Gbps/Full - flow control off
[   54.803117] IPv6: ADDRCONF(NETDEV_CHANGE): swp5: link becomes ready
[   54.809527] br0: port 4(swp5) entered blocking state
[   54.814484] br0: port 4(swp5) entered forwarding state
[   54.876397] sja1105 spi0.1 swp3: Link is Up - 1Gbps/Full - flow control off
[   54.883378] IPv6: ADDRCONF(NETDEV_CHANGE): swp3: link becomes ready
[   54.889790] br0: port 6(swp3) entered blocking state
[   54.894744] br0: port 6(swp3) entered forwarding state
# After
$ bridge link
7: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
8: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100
12: swp5@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
13: swp2@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
14: swp3@eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master br0 state forwarding priority 32 cost 4
15: swp4@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 master br0 state disabled priority 32 cost 100

The ports are ready to pass traffic, and are doing it.

So what does "wait for DSA to finish probing" mean? What driver, kernel
and systemd-networkd version is this, exactly, and what is it that needs
waiting?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 15:23       ` Vladimir Oltean
@ 2021-09-02 16:31         ` Russell King (Oracle)
  2021-09-02 17:10           ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 16:31 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > Debian has had support for configuring bridges at boot time via
> > the interfaces file for years. Breaking that is going to upset a
> > lot of people (me included) resulting in busted networks. It
> > would be a sure way to make oneself unpopular.
> >
> > > I expect there to be 2 call paths of phy_attach_direct:
> > > - At probe time. Both the MAC driver and the PHY driver are probing.
> > >   This is what has this patch addresses. There is no issue to return
> > >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> > >   they register their netdev. So if connecting defers, there is no
> > >   netdev to unregister, and user space knows nothing of this.
> > > - At .ndo_open time. This is where it maybe gets interesting, but not to
> > >   user space. If you open a netdev and it connects to the PHY then, I
> > >   wouldn't expect the PHY to be undergoing a probing process, all of
> > >   that should have been settled by then, should it not? Where it might
> > >   get interesting is with NFS root, and I admit I haven't tested that.
> >
> > I don't think you can make that assumption. Consider the case where
> > systemd is being used, DSA stuff is modular, and we're trying to
> > setup a bridge device on DSA. DSA could be probing while the bridge
> > is being setup.
> >
> > Sadly, this isn't theoretical. I've ended up needing:
> >
> > 	pre-up sleep 1
> >
> > in my bridge configuration to allow time for DSA to finish probing.
> > It's not a pleasant solution, nor a particularly reliable one at
> > that, but it currently works around the problem.
> 
> What problem? This is the first time I've heard of this report, and you
> should definitely not need that.

I found it when upgrading the Clearfog by the DSL modems to v5.13.
When I rebooted it with a previously working kernel (v5.7) it has
never had a problem. With v5.13, it failed to add all the lan ports
into the bridge, because the bridge was still being setup by the
kernel while userspace was trying to configure it. Note that I have
extra debug in my kernels, hence the extra messages:

Aug 30 11:29:52 sw-dsl kernel: [    3.308583] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78
Aug 30 11:29:52 sw-dsl kernel: [    3.308595] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78
Aug 30 11:29:52 sw-dsl kernel: [    3.332403] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79
Aug 30 11:29:52 sw-dsl kernel: [    3.332415] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79
Aug 30 11:29:52 sw-dsl kernel: [    3.412638] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1
Aug 30 11:29:52 sw-dsl kernel: [    3.412649] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1
Aug 30 11:29:52 sw-dsl kernel: [    3.515888] libphy: mv88e6xxx SMI: probed

Here, userspace starts configuring eno1, the ethernet port connected
to the DSA switch:

Aug 30 11:29:52 sw-dsl kernel: [    3.536090] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 30 11:29:52 sw-dsl kernel: [    3.536109] mvneta f1030000.ethernet eno1: major config 1000base-x
Aug 30 11:29:52 sw-dsl kernel: [    3.536117] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
Aug 30 11:29:52 sw-dsl kernel: [    3.536135] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [    3.536135] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [    3.536146] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [    3.536146] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:52 sw-dsl kernel: [    3.572013] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:52 sw-dsl kernel: [    3.572016] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:52 sw-dsl kernel: [    3.572046] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 30 11:29:52 sw-dsl kernel: [    3.657820] 8021q: 802.1Q VLAN Support v1.8

We get the link to eno1 going down/up due to DSA's actions:

Aug 30 11:29:53 sw-dsl kernel: [    4.291882] mvneta f1030000.ethernet eno1: Link is Down
Aug 30 11:29:53 sw-dsl kernel: [    4.309425] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:53 sw-dsl kernel: [    4.309425] mvneta f1030000.ethernet eno1: mac link down
Aug 30 11:29:53 sw-dsl kernel: [    4.309440] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 30 11:29:53 sw-dsl kernel: [    4.309447] mvneta f1030000.ethernet eno1: major config 1000base-x
Aug 30 11:29:53 sw-dsl kernel: [    4.309454] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
Aug 30 11:29:53 sw-dsl kernel: [    4.345013] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:53 sw-dsl kernel: [    4.345014] mvneta f1030000.ethernet eno1: mac link up
Aug 30 11:29:53 sw-dsl kernel: [    4.345036] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx

DSA then starts initialising the ports:

Aug 30 11:29:53 sw-dsl kernel: [    4.397647] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=75)
Aug 30 11:29:53 sw-dsl kernel: [    4.397663] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [    4.493080] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=76)
Aug 30 11:29:53 sw-dsl kernel: [    4.493093] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [    4.577070] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=77)
Aug 30 11:29:53 sw-dsl kernel: [    4.577081] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
0000000,00000000,000022ef

Meanwhile userspace is trying to setup the bridge while this is going
on, and has tried to add the non-existent lan2 at this point, but
lan4 has just been created in time, so Debian's bridge support adds
it to the brdsl bridge:

Aug 30 11:29:53 sw-dsl kernel: [    4.652237] brdsl: port 1(lan4) entered blocking state
Aug 30 11:29:53 sw-dsl kernel: [    4.652250] brdsl: port 1(lan4) entered disabled state

DSA continues setting up the other ports, here lan2, but the bridge
setup scripts have already moved on past lan2.

Aug 30 11:29:53 sw-dsl kernel: [    4.674038] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=78)
Aug 30 11:29:53 sw-dsl kernel: [    4.674052] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [    4.674612] device lan4 entered promiscuous mode
Aug 30 11:29:53 sw-dsl kernel: [    4.785886] device eno1 entered promiscuous mode
Aug 30 11:29:53 sw-dsl kernel: [    4.786971] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
Aug 30 11:29:53 sw-dsl kernel: [    4.786980] mv88e6085 f1072004.mdio-mii:04 lan4: major config gmii
Aug 30 11:29:53 sw-dsl kernel: [    4.786986] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
Aug 30 11:29:53 sw-dsl kernel: [    4.786996] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
Aug 30 11:29:53 sw-dsl kernel: [    4.789977] 8021q: adding VLAN 0 to HW filter
on device lan4
Aug 30 11:29:53 sw-dsl kernel: [    4.836720] brdsl: port 2(eno2) entered blocking state
Aug 30 11:29:53 sw-dsl kernel: [    4.836733] brdsl: port 2(eno2) entered disabled state

Here, the SFP port (on eno2) is added to the bridge.

Aug 30 11:29:53 sw-dsl kernel: [    4.836907] device eno2 entered promiscuous mode
Aug 30 11:29:53 sw-dsl kernel: [    4.837011] brdsl: port 2(eno2) entered blocking state
Aug 30 11:29:53 sw-dsl kernel: [    4.837019] brdsl: port 2(eno2) entered forwarding state
Aug 30 11:29:53 sw-dsl kernel: [    4.837058] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready
Aug 30 11:29:53 sw-dsl kernel: [    4.846989] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off
Aug 30 11:29:53 sw-dsl kernel: [    4.896264] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=79)
Aug 30 11:29:53 sw-dsl kernel: [    4.896278] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
0000000,00000000,000022ef
Aug 30 11:29:53 sw-dsl kernel: [    4.934514] DSA: tree 0 setup

Here, the DSA tree has finally finished initialising in the kernel.

Aug 30 11:29:53 sw-dsl kernel: [    4.986877] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode
Aug 30 11:29:53 sw-dsl kernel: [    4.986890] mv88e6085 f1072004.mdio-mii:04 lan1: major config gmii
Aug 30 11:29:53 sw-dsl kernel: [    4.986896] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
Aug 30 11:29:53 sw-dsl kernel: [    4.986907] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config()
Aug 30 11:29:53 sw-dsl kernel: [    4.990199] 8021q: adding VLAN 0 to HW filter
on device lan1
Aug 30 11:29:54 sw-dsl kernel: [    5.041313] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off
Aug 30 11:29:56 sw-dsl kernel: [    7.630016] mv88e6085 f1072004.mdio-mii:04 lan4: phy link up gmii/1Gbps/Full/off
Aug 30 11:29:56 sw-dsl kernel: [    7.630031] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0
Aug 30 11:29:56 sw-dsl kernel: [    7.630043] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
Aug 30 11:29:56 sw-dsl kernel: [    7.630294] mv88e6085 f1072004.mdio-mii:04 lan4: Link is Up - 1Gbps/Full - flow control off
Aug 30 11:29:56 sw-dsl kernel: [    7.630312] brdsl: port 1(lan4) entered blocking state
Aug 30 11:29:56 sw-dsl kernel: [    7.630321] brdsl: port 1(lan4) entered forwarding state

I then notice that my Internet connection hasn't come back, so I start
poking about with it, first adding it to the bridge:

Aug 30 11:31:13 sw-dsl kernel: [   84.990122] brdsl: port 3(lan2) entered blocking state
Aug 30 11:31:13 sw-dsl kernel: [   84.990134] brdsl: port 3(lan2) entered disabled state
Aug 30 11:31:14 sw-dsl kernel: [   85.063971] device lan2 entered promiscuous mode

And then setting it to up state and configuring its vlan settings:

Aug 30 11:32:45 sw-dsl kernel: [  176.476090] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode
Aug 30 11:32:45 sw-dsl kernel: [  176.476103] mv88e6085 f1072004.mdio-mii:04 lan2: major config gmii
Aug 30 11:32:45 sw-dsl kernel: [  176.476109] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
Aug 30 11:32:45 sw-dsl kernel: [  176.476120] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
Aug 30 11:32:45 sw-dsl kernel: [  176.479495] 8021q: adding VLAN 0 to HW filter
on device lan2
Aug 30 11:32:45 sw-dsl kernel: [  176.537796] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
Aug 30 11:32:48 sw-dsl kernel: [  179.280863] mv88e6085 f1072004.mdio-mii:04 lan2: phy link up gmii/1Gbps/Full/rx/tx
Aug 30 11:32:48 sw-dsl kernel: [  179.280877] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 an=0
Aug 30 11:32:48 sw-dsl kernel: [  179.280888] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
Aug 30 11:32:48 sw-dsl kernel: [  179.280894] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_link_up()
Aug 30 11:32:48 sw-dsl kernel: [  179.282958] mv88e6085 f1072004.mdio-mii:04 lan2: Link is Up - 1Gbps/Full - flow control rx/tx

I had:

iface brdsl inet manual
        bridge-ports lan2 lan4
        bridge-maxwait 0
	up brctl addif $IFACE eno2

I now have:
iface brdsl inet manual
        bridge-ports lan2 lan4
        bridge-waitport 10
        bridge-maxwait 0
        pre-up sleep 1
	up brctl addif $IFACE eno2

to ensure that all ports get properly configured.

What can be seen from the above is that there is most definitely a race.
It is possible to start configuring a DSA switch before the DSA switch
driver has finished being probed by the kernel.

Here is the kernel log from v5.7 which has never showed these problems,
because DSA seemed to always setup everything in kernel space prior to
userspace beginning configuration:

Aug 25 23:03:54 sw-dsl kernel: [    5.793137] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 25 23:03:54 sw-dsl kernel: [    5.793148] mvneta f1030000.ethernet eno1: config interface 1000base-x
Aug 25 23:03:54 sw-dsl kernel: [    5.793157] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
Aug 25 23:03:54 sw-dsl kernel: [    5.793168] mvneta f1030000.ethernet eno1: mac link down
Aug 25 23:03:54 sw-dsl kernel: [    5.793170] mvneta f1030000.ethernet eno1: mac link down
Aug 25 23:03:54 sw-dsl kernel: [    5.819769] mvneta f1030000.ethernet eno1: mac link up
Aug 25 23:03:54 sw-dsl kernel: [    5.819792] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 25 23:03:54 sw-dsl kernel: [    5.948900] 8021q: 802.1Q VLAN Support v1.8
  6.459779] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 0
Aug 25 23:03:54 sw-dsl kernel: [    6.462890] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=67)
Aug 25 23:03:54 sw-dsl kernel: [    6.462905] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
Aug 25 23:03:54 sw-dsl kernel: [    6.465904] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 1
Aug 25 23:03:54 sw-dsl kernel: [    6.468101] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=68)
Aug 25 23:03:54 sw-dsl kernel: [    6.468109] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
Aug 25 23:03:54 sw-dsl kernel: [    6.472162] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 2
Aug 25 23:03:54 sw-dsl kernel: [    6.474247] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=69)
Aug 25 23:03:54 sw-dsl kernel: [    6.474261] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
Aug 25 23:03:54 sw-dsl kernel: [    6.481824] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 3
Aug 25 23:03:54 sw-dsl kernel: [    6.486354] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=70)
Aug 25 23:03:54 sw-dsl kernel: [    6.486363] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
Aug 25 23:03:54 sw-dsl kernel: [    6.498494] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 4
Aug 25 23:03:54 sw-dsl kernel: [    6.502272] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=71)
Aug 25 23:03:54 sw-dsl kernel: [    6.502279] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
Aug 25 23:03:54 sw-dsl kernel: [    6.532258] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 6
Aug 25 23:03:54 sw-dsl kernel: [    6.535877] mvneta f1030000.ethernet eno1: Link is Down
Aug 25 23:03:54 sw-dsl kernel: [    6.541733] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
Aug 25 23:03:54 sw-dsl kernel: [    6.541741] mvneta f1030000.ethernet eno1: config interface 1000base-x
Aug 25 23:03:54 sw-dsl kernel: [    6.541754] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
Aug 25 23:03:54 sw-dsl kernel: [    6.541771] mvneta f1030000.ethernet eno1: mac link down
Aug 25 23:03:54 sw-dsl kernel: [    6.541779] mvneta f1030000.ethernet eno1: mac link down
Aug 25 23:03:54 sw-dsl kernel: [    6.541907] DSA: tree 0 setup

Here, the kernel DSA switch driver has finished doing its setup
before we even get to configuring the bridge device below.

Aug 25 23:03:54 sw-dsl kernel: [    6.569105] mvneta f1030000.ethernet eno1: mac link up
Aug 25 23:03:54 sw-dsl kernel: [    6.569113] mvneta f1030000.ethernet eno1: mac link up
Aug 25 23:03:54 sw-dsl kernel: [    6.569139] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 25 23:03:55 sw-dsl kernel: [    6.931763] brdsl: port 1(lan2) entered blocking state
Aug 25 23:03:55 sw-dsl kernel: [    6.931769] brdsl: port 1(lan2) entered disabled state
Aug 25 23:03:55 sw-dsl kernel: [    6.932863] device lan2 entered promiscuous mode
Aug 25 23:03:55 sw-dsl kernel: [    7.032838] device eno1 entered promiscuous mode
Aug 25 23:03:55 sw-dsl kernel: [    7.032902] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode
Aug 25 23:03:55 sw-dsl kernel: [    7.032907] mv88e6085 f1072004.mdio-mii:04 lan2: config interface gmii
Aug 25 23:03:55 sw-dsl kernel: [    7.032916] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
Aug 25 23:03:55 sw-dsl kernel: [    7.032920] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
Aug 25 23:03:55 sw-dsl kernel: [    7.037225] 8021q: adding VLAN 0 to HW filter
on device lan2
Aug 25 23:03:55 sw-dsl kernel: [    7.044979] brdsl: port 2(lan4) entered blocking state
Aug 25 23:03:55 sw-dsl kernel: [    7.044985] brdsl: port 2(lan4) entered disabled state
Aug 25 23:03:55 sw-dsl kernel: [    7.056189] device lan4 entered promiscuous mode
Aug 25 23:03:55 sw-dsl kernel: [    7.107067] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
Aug 25 23:03:55 sw-dsl kernel: [    7.107073] mv88e6085 f1072004.mdio-mii:04 lan4: config interface gmii
Aug 25 23:03:55 sw-dsl kernel: [    7.107080] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
Aug 25 23:03:55 sw-dsl kernel: [    7.107084] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
Aug 25 23:03:55 sw-dsl kernel: [    7.118831] 8021q: adding VLAN 0 to HW filter
on device lan4
Aug 25 23:03:55 sw-dsl kernel: [    7.153604] brdsl: port 3(eno2) entered blocking state
Aug 25 23:03:55 sw-dsl kernel: [    7.153610] brdsl: port 3(eno2) entered disabled state
Aug 25 23:03:55 sw-dsl kernel: [    7.153720] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
Aug 25 23:03:55 sw-dsl kernel: [    7.153790] device eno2 entered promiscuous mode
Aug 25 23:03:55 sw-dsl kernel: [    7.153890] brdsl: port 3(eno2) entered blocking state
Aug 25 23:03:55 sw-dsl kernel: [    7.153895] brdsl: port 3(eno2) entered forwarding state
Aug 25 23:03:55 sw-dsl kernel: [    7.153930] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready
Aug 25 23:03:55 sw-dsl kernel: [    7.295739] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off
Aug 25 23:03:55 sw-dsl kernel: [    7.575615] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode
Aug 25 23:03:55 sw-dsl kernel: [    7.575622] mv88e6085 f1072004.mdio-mii:04 lan1: config interface gmii
Aug 25 23:03:55 sw-dsl kernel: [    7.575630] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
Aug 25 23:03:55 sw-dsl kernel: [    7.575634] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config()
Aug 25 23:03:55 sw-dsl kernel: [    7.579334] 8021q: adding VLAN 0 to HW filter
on device lan1
Aug 25 23:03:55 sw-dsl kernel: [    7.635966] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 16:31         ` Russell King (Oracle)
@ 2021-09-02 17:10           ` Vladimir Oltean
  2021-09-02 17:50             ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 17:10 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > > Debian has had support for configuring bridges at boot time via
> > > the interfaces file for years. Breaking that is going to upset a
> > > lot of people (me included) resulting in busted networks. It
> > > would be a sure way to make oneself unpopular.
> > >
> > > > I expect there to be 2 call paths of phy_attach_direct:
> > > > - At probe time. Both the MAC driver and the PHY driver are probing.
> > > >   This is what has this patch addresses. There is no issue to return
> > > >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> > > >   they register their netdev. So if connecting defers, there is no
> > > >   netdev to unregister, and user space knows nothing of this.
> > > > - At .ndo_open time. This is where it maybe gets interesting, but not to
> > > >   user space. If you open a netdev and it connects to the PHY then, I
> > > >   wouldn't expect the PHY to be undergoing a probing process, all of
> > > >   that should have been settled by then, should it not? Where it might
> > > >   get interesting is with NFS root, and I admit I haven't tested that.
> > >
> > > I don't think you can make that assumption. Consider the case where
> > > systemd is being used, DSA stuff is modular, and we're trying to
> > > setup a bridge device on DSA. DSA could be probing while the bridge
> > > is being setup.
> > >
> > > Sadly, this isn't theoretical. I've ended up needing:
> > >
> > > 	pre-up sleep 1
> > >
> > > in my bridge configuration to allow time for DSA to finish probing.
> > > It's not a pleasant solution, nor a particularly reliable one at
> > > that, but it currently works around the problem.
> > 
> > What problem? This is the first time I've heard of this report, and you
> > should definitely not need that.
> 
> I found it when upgrading the Clearfog by the DSL modems to v5.13.
> When I rebooted it with a previously working kernel (v5.7) it has
> never had a problem. With v5.13, it failed to add all the lan ports
> into the bridge, because the bridge was still being setup by the
> kernel while userspace was trying to configure it. Note that I have
> extra debug in my kernels, hence the extra messages:

Ok, first you talked about the interfaces file, then systemd. If it's
not about systemd's network manager then I don't see how it is relevant.
What package and version is this exactly, ifupdown, ifupdown2,
ifupdown-ng, busybox ifupdown? I think they all use the interfaces file.

> Aug 30 11:29:52 sw-dsl kernel: [    3.308583] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78
> Aug 30 11:29:52 sw-dsl kernel: [    3.308595] Marvell 88E1540 mv88e6xxx-0:03: probe: irq=78
> Aug 30 11:29:52 sw-dsl kernel: [    3.332403] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79
> Aug 30 11:29:52 sw-dsl kernel: [    3.332415] Marvell 88E1540 mv88e6xxx-0:04: probe: irq=79
> Aug 30 11:29:52 sw-dsl kernel: [    3.412638] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1
> Aug 30 11:29:52 sw-dsl kernel: [    3.412649] Marvell 88E1545 mv88e6xxx-0:0f: probe: irq=-1
> Aug 30 11:29:52 sw-dsl kernel: [    3.515888] libphy: mv88e6xxx SMI: probed
> 
> Here, userspace starts configuring eno1, the ethernet port connected
> to the DSA switch:
> 
> Aug 30 11:29:52 sw-dsl kernel: [    3.536090] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> Aug 30 11:29:52 sw-dsl kernel: [    3.536109] mvneta f1030000.ethernet eno1: major config 1000base-x
> Aug 30 11:29:52 sw-dsl kernel: [    3.536117] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
> Aug 30 11:29:52 sw-dsl kernel: [    3.536135] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:52 sw-dsl kernel: [    3.536135] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:52 sw-dsl kernel: [    3.536146] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:52 sw-dsl kernel: [    3.536146] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:52 sw-dsl kernel: [    3.572013] mvneta f1030000.ethernet eno1: mac link up
> Aug 30 11:29:52 sw-dsl kernel: [    3.572016] mvneta f1030000.ethernet eno1: mac link up
> Aug 30 11:29:52 sw-dsl kernel: [    3.572046] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> Aug 30 11:29:52 sw-dsl kernel: [    3.657820] 8021q: 802.1Q VLAN Support v1.8
> 
> We get the link to eno1 going down/up due to DSA's actions:

What "actions"? There were only 2 DSA changes related to the state of
the master interface, but DSA never forces the master to go down. Quite
the opposite, it forces the master up when it needs to, and it goes down
when the master goes down. See:

9d5ef190e561 ("net: dsa: automatically bring up DSA master when opening user port")
c0a8a9c27493 ("net: dsa: automatically bring user ports down when master goes down")

So if eno1 goes down and that causes breakage, DSA did not trigger it.
Also, please note that eno1 goes down in your "working" example too.

> 
> Aug 30 11:29:53 sw-dsl kernel: [    4.291882] mvneta f1030000.ethernet eno1: Link is Down
> Aug 30 11:29:53 sw-dsl kernel: [    4.309425] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:53 sw-dsl kernel: [    4.309425] mvneta f1030000.ethernet eno1: mac link down
> Aug 30 11:29:53 sw-dsl kernel: [    4.309440] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> Aug 30 11:29:53 sw-dsl kernel: [    4.309447] mvneta f1030000.ethernet eno1: major config 1000base-x
> Aug 30 11:29:53 sw-dsl kernel: [    4.309454] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
> Aug 30 11:29:53 sw-dsl kernel: [    4.345013] mvneta f1030000.ethernet eno1: mac link up
> Aug 30 11:29:53 sw-dsl kernel: [    4.345014] mvneta f1030000.ethernet eno1: mac link up
> Aug 30 11:29:53 sw-dsl kernel: [    4.345036] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> 
> DSA then starts initialising the ports:
> 
> Aug 30 11:29:53 sw-dsl kernel: [    4.397647] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=75)
> Aug 30 11:29:53 sw-dsl kernel: [    4.397663] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> 0000000,00000000,000022ef
> Aug 30 11:29:53 sw-dsl kernel: [    4.493080] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=76)
> Aug 30 11:29:53 sw-dsl kernel: [    4.493093] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> 0000000,00000000,000022ef
> Aug 30 11:29:53 sw-dsl kernel: [    4.577070] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=77)
> Aug 30 11:29:53 sw-dsl kernel: [    4.577081] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> 0000000,00000000,000022ef
> 
> Meanwhile userspace is trying to setup the bridge while this is going
> on, and has tried to add the non-existent lan2 at this point, but
> lan4 has just been created in time, so Debian's bridge support adds
> it to the brdsl bridge:
> 
> Aug 30 11:29:53 sw-dsl kernel: [    4.652237] brdsl: port 1(lan4) entered blocking state
> Aug 30 11:29:53 sw-dsl kernel: [    4.652250] brdsl: port 1(lan4) entered disabled state
> 
> DSA continues setting up the other ports, here lan2, but the bridge
> setup scripts have already moved on past lan2.

How does this program know that lan2 exists before it starts attempting
to enslave it to a bridge via the brctl program, and what does DSA do to
violate that assumption?

> 
> Aug 30 11:29:53 sw-dsl kernel: [    4.674038] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=78)
> Aug 30 11:29:53 sw-dsl kernel: [    4.674052] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> 0000000,00000000,000022ef
> Aug 30 11:29:53 sw-dsl kernel: [    4.674612] device lan4 entered promiscuous mode
> Aug 30 11:29:53 sw-dsl kernel: [    4.785886] device eno1 entered promiscuous mode
> Aug 30 11:29:53 sw-dsl kernel: [    4.786971] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
> Aug 30 11:29:53 sw-dsl kernel: [    4.786980] mv88e6085 f1072004.mdio-mii:04 lan4: major config gmii
> Aug 30 11:29:53 sw-dsl kernel: [    4.786986] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> Aug 30 11:29:53 sw-dsl kernel: [    4.786996] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
> Aug 30 11:29:53 sw-dsl kernel: [    4.789977] 8021q: adding VLAN 0 to HW filter
> on device lan4
> Aug 30 11:29:53 sw-dsl kernel: [    4.836720] brdsl: port 2(eno2) entered blocking state
> Aug 30 11:29:53 sw-dsl kernel: [    4.836733] brdsl: port 2(eno2) entered disabled state
> 
> Here, the SFP port (on eno2) is added to the bridge.
> 
> Aug 30 11:29:53 sw-dsl kernel: [    4.836907] device eno2 entered promiscuous mode
> Aug 30 11:29:53 sw-dsl kernel: [    4.837011] brdsl: port 2(eno2) entered blocking state
> Aug 30 11:29:53 sw-dsl kernel: [    4.837019] brdsl: port 2(eno2) entered forwarding state
> Aug 30 11:29:53 sw-dsl kernel: [    4.837058] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready
> Aug 30 11:29:53 sw-dsl kernel: [    4.846989] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off
> Aug 30 11:29:53 sw-dsl kernel: [    4.896264] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=79)
> Aug 30 11:29:53 sw-dsl kernel: [    4.896278] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> 0000000,00000000,000022ef
> Aug 30 11:29:53 sw-dsl kernel: [    4.934514] DSA: tree 0 setup
> 
> Here, the DSA tree has finally finished initialising in the kernel.
> 
> Aug 30 11:29:53 sw-dsl kernel: [    4.986877] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode
> Aug 30 11:29:53 sw-dsl kernel: [    4.986890] mv88e6085 f1072004.mdio-mii:04 lan1: major config gmii
> Aug 30 11:29:53 sw-dsl kernel: [    4.986896] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> Aug 30 11:29:53 sw-dsl kernel: [    4.986907] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config()
> Aug 30 11:29:53 sw-dsl kernel: [    4.990199] 8021q: adding VLAN 0 to HW filter
> on device lan1
> Aug 30 11:29:54 sw-dsl kernel: [    5.041313] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off
> Aug 30 11:29:56 sw-dsl kernel: [    7.630016] mv88e6085 f1072004.mdio-mii:04 lan4: phy link up gmii/1Gbps/Full/off
> Aug 30 11:29:56 sw-dsl kernel: [    7.630031] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0
> Aug 30 11:29:56 sw-dsl kernel: [    7.630043] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
> Aug 30 11:29:56 sw-dsl kernel: [    7.630294] mv88e6085 f1072004.mdio-mii:04 lan4: Link is Up - 1Gbps/Full - flow control off
> Aug 30 11:29:56 sw-dsl kernel: [    7.630312] brdsl: port 1(lan4) entered blocking state
> Aug 30 11:29:56 sw-dsl kernel: [    7.630321] brdsl: port 1(lan4) entered forwarding state
> 
> I then notice that my Internet connection hasn't come back, so I start
> poking about with it, first adding it to the bridge:
> 
> Aug 30 11:31:13 sw-dsl kernel: [   84.990122] brdsl: port 3(lan2) entered blocking state
> Aug 30 11:31:13 sw-dsl kernel: [   84.990134] brdsl: port 3(lan2) entered disabled state
> Aug 30 11:31:14 sw-dsl kernel: [   85.063971] device lan2 entered promiscuous mode
> 
> And then setting it to up state and configuring its vlan settings:
> 
> Aug 30 11:32:45 sw-dsl kernel: [  176.476090] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode
> Aug 30 11:32:45 sw-dsl kernel: [  176.476103] mv88e6085 f1072004.mdio-mii:04 lan2: major config gmii
> Aug 30 11:32:45 sw-dsl kernel: [  176.476109] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> Aug 30 11:32:45 sw-dsl kernel: [  176.476120] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> Aug 30 11:32:45 sw-dsl kernel: [  176.479495] 8021q: adding VLAN 0 to HW filter
> on device lan2
> Aug 30 11:32:45 sw-dsl kernel: [  176.537796] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
> Aug 30 11:32:48 sw-dsl kernel: [  179.280863] mv88e6085 f1072004.mdio-mii:04 lan2: phy link up gmii/1Gbps/Full/rx/tx
> Aug 30 11:32:48 sw-dsl kernel: [  179.280877] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 an=0
> Aug 30 11:32:48 sw-dsl kernel: [  179.280888] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> Aug 30 11:32:48 sw-dsl kernel: [  179.280894] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_link_up()
> Aug 30 11:32:48 sw-dsl kernel: [  179.282958] mv88e6085 f1072004.mdio-mii:04 lan2: Link is Up - 1Gbps/Full - flow control rx/tx
> 
> I had:
> 
> iface brdsl inet manual
>         bridge-ports lan2 lan4
>         bridge-maxwait 0
> 	up brctl addif $IFACE eno2
> 
> I now have:
> iface brdsl inet manual
>         bridge-ports lan2 lan4
>         bridge-waitport 10
>         bridge-maxwait 0
>         pre-up sleep 1
> 	up brctl addif $IFACE eno2

I searched google for the "bridge-ports" keyword relative to ifupdown
and could not find the source code of a program which parses this. Could
you let me know what is the source code of the program you are using?

> 
> to ensure that all ports get properly configured.
> 
> What can be seen from the above is that there is most definitely a race.
> It is possible to start configuring a DSA switch before the DSA switch
> driver has finished being probed by the kernel.
> 
> Here is the kernel log from v5.7 which has never showed these problems,
> because DSA seemed to always setup everything in kernel space prior to
> userspace beginning configuration:
> 
> Aug 25 23:03:54 sw-dsl kernel: [    5.793137] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> Aug 25 23:03:54 sw-dsl kernel: [    5.793148] mvneta f1030000.ethernet eno1: config interface 1000base-x
> Aug 25 23:03:54 sw-dsl kernel: [    5.793157] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
> Aug 25 23:03:54 sw-dsl kernel: [    5.793168] mvneta f1030000.ethernet eno1: mac link down
> Aug 25 23:03:54 sw-dsl kernel: [    5.793170] mvneta f1030000.ethernet eno1: mac link down
> Aug 25 23:03:54 sw-dsl kernel: [    5.819769] mvneta f1030000.ethernet eno1: mac link up
> Aug 25 23:03:54 sw-dsl kernel: [    5.819792] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> Aug 25 23:03:54 sw-dsl kernel: [    5.948900] 8021q: 802.1Q VLAN Support v1.8
>   6.459779] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 0
> Aug 25 23:03:54 sw-dsl kernel: [    6.462890] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=67)
> Aug 25 23:03:54 sw-dsl kernel: [    6.462905] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> Aug 25 23:03:54 sw-dsl kernel: [    6.465904] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 1
> Aug 25 23:03:54 sw-dsl kernel: [    6.468101] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=68)
> Aug 25 23:03:54 sw-dsl kernel: [    6.468109] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> Aug 25 23:03:54 sw-dsl kernel: [    6.472162] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 2
> Aug 25 23:03:54 sw-dsl kernel: [    6.474247] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=69)
> Aug 25 23:03:54 sw-dsl kernel: [    6.474261] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> Aug 25 23:03:54 sw-dsl kernel: [    6.481824] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 3
> Aug 25 23:03:54 sw-dsl kernel: [    6.486354] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=70)
> Aug 25 23:03:54 sw-dsl kernel: [    6.486363] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> Aug 25 23:03:54 sw-dsl kernel: [    6.498494] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 4
> Aug 25 23:03:54 sw-dsl kernel: [    6.502272] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=71)
> Aug 25 23:03:54 sw-dsl kernel: [    6.502279] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> Aug 25 23:03:54 sw-dsl kernel: [    6.532258] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 6
> Aug 25 23:03:54 sw-dsl kernel: [    6.535877] mvneta f1030000.ethernet eno1: Link is Down
> Aug 25 23:03:54 sw-dsl kernel: [    6.541733] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> Aug 25 23:03:54 sw-dsl kernel: [    6.541741] mvneta f1030000.ethernet eno1: config interface 1000base-x
> Aug 25 23:03:54 sw-dsl kernel: [    6.541754] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
> Aug 25 23:03:54 sw-dsl kernel: [    6.541771] mvneta f1030000.ethernet eno1: mac link down
> Aug 25 23:03:54 sw-dsl kernel: [    6.541779] mvneta f1030000.ethernet eno1: mac link down
> Aug 25 23:03:54 sw-dsl kernel: [    6.541907] DSA: tree 0 setup
> 
> Here, the kernel DSA switch driver has finished doing its setup
> before we even get to configuring the bridge device below.
> 
> Aug 25 23:03:54 sw-dsl kernel: [    6.569105] mvneta f1030000.ethernet eno1: mac link up
> Aug 25 23:03:54 sw-dsl kernel: [    6.569113] mvneta f1030000.ethernet eno1: mac link up
> Aug 25 23:03:54 sw-dsl kernel: [    6.569139] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> Aug 25 23:03:55 sw-dsl kernel: [    6.931763] brdsl: port 1(lan2) entered blocking state
> Aug 25 23:03:55 sw-dsl kernel: [    6.931769] brdsl: port 1(lan2) entered disabled state
> Aug 25 23:03:55 sw-dsl kernel: [    6.932863] device lan2 entered promiscuous mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.032838] device eno1 entered promiscuous mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.032902] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.032907] mv88e6085 f1072004.mdio-mii:04 lan2: config interface gmii
> Aug 25 23:03:55 sw-dsl kernel: [    7.032916] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
> Aug 25 23:03:55 sw-dsl kernel: [    7.032920] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> Aug 25 23:03:55 sw-dsl kernel: [    7.037225] 8021q: adding VLAN 0 to HW filter
> on device lan2
> Aug 25 23:03:55 sw-dsl kernel: [    7.044979] brdsl: port 2(lan4) entered blocking state
> Aug 25 23:03:55 sw-dsl kernel: [    7.044985] brdsl: port 2(lan4) entered disabled state
> Aug 25 23:03:55 sw-dsl kernel: [    7.056189] device lan4 entered promiscuous mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.107067] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.107073] mv88e6085 f1072004.mdio-mii:04 lan4: config interface gmii
> Aug 25 23:03:55 sw-dsl kernel: [    7.107080] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
> Aug 25 23:03:55 sw-dsl kernel: [    7.107084] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
> Aug 25 23:03:55 sw-dsl kernel: [    7.118831] 8021q: adding VLAN 0 to HW filter
> on device lan4
> Aug 25 23:03:55 sw-dsl kernel: [    7.153604] brdsl: port 3(eno2) entered blocking state
> Aug 25 23:03:55 sw-dsl kernel: [    7.153610] brdsl: port 3(eno2) entered disabled state
> Aug 25 23:03:55 sw-dsl kernel: [    7.153720] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
> Aug 25 23:03:55 sw-dsl kernel: [    7.153790] device eno2 entered promiscuous mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.153890] brdsl: port 3(eno2) entered blocking state
> Aug 25 23:03:55 sw-dsl kernel: [    7.153895] brdsl: port 3(eno2) entered forwarding state
> Aug 25 23:03:55 sw-dsl kernel: [    7.153930] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready
> Aug 25 23:03:55 sw-dsl kernel: [    7.295739] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off
> Aug 25 23:03:55 sw-dsl kernel: [    7.575615] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode
> Aug 25 23:03:55 sw-dsl kernel: [    7.575622] mv88e6085 f1072004.mdio-mii:04 lan1: config interface gmii
> Aug 25 23:03:55 sw-dsl kernel: [    7.575630] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
> Aug 25 23:03:55 sw-dsl kernel: [    7.575634] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config()
> Aug 25 23:03:55 sw-dsl kernel: [    7.579334] 8021q: adding VLAN 0 to HW filter
> on device lan1
> Aug 25 23:03:55 sw-dsl kernel: [    7.635966] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off
> 
> -- 
> RMK's Patch system: https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.armlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=04%7C01%7Cvladimir.oltean%40nxp.com%7C4226a7652ae7497284df08d96e2f29e4%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637661971114812881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6hDf%2FS%2FMnpRhzEYuW14zuaEAcaTgdMsQJPpmR9WA5cI%3D&amp;reserved=0
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 17:10           ` Vladimir Oltean
@ 2021-09-02 17:50             ` Russell King (Oracle)
  2021-09-02 19:05               ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 17:50 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote:
> > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > > > Debian has had support for configuring bridges at boot time via
> > > > the interfaces file for years. Breaking that is going to upset a
> > > > lot of people (me included) resulting in busted networks. It
> > > > would be a sure way to make oneself unpopular.
> > > >
> > > > > I expect there to be 2 call paths of phy_attach_direct:
> > > > > - At probe time. Both the MAC driver and the PHY driver are probing.
> > > > >   This is what has this patch addresses. There is no issue to return
> > > > >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> > > > >   they register their netdev. So if connecting defers, there is no
> > > > >   netdev to unregister, and user space knows nothing of this.
> > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to
> > > > >   user space. If you open a netdev and it connects to the PHY then, I
> > > > >   wouldn't expect the PHY to be undergoing a probing process, all of
> > > > >   that should have been settled by then, should it not? Where it might
> > > > >   get interesting is with NFS root, and I admit I haven't tested that.
> > > >
> > > > I don't think you can make that assumption. Consider the case where
> > > > systemd is being used, DSA stuff is modular, and we're trying to
> > > > setup a bridge device on DSA. DSA could be probing while the bridge
> > > > is being setup.
> > > >
> > > > Sadly, this isn't theoretical. I've ended up needing:
> > > >
> > > > 	pre-up sleep 1
> > > >
> > > > in my bridge configuration to allow time for DSA to finish probing.
> > > > It's not a pleasant solution, nor a particularly reliable one at
> > > > that, but it currently works around the problem.
> > > 
> > > What problem? This is the first time I've heard of this report, and you
> > > should definitely not need that.
> > 
> > I found it when upgrading the Clearfog by the DSL modems to v5.13.
> > When I rebooted it with a previously working kernel (v5.7) it has
> > never had a problem. With v5.13, it failed to add all the lan ports
> > into the bridge, because the bridge was still being setup by the
> > kernel while userspace was trying to configure it. Note that I have
> > extra debug in my kernels, hence the extra messages:
> 
> Ok, first you talked about the interfaces file, then systemd. If it's
> not about systemd's network manager then I don't see how it is relevant.

You're reading in stuff to what I write that I did not write... I said:

"Consider the case where systemd is being used, DSA stuff is modular,
and we're trying to setup a bridge device on DSA."

That does not mean I'm using systemd's network manager - which is
something I know little about and have never used.

The reason I mentioned systemd is precisely because with systemd, you
get a hell of a lot happening parallel - and that's significiant in
this case, because it's very clear that modules are being loaded in
parallel with networking being brought up - and that is where the
problems begin. In fact, modules themselves get loaded in paralllel
with systemd.

> What package and version is this exactly, ifupdown, ifupdown2,
> ifupdown-ng, busybox ifupdown? I think they all use the interfaces file.

It's a standard uptodate debian oldstable (buster) install, not yet
upgraded to bullseye:

ifupdown 0.8.35
bridge-utils 1.6-2

> > We get the link to eno1 going down/up due to DSA's actions:
> 
> What "actions"? There were only 2 DSA changes related to the state of
> the master interface, but DSA never forces the master to go down. Quite
> the opposite, it forces the master up when it needs to, and it goes down
> when the master goes down. See:
> 
> 9d5ef190e561 ("net: dsa: automatically bring up DSA master when opening user port")
> c0a8a9c27493 ("net: dsa: automatically bring user ports down when master goes down")

mv88e6xxx will temporarily force the link down while the port is
being configured if one asks it to operate in in-band mode (which
I have.)

> So if eno1 goes down and that causes breakage, DSA did not trigger it.
> Also, please note that eno1 goes down in your "working" example too.

I'm not complaining about this. It's a non-problem. It does however
serve as an indication where we are through the bring-up.

> > Aug 30 11:29:53 sw-dsl kernel: [    4.291882] mvneta f1030000.ethernet eno1: Link is Down
> > Aug 30 11:29:53 sw-dsl kernel: [    4.309425] mvneta f1030000.ethernet eno1: mac link down
> > Aug 30 11:29:53 sw-dsl kernel: [    4.309425] mvneta f1030000.ethernet eno1: mac link down
> > Aug 30 11:29:53 sw-dsl kernel: [    4.309440] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> > Aug 30 11:29:53 sw-dsl kernel: [    4.309447] mvneta f1030000.ethernet eno1: major config 1000base-x
> > Aug 30 11:29:53 sw-dsl kernel: [    4.309454] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=0000000,00000200,00002240 pause=04 link=0 an=1
> > Aug 30 11:29:53 sw-dsl kernel: [    4.345013] mvneta f1030000.ethernet eno1: mac link up
> > Aug 30 11:29:53 sw-dsl kernel: [    4.345014] mvneta f1030000.ethernet eno1: mac link up
> > Aug 30 11:29:53 sw-dsl kernel: [    4.345036] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> > 
> > DSA then starts initialising the ports:
> > 
> > Aug 30 11:29:53 sw-dsl kernel: [    4.397647] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=75)
> > Aug 30 11:29:53 sw-dsl kernel: [    4.397663] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> > 0000000,00000000,000022ef
> > Aug 30 11:29:53 sw-dsl kernel: [    4.493080] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=76)
> > Aug 30 11:29:53 sw-dsl kernel: [    4.493093] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> > 0000000,00000000,000022ef
> > Aug 30 11:29:53 sw-dsl kernel: [    4.577070] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=77)
> > Aug 30 11:29:53 sw-dsl kernel: [    4.577081] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> > 0000000,00000000,000022ef
> > 
> > Meanwhile userspace is trying to setup the bridge while this is going
> > on, and has tried to add the non-existent lan2 at this point, but
> > lan4 has just been created in time, so Debian's bridge support adds
> > it to the brdsl bridge:
> > 
> > Aug 30 11:29:53 sw-dsl kernel: [    4.652237] brdsl: port 1(lan4) entered blocking state
> > Aug 30 11:29:53 sw-dsl kernel: [    4.652250] brdsl: port 1(lan4) entered disabled state
> > 
> > DSA continues setting up the other ports, here lan2, but the bridge
> > setup scripts have already moved on past lan2.
> 
> How does this program know that lan2 exists before it starts attempting
> to enslave it to a bridge via the brctl program, and what does DSA do to
> violate that assumption?

This is the whole point I'm trying to get across to you - these are
_scripts_. They aren't some fancy program that runs in the background.
They assume that the interfaces are already there - as can be seen
from my v5.7 log, they are. With v5.13, they aren't because stuff starts
coming up while DSA is still initialising.

> > Aug 30 11:29:53 sw-dsl kernel: [    4.674038] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=78)
> > Aug 30 11:29:53 sw-dsl kernel: [    4.674052] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> > 0000000,00000000,000022ef
> > Aug 30 11:29:53 sw-dsl kernel: [    4.674612] device lan4 entered promiscuous mode
> > Aug 30 11:29:53 sw-dsl kernel: [    4.785886] device eno1 entered promiscuous mode
> > Aug 30 11:29:53 sw-dsl kernel: [    4.786971] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
> > Aug 30 11:29:53 sw-dsl kernel: [    4.786980] mv88e6085 f1072004.mdio-mii:04 lan4: major config gmii
> > Aug 30 11:29:53 sw-dsl kernel: [    4.786986] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> > Aug 30 11:29:53 sw-dsl kernel: [    4.786996] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
> > Aug 30 11:29:53 sw-dsl kernel: [    4.789977] 8021q: adding VLAN 0 to HW filter
> > on device lan4
> > Aug 30 11:29:53 sw-dsl kernel: [    4.836720] brdsl: port 2(eno2) entered blocking state
> > Aug 30 11:29:53 sw-dsl kernel: [    4.836733] brdsl: port 2(eno2) entered disabled state
> > 
> > Here, the SFP port (on eno2) is added to the bridge.
> > 
> > Aug 30 11:29:53 sw-dsl kernel: [    4.836907] device eno2 entered promiscuous mode
> > Aug 30 11:29:53 sw-dsl kernel: [    4.837011] brdsl: port 2(eno2) entered blocking state
> > Aug 30 11:29:53 sw-dsl kernel: [    4.837019] brdsl: port 2(eno2) entered forwarding state
> > Aug 30 11:29:53 sw-dsl kernel: [    4.837058] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready
> > Aug 30 11:29:53 sw-dsl kernel: [    4.846989] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off
> > Aug 30 11:29:53 sw-dsl kernel: [    4.896264] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=79)
> > Aug 30 11:29:53 sw-dsl kernel: [    4.896278] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 0000000,00000000,000022ef advertising
> > 0000000,00000000,000022ef
> > Aug 30 11:29:53 sw-dsl kernel: [    4.934514] DSA: tree 0 setup
> > 
> > Here, the DSA tree has finally finished initialising in the kernel.
> > 
> > Aug 30 11:29:53 sw-dsl kernel: [    4.986877] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode
> > Aug 30 11:29:53 sw-dsl kernel: [    4.986890] mv88e6085 f1072004.mdio-mii:04 lan1: major config gmii
> > Aug 30 11:29:53 sw-dsl kernel: [    4.986896] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> > Aug 30 11:29:53 sw-dsl kernel: [    4.986907] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config()
> > Aug 30 11:29:53 sw-dsl kernel: [    4.990199] 8021q: adding VLAN 0 to HW filter
> > on device lan1
> > Aug 30 11:29:54 sw-dsl kernel: [    5.041313] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off
> > Aug 30 11:29:56 sw-dsl kernel: [    7.630016] mv88e6085 f1072004.mdio-mii:04 lan4: phy link up gmii/1Gbps/Full/off
> > Aug 30 11:29:56 sw-dsl kernel: [    7.630031] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=00 link=1 an=0
> > Aug 30 11:29:56 sw-dsl kernel: [    7.630043] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
> > Aug 30 11:29:56 sw-dsl kernel: [    7.630294] mv88e6085 f1072004.mdio-mii:04 lan4: Link is Up - 1Gbps/Full - flow control off
> > Aug 30 11:29:56 sw-dsl kernel: [    7.630312] brdsl: port 1(lan4) entered blocking state
> > Aug 30 11:29:56 sw-dsl kernel: [    7.630321] brdsl: port 1(lan4) entered forwarding state
> > 
> > I then notice that my Internet connection hasn't come back, so I start
> > poking about with it, first adding it to the bridge:
> > 
> > Aug 30 11:31:13 sw-dsl kernel: [   84.990122] brdsl: port 3(lan2) entered blocking state
> > Aug 30 11:31:13 sw-dsl kernel: [   84.990134] brdsl: port 3(lan2) entered disabled state
> > Aug 30 11:31:14 sw-dsl kernel: [   85.063971] device lan2 entered promiscuous mode
> > 
> > And then setting it to up state and configuring its vlan settings:
> > 
> > Aug 30 11:32:45 sw-dsl kernel: [  176.476090] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode
> > Aug 30 11:32:45 sw-dsl kernel: [  176.476103] mv88e6085 f1072004.mdio-mii:04 lan2: major config gmii
> > Aug 30 11:32:45 sw-dsl kernel: [  176.476109] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=0000000,00000000,00000000 pause=00 link=0 an=0
> > Aug 30 11:32:45 sw-dsl kernel: [  176.476120] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> > Aug 30 11:32:45 sw-dsl kernel: [  176.479495] 8021q: adding VLAN 0 to HW filter
> > on device lan2
> > Aug 30 11:32:45 sw-dsl kernel: [  176.537796] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
> > Aug 30 11:32:48 sw-dsl kernel: [  179.280863] mv88e6085 f1072004.mdio-mii:04 lan2: phy link up gmii/1Gbps/Full/rx/tx
> > Aug 30 11:32:48 sw-dsl kernel: [  179.280877] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/1Gbps/Full adv=0000000,00000000,00000000 pause=03 link=1 an=0
> > Aug 30 11:32:48 sw-dsl kernel: [  179.280888] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> > Aug 30 11:32:48 sw-dsl kernel: [  179.280894] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_link_up()
> > Aug 30 11:32:48 sw-dsl kernel: [  179.282958] mv88e6085 f1072004.mdio-mii:04 lan2: Link is Up - 1Gbps/Full - flow control rx/tx
> > 
> > I had:
> > 
> > iface brdsl inet manual
> >         bridge-ports lan2 lan4
> >         bridge-maxwait 0
> > 	up brctl addif $IFACE eno2
> > 
> > I now have:
> > iface brdsl inet manual
> >         bridge-ports lan2 lan4
> >         bridge-waitport 10
> >         bridge-maxwait 0
> >         pre-up sleep 1
> > 	up brctl addif $IFACE eno2
> 
> I searched google for the "bridge-ports" keyword relative to ifupdown
> and could not find the source code of a program which parses this. Could
> you let me know what is the source code of the program you are using?

It's a script, see the debian bridge-utils package:
/lib/bridge-utils/ifupdown.sh

Also see the ifup man page - ifup converts much of the interfaces
file into environment variables for called hook scripts in
/etc/network/*.d to make use of. So e.g. bridge-ports becomes
$IF_BRIDGE_PORTS etc.

Debian has been using this method since probably shortly after
bridge support was introduced - it's been around for a very long
time.

> > to ensure that all ports get properly configured.
> > 
> > What can be seen from the above is that there is most definitely a race.
> > It is possible to start configuring a DSA switch before the DSA switch
> > driver has finished being probed by the kernel.
> > 
> > Here is the kernel log from v5.7 which has never showed these problems,
> > because DSA seemed to always setup everything in kernel space prior to
> > userspace beginning configuration:
> > 
> > Aug 25 23:03:54 sw-dsl kernel: [    5.793137] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> > Aug 25 23:03:54 sw-dsl kernel: [    5.793148] mvneta f1030000.ethernet eno1: config interface 1000base-x
> > Aug 25 23:03:54 sw-dsl kernel: [    5.793157] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
> > Aug 25 23:03:54 sw-dsl kernel: [    5.793168] mvneta f1030000.ethernet eno1: mac link down
> > Aug 25 23:03:54 sw-dsl kernel: [    5.793170] mvneta f1030000.ethernet eno1: mac link down
> > Aug 25 23:03:54 sw-dsl kernel: [    5.819769] mvneta f1030000.ethernet eno1: mac link up
> > Aug 25 23:03:54 sw-dsl kernel: [    5.819792] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> > Aug 25 23:03:54 sw-dsl kernel: [    5.948900] 8021q: 802.1Q VLAN Support v1.8
> >   6.459779] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 0
> > Aug 25 23:03:54 sw-dsl kernel: [    6.462890] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): PHY [mv88e6xxx-0:00] driver [Marvell 88E1540] (irq=67)
> > Aug 25 23:03:54 sw-dsl kernel: [    6.462905] mv88e6085 f1072004.mdio-mii:04 lan5 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> > Aug 25 23:03:54 sw-dsl kernel: [    6.465904] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 1
> > Aug 25 23:03:54 sw-dsl kernel: [    6.468101] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): PHY [mv88e6xxx-0:01] driver [Marvell 88E1540] (irq=68)
> > Aug 25 23:03:54 sw-dsl kernel: [    6.468109] mv88e6085 f1072004.mdio-mii:04 lan4 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> > Aug 25 23:03:54 sw-dsl kernel: [    6.472162] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 2
> > Aug 25 23:03:54 sw-dsl kernel: [    6.474247] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): PHY [mv88e6xxx-0:02] driver [Marvell 88E1540] (irq=69)
> > Aug 25 23:03:54 sw-dsl kernel: [    6.474261] mv88e6085 f1072004.mdio-mii:04 lan3 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> > Aug 25 23:03:54 sw-dsl kernel: [    6.481824] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 3
> > Aug 25 23:03:54 sw-dsl kernel: [    6.486354] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): PHY [mv88e6xxx-0:03] driver [Marvell 88E1540] (irq=70)
> > Aug 25 23:03:54 sw-dsl kernel: [    6.486363] mv88e6085 f1072004.mdio-mii:04 lan2 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> > Aug 25 23:03:54 sw-dsl kernel: [    6.498494] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 4
> > Aug 25 23:03:54 sw-dsl kernel: [    6.502272] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): PHY [mv88e6xxx-0:04] driver [Marvell 88E1540] (irq=71)
> > Aug 25 23:03:54 sw-dsl kernel: [    6.502279] mv88e6085 f1072004.mdio-mii:04 lan1 (uninitialized): phy: setting supported 000,00000000,000022ef advertising 000,00000000,000022ef
> > Aug 25 23:03:54 sw-dsl kernel: [    6.532258] mv88e6085 f1072004.mdio-mii:04: nonfatal error -95 setting MTU on port 6
> > Aug 25 23:03:54 sw-dsl kernel: [    6.535877] mvneta f1030000.ethernet eno1: Link is Down
> > Aug 25 23:03:54 sw-dsl kernel: [    6.541733] mvneta f1030000.ethernet eno1: configuring for inband/1000base-x link mode
> > Aug 25 23:03:54 sw-dsl kernel: [    6.541741] mvneta f1030000.ethernet eno1: config interface 1000base-x
> > Aug 25 23:03:54 sw-dsl kernel: [    6.541754] mvneta f1030000.ethernet eno1: phylink_mac_config: mode=inband/1000base-x/Unknown/Unknown adv=000,00000200,00002240 pause=04 link=0 an=1
> > Aug 25 23:03:54 sw-dsl kernel: [    6.541771] mvneta f1030000.ethernet eno1: mac link down
> > Aug 25 23:03:54 sw-dsl kernel: [    6.541779] mvneta f1030000.ethernet eno1: mac link down
> > Aug 25 23:03:54 sw-dsl kernel: [    6.541907] DSA: tree 0 setup
> > 
> > Here, the kernel DSA switch driver has finished doing its setup
> > before we even get to configuring the bridge device below.
> > 
> > Aug 25 23:03:54 sw-dsl kernel: [    6.569105] mvneta f1030000.ethernet eno1: mac link up
> > Aug 25 23:03:54 sw-dsl kernel: [    6.569113] mvneta f1030000.ethernet eno1: mac link up
> > Aug 25 23:03:54 sw-dsl kernel: [    6.569139] mvneta f1030000.ethernet eno1: Link is Up - 1Gbps/Full - flow control rx/tx
> > Aug 25 23:03:55 sw-dsl kernel: [    6.931763] brdsl: port 1(lan2) entered blocking state
> > Aug 25 23:03:55 sw-dsl kernel: [    6.931769] brdsl: port 1(lan2) entered disabled state
> > Aug 25 23:03:55 sw-dsl kernel: [    6.932863] device lan2 entered promiscuous mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.032838] device eno1 entered promiscuous mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.032902] mv88e6085 f1072004.mdio-mii:04 lan2: configuring for phy/gmii link mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.032907] mv88e6085 f1072004.mdio-mii:04 lan2: config interface gmii
> > Aug 25 23:03:55 sw-dsl kernel: [    7.032916] mv88e6085 f1072004.mdio-mii:04 lan2: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
> > Aug 25 23:03:55 sw-dsl kernel: [    7.032920] mv88e6085 f1072004.mdio-mii:04: p3: dsa_port_phylink_mac_config()
> > Aug 25 23:03:55 sw-dsl kernel: [    7.037225] 8021q: adding VLAN 0 to HW filter
> > on device lan2
> > Aug 25 23:03:55 sw-dsl kernel: [    7.044979] brdsl: port 2(lan4) entered blocking state
> > Aug 25 23:03:55 sw-dsl kernel: [    7.044985] brdsl: port 2(lan4) entered disabled state
> > Aug 25 23:03:55 sw-dsl kernel: [    7.056189] device lan4 entered promiscuous mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.107067] mv88e6085 f1072004.mdio-mii:04 lan4: configuring for phy/gmii link mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.107073] mv88e6085 f1072004.mdio-mii:04 lan4: config interface gmii
> > Aug 25 23:03:55 sw-dsl kernel: [    7.107080] mv88e6085 f1072004.mdio-mii:04 lan4: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
> > Aug 25 23:03:55 sw-dsl kernel: [    7.107084] mv88e6085 f1072004.mdio-mii:04: p1: dsa_port_phylink_mac_config()
> > Aug 25 23:03:55 sw-dsl kernel: [    7.118831] 8021q: adding VLAN 0 to HW filter
> > on device lan4
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153604] brdsl: port 3(eno2) entered blocking state
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153610] brdsl: port 3(eno2) entered disabled state
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153720] mv88e6085 f1072004.mdio-mii:04 lan2: phy link down gmii/Unknown/Unknown/off
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153790] device eno2 entered promiscuous mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153890] brdsl: port 3(eno2) entered blocking state
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153895] brdsl: port 3(eno2) entered forwarding state
> > Aug 25 23:03:55 sw-dsl kernel: [    7.153930] IPv6: ADDRCONF(NETDEV_CHANGE): brdsl: link becomes ready
> > Aug 25 23:03:55 sw-dsl kernel: [    7.295739] mv88e6085 f1072004.mdio-mii:04 lan4: phy link down gmii/Unknown/Unknown/off
> > Aug 25 23:03:55 sw-dsl kernel: [    7.575615] mv88e6085 f1072004.mdio-mii:04 lan1: configuring for phy/gmii link mode
> > Aug 25 23:03:55 sw-dsl kernel: [    7.575622] mv88e6085 f1072004.mdio-mii:04 lan1: config interface gmii
> > Aug 25 23:03:55 sw-dsl kernel: [    7.575630] mv88e6085 f1072004.mdio-mii:04 lan1: phylink_mac_config: mode=phy/gmii/Unknown/Unknown adv=000,00000000,00000000 pause=00 link=0 an=0
> > Aug 25 23:03:55 sw-dsl kernel: [    7.575634] mv88e6085 f1072004.mdio-mii:04: p4: dsa_port_phylink_mac_config()
> > Aug 25 23:03:55 sw-dsl kernel: [    7.579334] 8021q: adding VLAN 0 to HW filter
> > on device lan1
> > Aug 25 23:03:55 sw-dsl kernel: [    7.635966] mv88e6085 f1072004.mdio-mii:04 lan1: phy link down gmii/Unknown/Unknown/off
> > 
> > -- 
> > RMK's Patch system: https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.armlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=04%7C01%7Cvladimir.oltean%40nxp.com%7C4226a7652ae7497284df08d96e2f29e4%7C686ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C637661971114812881%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=6hDf%2FS%2FMnpRhzEYuW14zuaEAcaTgdMsQJPpmR9WA5cI%3D&amp;reserved=0
> > FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-01 22:50 ` [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe Vladimir Oltean
  2021-09-02  5:43   ` Greg Kroah-Hartman
@ 2021-09-02 18:50   ` Russell King (Oracle)
  2021-09-02 19:23     ` Vladimir Oltean
  2021-09-02 19:51     ` Andrew Lunn
  1 sibling, 2 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 18:50 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: netdev, Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index 52310df121de..2c22a32f0a1c 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>  
>  	/* Assume that if there is no driver, that it doesn't
>  	 * exist, and we should use the genphy driver.
> +	 * The exception is during probing, when the PHY driver might have
> +	 * attempted a probe but has requested deferral. Since there might be
> +	 * MAC drivers which also attach to the PHY during probe time, try
> +	 * harder to bind the specific PHY driver, and defer the MAC driver's
> +	 * probing until then.
>  	 */
>  	if (!d->driver) {
> +		if (device_pending_probe(d))
> +			return -EPROBE_DEFER;

Something else that concerns me here.

As noted, many network drivers attempt to attach their PHY when the
device is brought up, and not during their probe function.

Taking a driver at random:

drivers/net/ethernet/renesas/sh_eth.c

sh_eth_phy_init() calls of_phy_connect() or phy_connect(), which
ultimately calls phy_attach_direct() and propagates the error code
via an error pointer.

sh_eth_phy_init() propagates the error code to its caller,
sh_eth_phy_start(). This is called from sh_eth_open(), which
probagates the error code. This is called from .ndo_open... and it's
highly likely -EPROBE_DEFER will end up being returned to userspace
through either netlink or netdev ioctls.

Since EPROBE_DEFER is not an error number that we export to
userspace, this should basically never be exposed to userspace, yet
we have a path that it _could_ be exposed if the above condition
is true.

If device_pending_probe() returns true e.g. during initial boot up
while modules are being loaded - maybe the phy driver doesn't have
all the resources it needs because of some other module that hasn't
finished initialising - then we have a window where this will be
exposed to userspace.

So, do we need to fix all the network drivers to do something if
their .ndo_open method encounters this? If so, what? Sleep a bit
and try again? How many times to retry? Convert the error code into
something else, causing userspace to fail where it worked before? If
so which error code?

I think this needs to be thought through a bit better. In this case,
I feel that throwing -EPROBE_DEFER to solve one problem with one
subsystem can result in new problems elsewhere.

We did have an idea at one point about reserving some flag bits in
phydev->dev_flags for phylib use, but I don't think that happened.
If this is the direction we want to go, I think we need to have a
flag in dev_flags so that callers opt-in to the new behaviour whereas
callers such as from .ndo_open keep the old behaviour - because they
just aren't setup to handle an -EPROBE_DEFER return from these
functions.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 17:50             ` Russell King (Oracle)
@ 2021-09-02 19:05               ` Vladimir Oltean
  2021-09-02 20:03                 ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 19:05 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 06:50:43PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote:
> > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > > > > Debian has had support for configuring bridges at boot time via
> > > > > the interfaces file for years. Breaking that is going to upset a
> > > > > lot of people (me included) resulting in busted networks. It
> > > > > would be a sure way to make oneself unpopular.
> > > > >
> > > > > > I expect there to be 2 call paths of phy_attach_direct:
> > > > > > - At probe time. Both the MAC driver and the PHY driver are probing.
> > > > > >   This is what has this patch addresses. There is no issue to return
> > > > > >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> > > > > >   they register their netdev. So if connecting defers, there is no
> > > > > >   netdev to unregister, and user space knows nothing of this.
> > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to
> > > > > >   user space. If you open a netdev and it connects to the PHY then, I
> > > > > >   wouldn't expect the PHY to be undergoing a probing process, all of
> > > > > >   that should have been settled by then, should it not? Where it might
> > > > > >   get interesting is with NFS root, and I admit I haven't tested that.
> > > > >
> > > > > I don't think you can make that assumption. Consider the case where
> > > > > systemd is being used, DSA stuff is modular, and we're trying to
> > > > > setup a bridge device on DSA. DSA could be probing while the bridge
> > > > > is being setup.
> > > > >
> > > > > Sadly, this isn't theoretical. I've ended up needing:
> > > > >
> > > > > 	pre-up sleep 1
> > > > >
> > > > > in my bridge configuration to allow time for DSA to finish probing.
> > > > > It's not a pleasant solution, nor a particularly reliable one at
> > > > > that, but it currently works around the problem.
> > > >
> > > > What problem? This is the first time I've heard of this report, and you
> > > > should definitely not need that.
> > >
> > > I found it when upgrading the Clearfog by the DSL modems to v5.13.
> > > When I rebooted it with a previously working kernel (v5.7) it has
> > > never had a problem. With v5.13, it failed to add all the lan ports
> > > into the bridge, because the bridge was still being setup by the
> > > kernel while userspace was trying to configure it. Note that I have
> > > extra debug in my kernels, hence the extra messages:
> >
> > Ok, first you talked about the interfaces file, then systemd. If it's
> > not about systemd's network manager then I don't see how it is relevant.
>
> You're reading in stuff to what I write that I did not write... I said:
>
> "Consider the case where systemd is being used, DSA stuff is modular,
> and we're trying to setup a bridge device on DSA."
>
> That does not mean I'm using systemd's network manager - which is
> something I know little about and have never used.

You should definitely try it out, it gets a lot of new features added
all the time, it uses the netlink interface, it reacts on udev events.

> The reason I mentioned systemd is precisely because with systemd, you
> get a hell of a lot happening parallel - and that's significiant in
> this case, because it's very clear that modules are being loaded in
> parallel with networking being brought up - and that is where the
> problems begin. In fact, modules themselves get loaded in paralllel
> with systemd.

So that's what I don't understand. You're saying that the ifupdown
service runs in parallel with systemd-modules-load.service, and
networking is a kernel module? Doesn't that mean it behaves as expected,
then? /shrug/
Have you tried adding an 'After=systemd-modules-load.service' dependency
to the ifupdown service? I don't think that DSA is that bad that it
registers its net devices outside of the process context in which the
insmod mv88e6xxx.ko is called. Quite the opposite, I think (but I
haven't actually taken a close look yet) that the component stuff
Saravana is proposing would do exactly that. So you "fix" one issue, you
introduce another.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 18:50   ` Russell King (Oracle)
@ 2021-09-02 19:23     ` Vladimir Oltean
  2021-09-02 19:51     ` Andrew Lunn
  1 sibling, 0 replies; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 19:23 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 52310df121de..2c22a32f0a1c 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> >  
> >  	/* Assume that if there is no driver, that it doesn't
> >  	 * exist, and we should use the genphy driver.
> > +	 * The exception is during probing, when the PHY driver might have
> > +	 * attempted a probe but has requested deferral. Since there might be
> > +	 * MAC drivers which also attach to the PHY during probe time, try
> > +	 * harder to bind the specific PHY driver, and defer the MAC driver's
> > +	 * probing until then.
> >  	 */
> >  	if (!d->driver) {
> > +		if (device_pending_probe(d))
> > +			return -EPROBE_DEFER;
> 
> Something else that concerns me here.
> 
> As noted, many network drivers attempt to attach their PHY when the
> device is brought up, and not during their probe function.
> 
> Taking a driver at random:
> 
> drivers/net/ethernet/renesas/sh_eth.c
> 
> sh_eth_phy_init() calls of_phy_connect() or phy_connect(), which
> ultimately calls phy_attach_direct() and propagates the error code
> via an error pointer.
> 
> sh_eth_phy_init() propagates the error code to its caller,
> sh_eth_phy_start(). This is called from sh_eth_open(), which
> probagates the error code. This is called from .ndo_open... and it's
> highly likely -EPROBE_DEFER will end up being returned to userspace
> through either netlink or netdev ioctls.
> 
> Since EPROBE_DEFER is not an error number that we export to
> userspace, this should basically never be exposed to userspace, yet
> we have a path that it _could_ be exposed if the above condition
> is true.
> 
> If device_pending_probe() returns true e.g. during initial boot up
> while modules are being loaded - maybe the phy driver doesn't have
> all the resources it needs because of some other module that hasn't
> finished initialising - then we have a window where this will be
> exposed to userspace.
> 
> So, do we need to fix all the network drivers to do something if
> their .ndo_open method encounters this? If so, what? Sleep a bit
> and try again? How many times to retry? Convert the error code into
> something else, causing userspace to fail where it worked before? If
> so which error code?

It depends what is the outcome you're going for.
If there's a PHY driver pending, I would do something to wait for that
if I could, it would be silly for the PHY driver to be loading but the
PHY to still be bound to genphy.

I feel that connecting to the PHY from the probe path is the overall
cleaner way to go since it deals with this automatically, but due to the
sheer volume of drivers that connect from .ndo_open, modifying them in
bulk is out of the question. Something sensible needs to happen with
them too, and 'genphy is what you get' might be just that, which is
basically what is happening without these patches. On that note, I don't
know whether there is any objective advantage to connecting to the PHY
at .ndo_open time.

> 
> I think this needs to be thought through a bit better. In this case,
> I feel that throwing -EPROBE_DEFER to solve one problem with one
> subsystem can result in new problems elsewhere.
> 
> We did have an idea at one point about reserving some flag bits in
> phydev->dev_flags for phylib use, but I don't think that happened.
> If this is the direction we want to go, I think we need to have a
> flag in dev_flags so that callers opt-in to the new behaviour whereas
> callers such as from .ndo_open keep the old behaviour - because they
> just aren't setup to handle an -EPROBE_DEFER return from these
> functions.

Or that, yes. I hadn't actually thought about using PHY flags, but I
suppose callers which already can cope with EPROBE_DEFER (they connect
from probe) can opt into that.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 18:50   ` Russell King (Oracle)
  2021-09-02 19:23     ` Vladimir Oltean
@ 2021-09-02 19:51     ` Andrew Lunn
  2021-09-02 20:33       ` Florian Fainelli
  1 sibling, 1 reply; 42+ messages in thread
From: Andrew Lunn @ 2021-09-02 19:51 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, Vladimir Oltean, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > index 52310df121de..2c22a32f0a1c 100644
> > --- a/drivers/net/phy/phy_device.c
> > +++ b/drivers/net/phy/phy_device.c
> > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> >  
> >  	/* Assume that if there is no driver, that it doesn't
> >  	 * exist, and we should use the genphy driver.
> > +	 * The exception is during probing, when the PHY driver might have
> > +	 * attempted a probe but has requested deferral. Since there might be
> > +	 * MAC drivers which also attach to the PHY during probe time, try
> > +	 * harder to bind the specific PHY driver, and defer the MAC driver's
> > +	 * probing until then.
> >  	 */
> >  	if (!d->driver) {
> > +		if (device_pending_probe(d))
> > +			return -EPROBE_DEFER;
> 
> Something else that concerns me here.
> 
> As noted, many network drivers attempt to attach their PHY when the
> device is brought up, and not during their probe function.

Yes, this is going to be a problem. I agree it is too late to return
-EPROBE_DEFER. Maybe phy_attach_direct() needs to wait around, if the
device is still on the deferred list, otherwise use genphy. And maybe
a timeout and return -ENODEV, which is not 100% correct, we know the
device exists, we just cannot drive it.

Can we tell we are in the context of a driver probe? Or do we need to
add a parameter to the various phy_attach API calls to let the core
know if this is probe or open?

This is more likely to be a problem with NFS root, with the kernel
bringing up an interface as soon as its registered. userspace bringing
up interfaces is generally much later, and udev tends to wait around
until there are no more driver load requests before the boot
continues.

	Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 19:05               ` Vladimir Oltean
@ 2021-09-02 20:03                 ` Russell King (Oracle)
  2021-09-02 20:21                   ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 20:03 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 10:05:07PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 06:50:43PM +0100, Russell King (Oracle) wrote:
> > On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote:
> > > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote:
> > > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> > > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > > > > > Debian has had support for configuring bridges at boot time via
> > > > > > the interfaces file for years. Breaking that is going to upset a
> > > > > > lot of people (me included) resulting in busted networks. It
> > > > > > would be a sure way to make oneself unpopular.
> > > > > >
> > > > > > > I expect there to be 2 call paths of phy_attach_direct:
> > > > > > > - At probe time. Both the MAC driver and the PHY driver are probing.
> > > > > > >   This is what has this patch addresses. There is no issue to return
> > > > > > >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> > > > > > >   they register their netdev. So if connecting defers, there is no
> > > > > > >   netdev to unregister, and user space knows nothing of this.
> > > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to
> > > > > > >   user space. If you open a netdev and it connects to the PHY then, I
> > > > > > >   wouldn't expect the PHY to be undergoing a probing process, all of
> > > > > > >   that should have been settled by then, should it not? Where it might
> > > > > > >   get interesting is with NFS root, and I admit I haven't tested that.
> > > > > >
> > > > > > I don't think you can make that assumption. Consider the case where
> > > > > > systemd is being used, DSA stuff is modular, and we're trying to
> > > > > > setup a bridge device on DSA. DSA could be probing while the bridge
> > > > > > is being setup.
> > > > > >
> > > > > > Sadly, this isn't theoretical. I've ended up needing:
> > > > > >
> > > > > > 	pre-up sleep 1
> > > > > >
> > > > > > in my bridge configuration to allow time for DSA to finish probing.
> > > > > > It's not a pleasant solution, nor a particularly reliable one at
> > > > > > that, but it currently works around the problem.
> > > > >
> > > > > What problem? This is the first time I've heard of this report, and you
> > > > > should definitely not need that.
> > > >
> > > > I found it when upgrading the Clearfog by the DSL modems to v5.13.
> > > > When I rebooted it with a previously working kernel (v5.7) it has
> > > > never had a problem. With v5.13, it failed to add all the lan ports
> > > > into the bridge, because the bridge was still being setup by the
> > > > kernel while userspace was trying to configure it. Note that I have
> > > > extra debug in my kernels, hence the extra messages:
> > >
> > > Ok, first you talked about the interfaces file, then systemd. If it's
> > > not about systemd's network manager then I don't see how it is relevant.
> >
> > You're reading in stuff to what I write that I did not write... I said:
> >
> > "Consider the case where systemd is being used, DSA stuff is modular,
> > and we're trying to setup a bridge device on DSA."
> >
> > That does not mean I'm using systemd's network manager - which is
> > something I know little about and have never used.
> 
> You should definitely try it out, it gets a lot of new features added
> all the time, it uses the netlink interface, it reacts on udev events.
> 
> > The reason I mentioned systemd is precisely because with systemd, you
> > get a hell of a lot happening parallel - and that's significiant in
> > this case, because it's very clear that modules are being loaded in
> > parallel with networking being brought up - and that is where the
> > problems begin. In fact, modules themselves get loaded in paralllel
> > with systemd.
> 
> So that's what I don't understand. You're saying that the ifupdown
> service runs in parallel with systemd-modules-load.service, and
> networking is a kernel module? Doesn't that mean it behaves as expected,
> then? /shrug/
> Have you tried adding an 'After=systemd-modules-load.service' dependency
> to the ifupdown service? I don't think that DSA is that bad that it
> registers its net devices outside of the process context in which the
> insmod mv88e6xxx.ko is called. Quite the opposite, I think (but I
> haven't actually taken a close look yet) that the component stuff
> Saravana is proposing would do exactly that. So you "fix" one issue, you
> introduce another.

# systemctl list-dependencies networking.service
networking.service
  ├─ifupdown-pre.service
  ├─system.slice
  └─network.target
# systemctl list-dependencies ifupdown-pre.service
ifupdown-pre.service
  ├─system.slice
  └─systemd-udevd.service

Looking in the service files for a better idea:

networking.service:
Requires=ifupdown-pre.service
Wants=network.target
After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service
Before=network.target shutdown.target network-online.target

ifupdown-pre.service:
Wants=systemd-udevd.service
After=systemd-udev-trigger.service
Before=network.target

So, the dependency you mention is already present. As is a dependency
on udev. The problem is udev does all the automatic module loading
asynchronously and in a multithreaded way.

I don't think there's a way to make systemd wait for all module loads
to complete.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 12:35   ` Vladimir Oltean
  2021-09-02 12:59     ` Vladimir Oltean
  2021-09-02 13:26     ` Russell King (Oracle)
@ 2021-09-02 20:07     ` Andrew Lunn
  2021-09-02 20:32       ` Vladimir Oltean
  2 siblings, 1 reply; 42+ messages in thread
From: Andrew Lunn @ 2021-09-02 20:07 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Russell King (Oracle),
	Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

> The interrupt controller _has_ been set up. The trouble is that the
> interrupt controller has the same OF node as the switch itself, and the
> same OF node. Therefore, fw_devlink waits for the _entire_ switch to
> finish probing, it doesn't have insight into the fact that the
> dependency is just on the interrupt controller.

That seems to be the problem. fw_devlink appears to think probe is an
atomic operation. A device is not probed, or full probed. Where as the
drivers are making use of it being non atomic.

Maybe fw_devlink needs the third state, probing. And when deciding if
a device can be probed and depends on a device which is currently
probing, it looks deeper, follows the phandle and see if the resource
is actually available?

	 Andrew


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 20:03                 ` Russell King (Oracle)
@ 2021-09-02 20:21                   ` Vladimir Oltean
  2021-09-02 20:29                     ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 20:21 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 09:03:01PM +0100, Russell King (Oracle) wrote:
> On Thu, Sep 02, 2021 at 10:05:07PM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 06:50:43PM +0100, Russell King (Oracle) wrote:
> > > On Thu, Sep 02, 2021 at 05:10:34PM +0000, Vladimir Oltean wrote:
> > > > On Thu, Sep 02, 2021 at 05:31:44PM +0100, Russell King (Oracle) wrote:
> > > > > On Thu, Sep 02, 2021 at 06:23:42PM +0300, Vladimir Oltean wrote:
> > > > > > On Thu, Sep 02, 2021 at 02:26:35PM +0100, Russell King (Oracle) wrote:
> > > > > > > Debian has had support for configuring bridges at boot time via
> > > > > > > the interfaces file for years. Breaking that is going to upset a
> > > > > > > lot of people (me included) resulting in busted networks. It
> > > > > > > would be a sure way to make oneself unpopular.
> > > > > > >
> > > > > > > > I expect there to be 2 call paths of phy_attach_direct:
> > > > > > > > - At probe time. Both the MAC driver and the PHY driver are probing.
> > > > > > > >   This is what has this patch addresses. There is no issue to return
> > > > > > > >   -EPROBE_DEFER at that time, since drivers connect to the PHY before
> > > > > > > >   they register their netdev. So if connecting defers, there is no
> > > > > > > >   netdev to unregister, and user space knows nothing of this.
> > > > > > > > - At .ndo_open time. This is where it maybe gets interesting, but not to
> > > > > > > >   user space. If you open a netdev and it connects to the PHY then, I
> > > > > > > >   wouldn't expect the PHY to be undergoing a probing process, all of
> > > > > > > >   that should have been settled by then, should it not? Where it might
> > > > > > > >   get interesting is with NFS root, and I admit I haven't tested that.
> > > > > > >
> > > > > > > I don't think you can make that assumption. Consider the case where
> > > > > > > systemd is being used, DSA stuff is modular, and we're trying to
> > > > > > > setup a bridge device on DSA. DSA could be probing while the bridge
> > > > > > > is being setup.
> > > > > > >
> > > > > > > Sadly, this isn't theoretical. I've ended up needing:
> > > > > > >
> > > > > > > 	pre-up sleep 1
> > > > > > >
> > > > > > > in my bridge configuration to allow time for DSA to finish probing.
> > > > > > > It's not a pleasant solution, nor a particularly reliable one at
> > > > > > > that, but it currently works around the problem.
> > > > > >
> > > > > > What problem? This is the first time I've heard of this report, and you
> > > > > > should definitely not need that.
> > > > >
> > > > > I found it when upgrading the Clearfog by the DSL modems to v5.13.
> > > > > When I rebooted it with a previously working kernel (v5.7) it has
> > > > > never had a problem. With v5.13, it failed to add all the lan ports
> > > > > into the bridge, because the bridge was still being setup by the
> > > > > kernel while userspace was trying to configure it. Note that I have
> > > > > extra debug in my kernels, hence the extra messages:
> > > >
> > > > Ok, first you talked about the interfaces file, then systemd. If it's
> > > > not about systemd's network manager then I don't see how it is relevant.
> > >
> > > You're reading in stuff to what I write that I did not write... I said:
> > >
> > > "Consider the case where systemd is being used, DSA stuff is modular,
> > > and we're trying to setup a bridge device on DSA."
> > >
> > > That does not mean I'm using systemd's network manager - which is
> > > something I know little about and have never used.
> > 
> > You should definitely try it out, it gets a lot of new features added
> > all the time, it uses the netlink interface, it reacts on udev events.
> > 
> > > The reason I mentioned systemd is precisely because with systemd, you
> > > get a hell of a lot happening parallel - and that's significiant in
> > > this case, because it's very clear that modules are being loaded in
> > > parallel with networking being brought up - and that is where the
> > > problems begin. In fact, modules themselves get loaded in paralllel
> > > with systemd.
> > 
> > So that's what I don't understand. You're saying that the ifupdown
> > service runs in parallel with systemd-modules-load.service, and
> > networking is a kernel module? Doesn't that mean it behaves as expected,
> > then? /shrug/
> > Have you tried adding an 'After=systemd-modules-load.service' dependency
> > to the ifupdown service? I don't think that DSA is that bad that it
> > registers its net devices outside of the process context in which the
> > insmod mv88e6xxx.ko is called. Quite the opposite, I think (but I
> > haven't actually taken a close look yet) that the component stuff
> > Saravana is proposing would do exactly that. So you "fix" one issue, you
> > introduce another.
> 
> # systemctl list-dependencies networking.service
> networking.service
>   ├─ifupdown-pre.service
>   ├─system.slice
>   └─network.target
> # systemctl list-dependencies ifupdown-pre.service
> ifupdown-pre.service
>   ├─system.slice
>   └─systemd-udevd.service
> 
> Looking in the service files for a better idea:
> 
> networking.service:
> Requires=ifupdown-pre.service
> Wants=network.target
> After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service
> Before=network.target shutdown.target network-online.target
> 
> ifupdown-pre.service:
> Wants=systemd-udevd.service
> After=systemd-udev-trigger.service
> Before=network.target
> 
> So, the dependency you mention is already present. As is a dependency
> on udev. The problem is udev does all the automatic module loading
> asynchronously and in a multithreaded way.
> 
> I don't think there's a way to make systemd wait for all module loads
> to complete.

So ifupdown-pre.service has a call to "udevadm settle". This "watches
the udev event queue, and exits if all current events are handled",
according to the man page. But which current events? ifupdown-pre.service
does not have the dependency on systemd-modules-load.service, just
networking.service does. So maybe ifupdown-pre.service does not wait for
DSA to finish initializing, then it tells networking.service that all is ok.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 20:21                   ` Vladimir Oltean
@ 2021-09-02 20:29                     ` Russell King (Oracle)
  0 siblings, 0 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 20:29 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 11:21:24PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 09:03:01PM +0100, Russell King (Oracle) wrote:
> > # systemctl list-dependencies networking.service
> > networking.service
> >   ├─ifupdown-pre.service
> >   ├─system.slice
> >   └─network.target
> > # systemctl list-dependencies ifupdown-pre.service
> > ifupdown-pre.service
> >   ├─system.slice
> >   └─systemd-udevd.service
> > 
> > Looking in the service files for a better idea:
> > 
> > networking.service:
> > Requires=ifupdown-pre.service
> > Wants=network.target
> > After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service
> > Before=network.target shutdown.target network-online.target
> > 
> > ifupdown-pre.service:
> > Wants=systemd-udevd.service
> > After=systemd-udev-trigger.service
> > Before=network.target
> > 
> > So, the dependency you mention is already present. As is a dependency
> > on udev. The problem is udev does all the automatic module loading
> > asynchronously and in a multithreaded way.
> > 
> > I don't think there's a way to make systemd wait for all module loads
> > to complete.
> 
> So ifupdown-pre.service has a call to "udevadm settle". This "watches
> the udev event queue, and exits if all current events are handled",
> according to the man page. But which current events? ifupdown-pre.service
> does not have the dependency on systemd-modules-load.service, just
> networking.service does. So maybe ifupdown-pre.service does not wait for
> DSA to finish initializing, then it tells networking.service that all is ok.

ifupdown-pre.service does have a call to udevadm settle, and that
does get called from what I can tell.

systemd-modules-load.service is an entire red herring. The only
module listed in the various modules-load.d directories is "tun"
for openvpn (which isn't currently being used.)

As I've already told you (and you seem to have ignored), DSA gets
loaded by udev, not by systemd-modules-load.service.
systemd-modules-load.service is irrelevant to my situation.

I think there's a problem with "and exits if all current events are
handled" - does that mean it's fired off a modprobe process which
is in progress, or does that mean that the modprobe process has
completed.

Given that we can see that ifup is being run while the DSA module is
still in the middle of probing, the latter interpretation can not be
true - unless systemd is ignoring the dependencies. Or just in
general, systemd being systemd (I have very little faith in systemd
behaving as it should.)

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 20:07     ` Andrew Lunn
@ 2021-09-02 20:32       ` Vladimir Oltean
  2021-09-02 21:39         ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 20:32 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle),
	Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 10:07:49PM +0200, Andrew Lunn wrote:
> > The interrupt controller _has_ been set up. The trouble is that the
> > interrupt controller has the same OF node as the switch itself, and the
> > same OF node. Therefore, fw_devlink waits for the _entire_ switch to
> > finish probing, it doesn't have insight into the fact that the
> > dependency is just on the interrupt controller.
>
> That seems to be the problem. fw_devlink appears to think probe is an
> atomic operation. A device is not probed, or full probed. Where as the
> drivers are making use of it being non atomic.
>
> Maybe fw_devlink needs the third state, probing. And when deciding if
> a device can be probed and depends on a device which is currently
> probing, it looks deeper, follows the phandle and see if the resource
> is actually available?

This is interesting because there already exists a device link state for
when the consumer is "probing", but for the supplier, it's binary:

/**
 * enum device_link_state - Device link states.
 * @DL_STATE_NONE: The presence of the drivers is not being tracked.
 * @DL_STATE_DORMANT: None of the supplier/consumer drivers is present.
 * @DL_STATE_AVAILABLE: The supplier driver is present, but the consumer is not.
 * @DL_STATE_CONSUMER_PROBE: The consumer is probing (supplier driver present).
 * @DL_STATE_ACTIVE: Both the supplier and consumer drivers are present.
 * @DL_STATE_SUPPLIER_UNBIND: The supplier driver is unbinding.
 */

The check that's killing us is in device_links_check_suppliers, and is
for DL_STATE_AVAILABLE:

	list_for_each_entry(link, &dev->links.suppliers, c_node) {
		if (!(link->flags & DL_FLAG_MANAGED))
			continue;

		if (link->status != DL_STATE_AVAILABLE &&
		    !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
			device_links_missing_supplier(dev);
			dev_err(dev, "probe deferral - supplier %s not ready\n",
				dev_name(link->supplier));
			ret = -EPROBE_DEFER;
			break;
		}
		WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE);
	}

Anyway, I was expecting quite a different reaction from this patch
series, and especially one from Saravana. We are essentially battling to
handle an -EPROBE_DEFER we don't need (the battle might be worth it
though, in the general case, which is one of the reasons I posted them).

But these patches also solve DSA's issue with the circular dependency
between the switch and its internal PHYs, and nobody seems to have asked
the most important question: why?

The PHY should return -EPROBE_DEFER ad infinitum, since its supplier has
never finished probing by the time it calls phy_attach_direct.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 19:51     ` Andrew Lunn
@ 2021-09-02 20:33       ` Florian Fainelli
  2021-09-02 21:33         ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Florian Fainelli @ 2021-09-02 20:33 UTC (permalink / raw)
  To: Andrew Lunn, Russell King (Oracle)
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Heiner Kallweit, David S. Miller, Jakub Kicinski, Vivien Didelot,
	Vladimir Oltean, linux-kernel, Linus Walleij, Alvin Šipraga,
	ACPI Devel Maling List, kernel-team, Len Brown



On 9/2/2021 12:51 PM, Andrew Lunn wrote:
> On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote:
>> On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
>>> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
>>> index 52310df121de..2c22a32f0a1c 100644
>>> --- a/drivers/net/phy/phy_device.c
>>> +++ b/drivers/net/phy/phy_device.c
>>> @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
>>>   
>>>   	/* Assume that if there is no driver, that it doesn't
>>>   	 * exist, and we should use the genphy driver.
>>> +	 * The exception is during probing, when the PHY driver might have
>>> +	 * attempted a probe but has requested deferral. Since there might be
>>> +	 * MAC drivers which also attach to the PHY during probe time, try
>>> +	 * harder to bind the specific PHY driver, and defer the MAC driver's
>>> +	 * probing until then.
>>>   	 */
>>>   	if (!d->driver) {
>>> +		if (device_pending_probe(d))
>>> +			return -EPROBE_DEFER;
>>
>> Something else that concerns me here.
>>
>> As noted, many network drivers attempt to attach their PHY when the
>> device is brought up, and not during their probe function.
> 
> Yes, this is going to be a problem. I agree it is too late to return
> -EPROBE_DEFER. Maybe phy_attach_direct() needs to wait around, if the
> device is still on the deferred list, otherwise use genphy. And maybe
> a timeout and return -ENODEV, which is not 100% correct, we know the
> device exists, we just cannot drive it.

Is it really going to be a problem though? The two cases where this will 
matter is if we use IP auto-configuration within the kernel, which this 
patchset ought to be helping with, if we are already in user-space and 
the PHY is connected at .ndo_open() time, there is a whole lot of things 
that did happen prior to getting there, such as udevd using modaliases 
in order to load every possible module we might, so I am debating 
whether we will really see a probe deferral at all.

> 
> Can we tell we are in the context of a driver probe? Or do we need to
> add a parameter to the various phy_attach API calls to let the core
> know if this is probe or open?

Actually we do the RTNL lock will be held during ndo_open and it won't 
during driver probe.

> 
> This is more likely to be a problem with NFS root, with the kernel
> bringing up an interface as soon as its registered. userspace bringing
> up interfaces is generally much later, and udev tends to wait around
> until there are no more driver load requests before the boot
> continues.

See my point above, with Vladimir's change, we should have fw_devlink do 
its job such that by the time the network interface is needed for IP 
auto-configuration, all of its depending resources should also be ready, 
would not they?
-- 
Florian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 20:33       ` Florian Fainelli
@ 2021-09-02 21:33         ` Russell King (Oracle)
  2021-09-02 21:39           ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 21:33 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Andrew Lunn, Vladimir Oltean, netdev, Greg Kroah-Hartman,
	Rafael J. Wysocki, Heiner Kallweit, David S. Miller,
	Jakub Kicinski, Vivien Didelot, Vladimir Oltean, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Thu, Sep 02, 2021 at 01:33:57PM -0700, Florian Fainelli wrote:
> On 9/2/2021 12:51 PM, Andrew Lunn wrote:
> > On Thu, Sep 02, 2021 at 07:50:16PM +0100, Russell King (Oracle) wrote:
> > > On Thu, Sep 02, 2021 at 01:50:51AM +0300, Vladimir Oltean wrote:
> > > > diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> > > > index 52310df121de..2c22a32f0a1c 100644
> > > > --- a/drivers/net/phy/phy_device.c
> > > > +++ b/drivers/net/phy/phy_device.c
> > > > @@ -1386,8 +1386,16 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev,
> > > >   	/* Assume that if there is no driver, that it doesn't
> > > >   	 * exist, and we should use the genphy driver.
> > > > +	 * The exception is during probing, when the PHY driver might have
> > > > +	 * attempted a probe but has requested deferral. Since there might be
> > > > +	 * MAC drivers which also attach to the PHY during probe time, try
> > > > +	 * harder to bind the specific PHY driver, and defer the MAC driver's
> > > > +	 * probing until then.
> > > >   	 */
> > > >   	if (!d->driver) {
> > > > +		if (device_pending_probe(d))
> > > > +			return -EPROBE_DEFER;
> > > 
> > > Something else that concerns me here.
> > > 
> > > As noted, many network drivers attempt to attach their PHY when the
> > > device is brought up, and not during their probe function.
> > 
> > Yes, this is going to be a problem. I agree it is too late to return
> > -EPROBE_DEFER. Maybe phy_attach_direct() needs to wait around, if the
> > device is still on the deferred list, otherwise use genphy. And maybe
> > a timeout and return -ENODEV, which is not 100% correct, we know the
> > device exists, we just cannot drive it.
> 
> Is it really going to be a problem though? The two cases where this will
> matter is if we use IP auto-configuration within the kernel, which this
> patchset ought to be helping with

There is no handling of EPROBE_DEFER in the IP auto-configuration
code while trying to bring up interfaces:

        for_each_netdev(&init_net, dev) {
                if (ic_is_init_dev(dev)) {
...
                        oflags = dev->flags;
                        if (dev_change_flags(dev, oflags | IFF_UP, NULL) < 0) {
                                pr_err("IP-Config: Failed to open %s\n",
                                       dev->name);
                                continue;
                        }

So, the only way this could be reliable is if we can guarantee that
all deferred probes will have been retried by the time we get here.
Do we have that guarantee?

> if we are already in user-space and the
> PHY is connected at .ndo_open() time, there is a whole lot of things that
> did happen prior to getting there, such as udevd using modaliases in order
> to load every possible module we might, so I am debating whether we will
> really see a probe deferral at all.

As can be seen from my recent posts which show on Debian Buster that
interfaces are attempted to be brought up while e.g. mv88e6xxx is still
probing, we can't make any guarantees that things have "settled" by the
time userspace attempts to bring up the network interfaces.

I may have more on why that is happening... I won't post it here, I'll
post to the other thread.

> > Can we tell we are in the context of a driver probe? Or do we need to
> > add a parameter to the various phy_attach API calls to let the core
> > know if this is probe or open?
> 
> Actually we do the RTNL lock will be held during ndo_open and it won't
> during driver probe.

That's probably an unreliable indicator. DPAA2 has weirdness in the
way it can dynamically create and destroy network interfaces, which
does lead to problems with the rtnl lock. I've been carrying a patch
from NXP for this for almost two years now, which NXP still haven't
submitted:

http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4

... and I've no idea why that patch never made mainline. I need it
to avoid the stated deadlock on SolidRun Honeycomb platforms when
creating additional network interfaces for the SFP cages in userspace.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 20:32       ` Vladimir Oltean
@ 2021-09-02 21:39         ` Russell King (Oracle)
  0 siblings, 0 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 21:39 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Vladimir Oltean, netdev, Greg Kroah-Hartman,
	Rafael J. Wysocki, Heiner Kallweit, David S. Miller,
	Jakub Kicinski, Vivien Didelot, Florian Fainelli, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Thu, Sep 02, 2021 at 11:32:48PM +0300, Vladimir Oltean wrote:
> But these patches also solve DSA's issue with the circular dependency
> between the switch and its internal PHYs, and nobody seems to have asked
> the most important question: why?

Surely you specified that in your cover message and in the patch
that actually fixes the problem, as one always should do.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 21:33         ` Russell King (Oracle)
@ 2021-09-02 21:39           ` Vladimir Oltean
  2021-09-02 22:24             ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 21:39 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Florian Fainelli, Andrew Lunn, Vladimir Oltean, netdev,
	Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> That's probably an unreliable indicator. DPAA2 has weirdness in the
> way it can dynamically create and destroy network interfaces, which
> does lead to problems with the rtnl lock. I've been carrying a patch
> from NXP for this for almost two years now, which NXP still haven't
> submitted:
> 
> http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> 
> ... and I've no idea why that patch never made mainline. I need it
> to avoid the stated deadlock on SolidRun Honeycomb platforms when
> creating additional network interfaces for the SFP cages in userspace.

Ah, nice, I've copied that broken logic for the dpaa2-switch too:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065

So why don't you send the patch? I can send it too if you want to, one
for the switch and one for the DPNI driver.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-01 22:50 [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Vladimir Oltean
                   ` (3 preceding siblings ...)
  2021-09-02 12:19 ` [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Russell King (Oracle)
@ 2021-09-02 22:05 ` Vladimir Oltean
  2021-09-02 23:29   ` Saravana Kannan
  4 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 22:05 UTC (permalink / raw)
  To: Vladimir Oltean, Saravana Kannan
  Cc: netdev, Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Florian Fainelli, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> This is a continuation of the discussion on patch "[v1,1/2] driver core:
> fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT" from here:
> https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/
> 
> Summary: in a complex combination of device dependencies which is not
> really relevant to what is being proposed here, DSA ends up calling
> phylink_of_phy_connect during a period in which the PHY driver goes
> through a series of probe deferral events.
> 
> The central point of that discussion is that DSA seems "broken" for
> expecting the PHY driver to probe immediately on PHYs belonging to the
> internal MDIO buses of switches. A few suggestions were made about what
> to do, but some were not satisfactory and some did not solve the problem.
> 
> In fact, fw_devlink, the mechanism that causes the PHY driver to defer
> probing in this particular case, has some significant "issues" too, but
> its "issues" are only in quotes "because at worst it'd allow a few
> unnecessary deferred probes":
> https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/#24418895
> 
> So if that's the criterion by which an issue is an issue, maybe we
> should take a step back and look at the bigger picture.
> 
> There is nothing about the idea that a PHY might defer probing, or about
> the changes proposed here, that has anything with DSA. Furthermore, the
> changes done by this series solve the problem in the same way: "they
> allow a few unnecessary deferred probes" <- in this case they provoke
> this to the caller of phy_attach_direct.
> 
> If we look at commit 16983507742c ("net: phy: probe PHY drivers
> synchronously"), we see that the PHY library expectation is for the PHY
> device to have a PHY driver bound to it as soon as device_add() finishes.
> 
> Well, as it turns out, in case the PHY device has any supplier which is
> not ready, this is not possible, but that patch still seems to ensure
> that the process of binding a driver to the device has at least started.
> That process will continue for a while, and will race with
> phy_attach_direct calls, so we need to make the latter observe the fact
> that a driver is struggling to probe, and wait for it a bit more.
> 
> What I've not tested is loading the PHY module at runtime, and seeing
> how phy_attach_direct behaves then. I expect that this change set will
> not alter the behavior in that case: the genphy will still bind to a
> device with no driver, and phy_attach_direct will not return -EPROBE_DEFER
> in that case.
> 
> I might not be very versed in the device core/internals, but the patches
> make sense to me, and worked as intended from the first try on my system
> (Turris MOX with mv88e6xxx), which was modified to make the same "sins"
> as those called out in the thread above:
> 
> - use PHY interrupts provided by the switch itself as an interrupt-controller
> - call of_mdiobus_register from setup() and not from probe(), so as to
>   not circumvent fw_devlink's limitations, and still get to hit the PHY
>   probe deferral conditions.
> 
> So feedback and testing on other platforms is very appreciated.
> 
> Vladimir Oltean (3):
>   net: phy: don't bind genphy in phy_attach_direct if the specific
>     driver defers probe
>   net: dsa: destroy the phylink instance on any error in
>     dsa_slave_phy_setup
>   net: dsa: allow the phy_connect() call to return -EPROBE_DEFER
> 
>  drivers/base/dd.c            | 21 +++++++++++++++++++--
>  drivers/net/phy/phy_device.c |  8 ++++++++
>  include/linux/device.h       |  1 +
>  net/dsa/dsa2.c               |  2 ++
>  net/dsa/slave.c              | 12 +++++-------
>  5 files changed, 35 insertions(+), 9 deletions(-)
> 
> -- 
> 2.25.1
> 

Ouch, I just realized that Saravana, the person whose reaction I've been
waiting for the most, is not copied....

Saravana, you can find the thread here to sync up with what has been
discussed:
https://patchwork.kernel.org/project/netdevbpf/cover/20210901225053.1205571-1-vladimir.oltean@nxp.com/

Sorry.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 21:39           ` Vladimir Oltean
@ 2021-09-02 22:24             ` Russell King (Oracle)
  2021-09-02 22:45               ` Vladimir Oltean
  2021-09-03  9:27               ` Ioana Ciornei
  0 siblings, 2 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-02 22:24 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Florian Fainelli, Andrew Lunn, Vladimir Oltean, netdev,
	Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Fri, Sep 03, 2021 at 12:39:49AM +0300, Vladimir Oltean wrote:
> On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> > That's probably an unreliable indicator. DPAA2 has weirdness in the
> > way it can dynamically create and destroy network interfaces, which
> > does lead to problems with the rtnl lock. I've been carrying a patch
> > from NXP for this for almost two years now, which NXP still haven't
> > submitted:
> > 
> > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> > 
> > ... and I've no idea why that patch never made mainline. I need it
> > to avoid the stated deadlock on SolidRun Honeycomb platforms when
> > creating additional network interfaces for the SFP cages in userspace.
> 
> Ah, nice, I've copied that broken logic for the dpaa2-switch too:
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065
> 
> So why don't you send the patch? I can send it too if you want to, one
> for the switch and one for the DPNI driver.

Sorry, I mis-stated. NXP did submit that exact patch, but it's actually
incorrect for the reason I stated when it was sent:

https://patchwork.ozlabs.org/project/netdev/patch/1574363727-5437-2-git-send-email-ioana.ciornei@nxp.com/

I did miss the rtnl_lock() around phylink_disconnect_phy() in the
description of the race, which goes someway towards hiding it, but
there is still a race between phylink_destroy() and another thread
calling dpaa2_eth_get_link_ksettings(), and priv->mac being freed:

static int
dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
                             struct ethtool_link_ksettings *link_settings)
{
        struct dpaa2_eth_priv *priv = netdev_priv(net_dev);

        if (dpaa2_eth_is_type_phy(priv))
                return phylink_ethtool_ksettings_get(priv->mac->phylink,
                                                     link_settings);

which dereferences priv->mac and priv->mac->phylink, vs:

static irqreturn_t dpni_irq0_handler_thread(int irq_num, void *arg)
{
...
        if (status & DPNI_IRQ_EVENT_ENDPOINT_CHANGED) {
                dpaa2_eth_set_mac_addr(netdev_priv(net_dev));
                dpaa2_eth_update_tx_fqids(priv);

                if (dpaa2_eth_has_mac(priv))
                        dpaa2_eth_disconnect_mac(priv);
                else
                        dpaa2_eth_connect_mac(priv);
        }

static void dpaa2_eth_disconnect_mac(struct dpaa2_eth_priv *priv)
{
        if (dpaa2_eth_is_type_phy(priv))
                dpaa2_mac_disconnect(priv->mac);

        if (!dpaa2_eth_has_mac(priv))
                return;

        dpaa2_mac_close(priv->mac);
        kfree(priv->mac);		<== potential use after free bug by
        priv->mac = NULL;		<== dpaa2_eth_get_link_ksettings()
}

void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
{
        if (!mac->phylink)
                return;

        phylink_disconnect_phy(mac->phylink);
        phylink_destroy(mac->phylink);	<== another use-after-free bug via
					    dpaa2_eth_get_link_ksettings()
        dpaa2_pcs_destroy(mac);
}

Note that phylink_destroy() is documented as:

 * Note: the rtnl lock must not be held when calling this function.

because it calls sfp_bus_del_upstream(), which will take the rtnl lock
itself. An alternative solution would be to remove the rtnl locking
from sfp_bus_del_upstream(), but then force _everyone_ to take the
rtnl lock before calling phylink_destroy() - meaning a larger block of
code ends up executing under the lock than is really necessary.

However, as I stated in my review of the patch "As I've already stated,
the phylink is not designed to be created and destroyed on a published
network device." That still remains true today, and it seems that the
issue has never been fixed in DPAA2 despite having been pointed out.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 22:24             ` Russell King (Oracle)
@ 2021-09-02 22:45               ` Vladimir Oltean
  2021-09-02 23:02                 ` Andrew Lunn
  2021-09-03  9:27               ` Ioana Ciornei
  1 sibling, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 22:45 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Florian Fainelli, Andrew Lunn, Vladimir Oltean, netdev,
	Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Thu, Sep 02, 2021 at 11:24:39PM +0100, Russell King (Oracle) wrote:
> On Fri, Sep 03, 2021 at 12:39:49AM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> > > That's probably an unreliable indicator. DPAA2 has weirdness in the
> > > way it can dynamically create and destroy network interfaces, which
> > > does lead to problems with the rtnl lock. I've been carrying a patch
> > > from NXP for this for almost two years now, which NXP still haven't
> > > submitted:
> > >
> > > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> > >
> > > ... and I've no idea why that patch never made mainline. I need it
> > > to avoid the stated deadlock on SolidRun Honeycomb platforms when
> > > creating additional network interfaces for the SFP cages in userspace.
> >
> > Ah, nice, I've copied that broken logic for the dpaa2-switch too:
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065
> >
> > So why don't you send the patch? I can send it too if you want to, one
> > for the switch and one for the DPNI driver.
>
> Sorry, I mis-stated. NXP did submit that exact patch, but it's actually
> incorrect for the reason I stated when it was sent:
>
> https://patchwork.ozlabs.org/project/netdev/patch/1574363727-5437-2-git-send-email-ioana.ciornei@nxp.com/

So why are you carrying it then?

> I did miss the rtnl_lock() around phylink_disconnect_phy() in the
> description of the race, which goes someway towards hiding it, but
> there is still a race between phylink_destroy() and another thread
> calling dpaa2_eth_get_link_ksettings(), and priv->mac being freed:
>
> static int
> dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
>                              struct ethtool_link_ksettings *link_settings)
> {
>         struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
>
>         if (dpaa2_eth_is_type_phy(priv))
>                 return phylink_ethtool_ksettings_get(priv->mac->phylink,
>                                                      link_settings);
>
> which dereferences priv->mac and priv->mac->phylink, vs:
>
> static irqreturn_t dpni_irq0_handler_thread(int irq_num, void *arg)
> {
> ...
>         if (status & DPNI_IRQ_EVENT_ENDPOINT_CHANGED) {
>                 dpaa2_eth_set_mac_addr(netdev_priv(net_dev));
>                 dpaa2_eth_update_tx_fqids(priv);
>
>                 if (dpaa2_eth_has_mac(priv))
>                         dpaa2_eth_disconnect_mac(priv);
>                 else
>                         dpaa2_eth_connect_mac(priv);
>         }
>
> static void dpaa2_eth_disconnect_mac(struct dpaa2_eth_priv *priv)
> {
>         if (dpaa2_eth_is_type_phy(priv))
>                 dpaa2_mac_disconnect(priv->mac);
>
>         if (!dpaa2_eth_has_mac(priv))
>                 return;
>
>         dpaa2_mac_close(priv->mac);
>         kfree(priv->mac);		<== potential use after free bug by
>         priv->mac = NULL;		<== dpaa2_eth_get_link_ksettings()
> }

Okay, so this needs to stay under the rtnetlink mutex, to serialize with
dpaa2_eth_get_link_ksettings which is already under the rtnetlink mutex.
So the way in which rtnl_lock is taken right now is actually fine in a way.

>
> void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
> {
>         if (!mac->phylink)
>                 return;
>
>         phylink_disconnect_phy(mac->phylink);
>         phylink_destroy(mac->phylink);	<== another use-after-free bug via
> 					    dpaa2_eth_get_link_ksettings()
>         dpaa2_pcs_destroy(mac);
> }
>
> Note that phylink_destroy() is documented as:
>
>  * Note: the rtnl lock must not be held when calling this function.
>
> because it calls sfp_bus_del_upstream(), which will take the rtnl lock
> itself. An alternative solution would be to remove the rtnl locking
> from sfp_bus_del_upstream(), but then force _everyone_ to take the
> rtnl lock before calling phylink_destroy() - meaning a larger block of
> code ends up executing under the lock than is really necessary.

So phylink_destroy has exactly 20 call sites, it is not that bad?

And as for "larger block than necessary" - doesn't the dpaa2 prolonged
usage count as necessary?

> However, as I stated in my review of the patch "As I've already stated,
> the phylink is not designed to be created and destroyed on a published
> network device." That still remains true today, and it seems that the
> issue has never been fixed in DPAA2 despite having been pointed out.

So what would you do, exactly, to "fix" the issue that a DPNI can
connect and disconnect at runtime from a DPMAC?

Also, "X is not designed to Y" doesn't really say much, given a bit of
will power. Linux was not designed to run on non-i386 either.

Any other issues besides needing to take rtnl_mutex top-level when
calling phylink_destroy? Since phylink_disconnect_phy needs it anyway,
and phylink_destroy ends up calling sfp_bus_del_upstream which takes the
same mutex again, and drivers that connect/disconnect at probe/remove
time end up calling both in a row, I don't think there is much of an
issue to speak of, or that the rework would be that difficult.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 22:45               ` Vladimir Oltean
@ 2021-09-02 23:02                 ` Andrew Lunn
  2021-09-02 23:26                   ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Andrew Lunn @ 2021-09-02 23:02 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Russell King (Oracle),
	Florian Fainelli, Vladimir Oltean, netdev, Greg Kroah-Hartman,
	Rafael J. Wysocki, Heiner Kallweit, David S. Miller,
	Jakub Kicinski, Vivien Didelot, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

> > Note that phylink_destroy() is documented as:
> >
> >  * Note: the rtnl lock must not be held when calling this function.
> >

...

> 
> Any other issues besides needing to take rtnl_mutex top-level when
> calling phylink_destroy?

We should try to keep phylink_create and phylink_destroy symmetrical:

/**
 * phylink_create() - create a phylink instance
 * @config: a pointer to the target &struct phylink_config
 * @fwnode: a pointer to a &struct fwnode_handle describing the network
 *      interface
 * @iface: the desired link mode defined by &typedef phy_interface_t
 * @mac_ops: a pointer to a &struct phylink_mac_ops for the MAC.
 *
 * Create a new phylink instance, and parse the link parameters found in @np.
 * This will parse in-band modes, fixed-link or SFP configuration.
 *
 * Note: the rtnl lock must not be held when calling this function.

Having different locking requirements will catch people out.

Interestingly, there is no ASSERT_NO_RTNL(). Maybe we should add such
a macro.

    Andrew

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup
  2021-09-01 22:50 ` [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup Vladimir Oltean
  2021-09-02 12:25   ` Russell King (Oracle)
@ 2021-09-02 23:21   ` Florian Fainelli
  1 sibling, 0 replies; 42+ messages in thread
From: Florian Fainelli @ 2021-09-02 23:21 UTC (permalink / raw)
  To: Vladimir Oltean, netdev
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Vivien Didelot, Vladimir Oltean, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown



On 9/1/2021 3:50 PM, Vladimir Oltean wrote:
> DSA supports connecting to a phy-handle, and has a fallback to a non-OF
> based method of connecting to an internal PHY on the switch's own MDIO
> bus, if no phy-handle and no fixed-link nodes were present.
> 
> The -ENODEV error code from the first attempt (phylink_of_phy_connect)
> is what triggers the second attempt (phylink_connect_phy).
> 
> However, when the first attempt returns a different error code than
> -ENODEV, this results in an unbalance of calls to phylink_create and
> phylink_destroy by the time we exit the function. The phylink instance
> has leaked.
> 
> There are many other error codes that can be returned by
> phylink_of_phy_connect. For example, phylink_validate returns -EINVAL.
> So this is a practical issue too.
> 
> Fixes: aab9c4067d23 ("net: dsa: Plug in PHYLINK support")
> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 23:02                 ` Andrew Lunn
@ 2021-09-02 23:26                   ` Vladimir Oltean
  2021-09-03  0:04                     ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-02 23:26 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Russell King (Oracle),
	Florian Fainelli, Vladimir Oltean, netdev, Greg Kroah-Hartman,
	Rafael J. Wysocki, Heiner Kallweit, David S. Miller,
	Jakub Kicinski, Vivien Didelot, linux-kernel, Linus Walleij,
	Alvin Šipraga, ACPI Devel Maling List, kernel-team,
	Len Brown

On Fri, Sep 03, 2021 at 01:02:06AM +0200, Andrew Lunn wrote:
> We should try to keep phylink_create and phylink_destroy symmetrical:
> 
> /**
>  * phylink_create() - create a phylink instance
>  * @config: a pointer to the target &struct phylink_config
>  * @fwnode: a pointer to a &struct fwnode_handle describing the network
>  *      interface
>  * @iface: the desired link mode defined by &typedef phy_interface_t
>  * @mac_ops: a pointer to a &struct phylink_mac_ops for the MAC.
>  *
>  * Create a new phylink instance, and parse the link parameters found in @np.
>  * This will parse in-band modes, fixed-link or SFP configuration.
>  *
>  * Note: the rtnl lock must not be held when calling this function.
> 
> Having different locking requirements will catch people out.
> 
> Interestingly, there is no ASSERT_NO_RTNL(). Maybe we should add such
> a macro.

In this case, the easiest might be to just take a different mutex in
dpaa2 which serializes all places that access the priv->mac references.
I don't know exactly why the SFP bus needs the rtnl_mutex, I've removed
those locks and will see what fails tomorrow, but I don't think dpaa2
has a good enough justification to take the rtnl_mutex just so that it
can connect and disconnect to the MAC freely at runtime.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver
  2021-09-02 22:05 ` Vladimir Oltean
@ 2021-09-02 23:29   ` Saravana Kannan
  0 siblings, 0 replies; 42+ messages in thread
From: Saravana Kannan @ 2021-09-02 23:29 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Vladimir Oltean, netdev, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
	Jakub Kicinski, Vivien Didelot, Florian Fainelli, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Thu, Sep 2, 2021 at 3:05 PM Vladimir Oltean <olteanv@gmail.com> wrote:
>
> On Thu, Sep 02, 2021 at 01:50:50AM +0300, Vladimir Oltean wrote:
> > This is a continuation of the discussion on patch "[v1,1/2] driver core:
> > fw_devlink: Add support for FWNODE_FLAG_BROKEN_PARENT" from here:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/
> >
> > Summary: in a complex combination of device dependencies which is not
> > really relevant to what is being proposed here, DSA ends up calling
> > phylink_of_phy_connect during a period in which the PHY driver goes
> > through a series of probe deferral events.
> >
> > The central point of that discussion is that DSA seems "broken" for
> > expecting the PHY driver to probe immediately on PHYs belonging to the
> > internal MDIO buses of switches. A few suggestions were made about what
> > to do, but some were not satisfactory and some did not solve the problem.
> >
> > In fact, fw_devlink, the mechanism that causes the PHY driver to defer
> > probing in this particular case, has some significant "issues" too, but
> > its "issues" are only in quotes "because at worst it'd allow a few
> > unnecessary deferred probes":
> > https://patchwork.kernel.org/project/netdevbpf/patch/20210826074526.825517-2-saravanak@google.com/#24418895
> >
> > So if that's the criterion by which an issue is an issue, maybe we
> > should take a step back and look at the bigger picture.
> >
> > There is nothing about the idea that a PHY might defer probing, or about
> > the changes proposed here, that has anything with DSA. Furthermore, the
> > changes done by this series solve the problem in the same way: "they
> > allow a few unnecessary deferred probes" <- in this case they provoke
> > this to the caller of phy_attach_direct.
> >
> > If we look at commit 16983507742c ("net: phy: probe PHY drivers
> > synchronously"), we see that the PHY library expectation is for the PHY
> > device to have a PHY driver bound to it as soon as device_add() finishes.
> >
> > Well, as it turns out, in case the PHY device has any supplier which is
> > not ready, this is not possible, but that patch still seems to ensure
> > that the process of binding a driver to the device has at least started.
> > That process will continue for a while, and will race with
> > phy_attach_direct calls, so we need to make the latter observe the fact
> > that a driver is struggling to probe, and wait for it a bit more.
> >
> > What I've not tested is loading the PHY module at runtime, and seeing
> > how phy_attach_direct behaves then. I expect that this change set will
> > not alter the behavior in that case: the genphy will still bind to a
> > device with no driver, and phy_attach_direct will not return -EPROBE_DEFER
> > in that case.
> >
> > I might not be very versed in the device core/internals, but the patches
> > make sense to me, and worked as intended from the first try on my system
> > (Turris MOX with mv88e6xxx), which was modified to make the same "sins"
> > as those called out in the thread above:
> >
> > - use PHY interrupts provided by the switch itself as an interrupt-controller
> > - call of_mdiobus_register from setup() and not from probe(), so as to
> >   not circumvent fw_devlink's limitations, and still get to hit the PHY
> >   probe deferral conditions.
> >
> > So feedback and testing on other platforms is very appreciated.
> >
> > Vladimir Oltean (3):
> >   net: phy: don't bind genphy in phy_attach_direct if the specific
> >     driver defers probe
> >   net: dsa: destroy the phylink instance on any error in
> >     dsa_slave_phy_setup
> >   net: dsa: allow the phy_connect() call to return -EPROBE_DEFER
> >
> >  drivers/base/dd.c            | 21 +++++++++++++++++++--
> >  drivers/net/phy/phy_device.c |  8 ++++++++
> >  include/linux/device.h       |  1 +
> >  net/dsa/dsa2.c               |  2 ++
> >  net/dsa/slave.c              | 12 +++++-------
> >  5 files changed, 35 insertions(+), 9 deletions(-)
> >
> > --
> > 2.25.1
> >
>
> Ouch, I just realized that Saravana, the person whose reaction I've been
> waiting for the most, is not copied....
>
> Saravana, you can find the thread here to sync up with what has been
> discussed:
> https://patchwork.kernel.org/project/netdevbpf/cover/20210901225053.1205571-1-vladimir.oltean@nxp.com/

Woah! The thread blew up.

>
> Sorry.

No worries.

I'll read through the thread later and maybe provide more responses,
but one thing I wanted to say right away:

Don't depend on dev->p->deferred_probe. It can be "empty" for a device
that has returned -EPROBE_DEFER for a bunch of reasons:

1. When the device is in the middle of being reattempted, it would be
empty. You can't hold any lock that'll ensure correctness either
because deferred probe locking is a mess (I'm working on cleaning that
up).

2. I'm working on actually not adding devices to that list if there's
a known supplier that hasn't been probed yet. No point retrying it
again and again for every deferred probe trigger when we know it's
going to fail. And we'll basically get topological probe ordering.

Your closest bet right now is d->can_match. Only caveat is that it's
not cleared if the actual driver gets unregistered.


-Saravana

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 23:26                   ` Vladimir Oltean
@ 2021-09-03  0:04                     ` Russell King (Oracle)
  2021-09-03 20:48                       ` Vladimir Oltean
  0 siblings, 1 reply; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-03  0:04 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vladimir Oltean, netdev,
	Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Fri, Sep 03, 2021 at 02:26:07AM +0300, Vladimir Oltean wrote:
> On Fri, Sep 03, 2021 at 01:02:06AM +0200, Andrew Lunn wrote:
> > We should try to keep phylink_create and phylink_destroy symmetrical:
> > 
> > /**
> >  * phylink_create() - create a phylink instance
> >  * @config: a pointer to the target &struct phylink_config
> >  * @fwnode: a pointer to a &struct fwnode_handle describing the network
> >  *      interface
> >  * @iface: the desired link mode defined by &typedef phy_interface_t
> >  * @mac_ops: a pointer to a &struct phylink_mac_ops for the MAC.
> >  *
> >  * Create a new phylink instance, and parse the link parameters found in @np.
> >  * This will parse in-band modes, fixed-link or SFP configuration.
> >  *
> >  * Note: the rtnl lock must not be held when calling this function.
> > 
> > Having different locking requirements will catch people out.
> > 
> > Interestingly, there is no ASSERT_NO_RTNL(). Maybe we should add such
> > a macro.
> 
> In this case, the easiest might be to just take a different mutex in
> dpaa2 which serializes all places that access the priv->mac references.
> I don't know exactly why the SFP bus needs the rtnl_mutex, I've removed
> those locks and will see what fails tomorrow, but I don't think dpaa2
> has a good enough justification to take the rtnl_mutex just so that it
> can connect and disconnect to the MAC freely at runtime.

It needs it to ensure that the sfp-bus code is safe. sfp-bus code
sits between phylink and the sfp stuff, and will be called from
either side. It can't have its own lock, because that gives lockdep
splats.

Removing a lock and then running the kernel is a down right stupid
way to test to see if a lock is necessary.

That approach is like having built a iron bridge, covered it in paint,
then you remove most the bolts, and then test to see whether it's safe
for vehicles to travel over it by riding your bicycle across it and
declaring it safe.

Sorry, but if you think "remove lock, run kernel, if it works fine
the lock is unnecessary" is a valid approach, then you've just
disqualified yourself from discussing this topic any further.
Locking is done by knowing the code and code analysis, not by
playing "does the code fail if I remove it" games. I am utterly
shocked that you think that this is a valid approach.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-02 22:24             ` Russell King (Oracle)
  2021-09-02 22:45               ` Vladimir Oltean
@ 2021-09-03  9:27               ` Ioana Ciornei
  1 sibling, 0 replies; 42+ messages in thread
From: Ioana Ciornei @ 2021-09-03  9:27 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Vladimir Oltean, Florian Fainelli, Andrew Lunn, Vladimir Oltean,
	netdev, Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Thu, Sep 02, 2021 at 11:24:39PM +0100, Russell King (Oracle) wrote:
> On Fri, Sep 03, 2021 at 12:39:49AM +0300, Vladimir Oltean wrote:
> > On Thu, Sep 02, 2021 at 10:33:03PM +0100, Russell King (Oracle) wrote:
> > > That's probably an unreliable indicator. DPAA2 has weirdness in the
> > > way it can dynamically create and destroy network interfaces, which
> > > does lead to problems with the rtnl lock. I've been carrying a patch
> > > from NXP for this for almost two years now, which NXP still haven't
> > > submitted:
> > > 
> > > http://git.armlinux.org.uk/cgit/linux-arm.git/commit/?h=cex7&id=a600f2ee50223e9bcdcf86b65b4c427c0fd425a4
> > > 
> > > ... and I've no idea why that patch never made mainline. I need it
> > > to avoid the stated deadlock on SolidRun Honeycomb platforms when
> > > creating additional network interfaces for the SFP cages in userspace.
> > 
> > Ah, nice, I've copied that broken logic for the dpaa2-switch too:
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=d52ef12f7d6c016f3b249db95af33f725e3dd065
> > 
> > So why don't you send the patch? I can send it too if you want to, one
> > for the switch and one for the DPNI driver.
> 
> Sorry, I mis-stated. NXP did submit that exact patch, but it's actually
> incorrect for the reason I stated when it was sent:
> 
> https://patchwork.ozlabs.org/project/netdev/patch/1574363727-5437-2-git-send-email-ioana.ciornei@nxp.com/
> 
> I did miss the rtnl_lock() around phylink_disconnect_phy() in the
> description of the race, which goes someway towards hiding it, but
> there is still a race between phylink_destroy() and another thread
> calling dpaa2_eth_get_link_ksettings(), and priv->mac being freed:
> 
> static int
> dpaa2_eth_get_link_ksettings(struct net_device *net_dev,
>                              struct ethtool_link_ksettings *link_settings)
> {
>         struct dpaa2_eth_priv *priv = netdev_priv(net_dev);
> 
>         if (dpaa2_eth_is_type_phy(priv))
>                 return phylink_ethtool_ksettings_get(priv->mac->phylink,
>                                                      link_settings);
> 
> which dereferences priv->mac and priv->mac->phylink, vs:
> 
> static irqreturn_t dpni_irq0_handler_thread(int irq_num, void *arg)
> {
> ...
>         if (status & DPNI_IRQ_EVENT_ENDPOINT_CHANGED) {
>                 dpaa2_eth_set_mac_addr(netdev_priv(net_dev));
>                 dpaa2_eth_update_tx_fqids(priv);
> 
>                 if (dpaa2_eth_has_mac(priv))
>                         dpaa2_eth_disconnect_mac(priv);
>                 else
>                         dpaa2_eth_connect_mac(priv);
>         }
> 
> static void dpaa2_eth_disconnect_mac(struct dpaa2_eth_priv *priv)
> {
>         if (dpaa2_eth_is_type_phy(priv))
>                 dpaa2_mac_disconnect(priv->mac);
> 
>         if (!dpaa2_eth_has_mac(priv))
>                 return;
> 
>         dpaa2_mac_close(priv->mac);
>         kfree(priv->mac);		<== potential use after free bug by
>         priv->mac = NULL;		<== dpaa2_eth_get_link_ksettings()
> }
> 
> void dpaa2_mac_disconnect(struct dpaa2_mac *mac)
> {
>         if (!mac->phylink)
>                 return;
> 
>         phylink_disconnect_phy(mac->phylink);
>         phylink_destroy(mac->phylink);	<== another use-after-free bug via
> 					    dpaa2_eth_get_link_ksettings()
>         dpaa2_pcs_destroy(mac);
> }
> 
> Note that phylink_destroy() is documented as:
> 
>  * Note: the rtnl lock must not be held when calling this function.
> 
> because it calls sfp_bus_del_upstream(), which will take the rtnl lock
> itself. An alternative solution would be to remove the rtnl locking
> from sfp_bus_del_upstream(), but then force _everyone_ to take the
> rtnl lock before calling phylink_destroy() - meaning a larger block of
> code ends up executing under the lock than is really necessary.
> 
> However, as I stated in my review of the patch "As I've already stated,
> the phylink is not designed to be created and destroyed on a published
> network device." That still remains true today, and it seems that the
> issue has never been fixed in DPAA2 despite having been pointed out.
> 

My attempt to fix this issue was that patch that you just pointed at.
Taking your feedback into account (that phylink is not designed to be
created and destroyed on a published networking device) I really do not
know what other viable solution to send out.

The alternative here would have been to just have a different driver for
the MAC side (probing on dpmac objects) that creates the phylink
instance at probe time and then is just used by the dpaa2-eth driver
when it connects to a dpmac. This way no phylink is created/destroyed
dynamically.

This was the architecture of my initial attempt at supporting phylink in
DPAA2.
https://patchwork.ozlabs.org/project/netdev/patch/1560470153-26155-5-git-send-email-ioana.ciornei@nxp.com/

If you have any suggestion on how I should go about fixing this, please
let me know.

Ioana


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-03  0:04                     ` Russell King (Oracle)
@ 2021-09-03 20:48                       ` Vladimir Oltean
  2021-09-03 22:06                         ` Russell King (Oracle)
  0 siblings, 1 reply; 42+ messages in thread
From: Vladimir Oltean @ 2021-09-03 20:48 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Andrew Lunn, Florian Fainelli, Vladimir Oltean, netdev,
	Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Fri, Sep 03, 2021 at 01:04:19AM +0100, Russell King (Oracle) wrote:
> Removing a lock and then running the kernel is a down right stupid
> way to test to see if a lock is necessary.
> 
> That approach is like having built a iron bridge, covered it in paint,
> then you remove most the bolts, and then test to see whether it's safe
> for vehicles to travel over it by riding your bicycle across it and
> declaring it safe.
> 
> Sorry, but if you think "remove lock, run kernel, if it works fine
> the lock is unnecessary" is a valid approach, then you've just
> disqualified yourself from discussing this topic any further.
> Locking is done by knowing the code and code analysis, not by
> playing "does the code fail if I remove it" games. I am utterly
> shocked that you think that this is a valid approach.

... and this is exactly why you will no longer get any attention from me
on this topic. Good luck.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe
  2021-09-03 20:48                       ` Vladimir Oltean
@ 2021-09-03 22:06                         ` Russell King (Oracle)
  0 siblings, 0 replies; 42+ messages in thread
From: Russell King (Oracle) @ 2021-09-03 22:06 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Andrew Lunn, Florian Fainelli, Vladimir Oltean, netdev,
	Greg Kroah-Hartman, Rafael J. Wysocki, Heiner Kallweit,
	David S. Miller, Jakub Kicinski, Vivien Didelot, linux-kernel,
	Linus Walleij, Alvin Šipraga, ACPI Devel Maling List,
	kernel-team, Len Brown

On Fri, Sep 03, 2021 at 11:48:22PM +0300, Vladimir Oltean wrote:
> On Fri, Sep 03, 2021 at 01:04:19AM +0100, Russell King (Oracle) wrote:
> > Removing a lock and then running the kernel is a down right stupid
> > way to test to see if a lock is necessary.
> > 
> > That approach is like having built a iron bridge, covered it in paint,
> > then you remove most the bolts, and then test to see whether it's safe
> > for vehicles to travel over it by riding your bicycle across it and
> > declaring it safe.
> > 
> > Sorry, but if you think "remove lock, run kernel, if it works fine
> > the lock is unnecessary" is a valid approach, then you've just
> > disqualified yourself from discussing this topic any further.
> > Locking is done by knowing the code and code analysis, not by
> > playing "does the code fail if I remove it" games. I am utterly
> > shocked that you think that this is a valid approach.
> 
> ... and this is exactly why you will no longer get any attention from me
> on this topic. Good luck.

Good, because your approach to this to me reads as "I don't think you
know what the hell you're doing so I'm going to remove a lock to test
whether it is needed." Effectively, that action is an insult towards
me as the author of that code.

And as I said, if you think that's a valid approach, then quite frankly
I don't want you touching my code, because you clearly don't know what
you're doing as you aren't willing to put the necessary effort in to
understanding the code.

Removing a lock and running the kernel is _never_ a valid way to see
whether the lock is required or not. The only way is via code analysis.

I wonder whether you'd take the same approach with filesystems or
memory management code. Why don't you try removing some locks from
those subsystems and see how long your filesystems last?

You could have asked why the lock was necessary, and I would have
described it. That would have been the civil approach. Maybe even
put forward a hypothesis why you think the lock isn't necessary, but
no, you decide that the best way to go about this is to remove the
lock and see whether the kernel breaks.

It may shock you to know that those of us who have been working on
the kernel for almost 30 years and have seen the evolution of the
kernel from uniprocessor to SMP, have had to debug race conditions
caused by a lack of locking know very well that you can have what
seems to be a functioning kernel despite missing locks - and such a
kernel can last quite a long time and only show up the race quite
rarely. This is exactly why "lets remove the lock and see if it
breaks" is a completely invalid approach. I'm sorry that you don't
seem to realise just how stupid a suggestion that was.

I can tell you now: removing the locks you proposed will not show an
immediate problem, but by removing those locks you will definitely
open up race conditions between driver binding events on the SFP
side and network usage on the netdev side which will only occur
rarely.

And just because they only happen rarely is not a justification to
remove locks, no matter how inconvenient those locks may be.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2021-09-03 22:06 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-01 22:50 [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Vladimir Oltean
2021-09-01 22:50 ` [RFC PATCH net-next 1/3] net: phy: don't bind genphy in phy_attach_direct if the specific driver defers probe Vladimir Oltean
2021-09-02  5:43   ` Greg Kroah-Hartman
2021-09-02 10:11     ` Vladimir Oltean
2021-09-02 10:37       ` Greg Kroah-Hartman
2021-09-02 11:17         ` Vladimir Oltean
2021-09-02 14:37     ` Rafael J. Wysocki
2021-09-02 18:50   ` Russell King (Oracle)
2021-09-02 19:23     ` Vladimir Oltean
2021-09-02 19:51     ` Andrew Lunn
2021-09-02 20:33       ` Florian Fainelli
2021-09-02 21:33         ` Russell King (Oracle)
2021-09-02 21:39           ` Vladimir Oltean
2021-09-02 22:24             ` Russell King (Oracle)
2021-09-02 22:45               ` Vladimir Oltean
2021-09-02 23:02                 ` Andrew Lunn
2021-09-02 23:26                   ` Vladimir Oltean
2021-09-03  0:04                     ` Russell King (Oracle)
2021-09-03 20:48                       ` Vladimir Oltean
2021-09-03 22:06                         ` Russell King (Oracle)
2021-09-03  9:27               ` Ioana Ciornei
2021-09-01 22:50 ` [RFC PATCH net-next 2/3] net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup Vladimir Oltean
2021-09-02 12:25   ` Russell King (Oracle)
2021-09-02 23:21   ` Florian Fainelli
2021-09-01 22:50 ` [RFC PATCH net-next 3/3] net: dsa: allow the phy_connect() call to return -EPROBE_DEFER Vladimir Oltean
2021-09-02 12:19 ` [RFC PATCH net-next 0/3] Make the PHY library stop being so greedy when binding the generic PHY driver Russell King (Oracle)
2021-09-02 12:35   ` Vladimir Oltean
2021-09-02 12:59     ` Vladimir Oltean
2021-09-02 13:26     ` Russell King (Oracle)
2021-09-02 15:23       ` Vladimir Oltean
2021-09-02 16:31         ` Russell King (Oracle)
2021-09-02 17:10           ` Vladimir Oltean
2021-09-02 17:50             ` Russell King (Oracle)
2021-09-02 19:05               ` Vladimir Oltean
2021-09-02 20:03                 ` Russell King (Oracle)
2021-09-02 20:21                   ` Vladimir Oltean
2021-09-02 20:29                     ` Russell King (Oracle)
2021-09-02 20:07     ` Andrew Lunn
2021-09-02 20:32       ` Vladimir Oltean
2021-09-02 21:39         ` Russell King (Oracle)
2021-09-02 22:05 ` Vladimir Oltean
2021-09-02 23:29   ` Saravana Kannan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).