linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] fw_devlink bug fixes
@ 2021-09-15 17:09 Saravana Kannan
  2021-09-15 17:09 ` [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies Saravana Kannan
                   ` (4 more replies)
  0 siblings, 5 replies; 26+ messages in thread
From: Saravana Kannan @ 2021-09-15 17:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Saravana Kannan
  Cc: Geert Uytterhoeven, Vladimir Oltean, kernel-team, linux-kernel,
	netdev, linux-acpi

Intended for 5.15.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Andrew Lunn <andrew@lunn.ch>
Cc: Vladimir Oltean <olteanv@gmail.com>

v1->v2:
- Added a few Reviewed-by and Tested-by tags
- Addressed Geert's comments in patches 3 and 5
- Dropped the fw_devlink.debug patch
- Added 2 more patches to the series to address other fw_devlink issues

v2->v3:
- Split the logging/debug changes into a separate series

Thanks,
Saravana

Saravana Kannan (3):
  driver core: fw_devlink: Improve handling of cyclic dependencies
  driver core: fw_devlink: Add support for
    FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus
    parents

 drivers/base/core.c        | 36 +++++++++++++++++++++++++++++++-----
 drivers/net/phy/mdio_bus.c |  4 ++++
 include/linux/fwnode.h     | 11 ++++++++---
 3 files changed, 43 insertions(+), 8 deletions(-)

-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies
  2021-09-15 17:09 [PATCH v3 0/3] fw_devlink bug fixes Saravana Kannan
@ 2021-09-15 17:09 ` Saravana Kannan
  2021-09-18 14:38   ` Rafael J. Wysocki
  2021-09-15 17:09 ` [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD Saravana Kannan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Saravana Kannan @ 2021-09-15 17:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Saravana Kannan
  Cc: Geert Uytterhoeven, Vladimir Oltean, kernel-team, linux-kernel,
	netdev, linux-acpi, Marek Szyprowski

When we have a dependency of the form:

Device-A -> Device-C
	Device-B

Device-C -> Device-B

Where,
* Indentation denotes "child of" parent in previous line.
* X -> Y denotes X is consumer of Y based on firmware (Eg: DT).

We have cyclic dependency: device-A -> device-C -> device-B -> device-A

fw_devlink current treats device-C -> device-B dependency as an invalid
dependency and doesn't enforce it but leaves the rest of the
dependencies as is.

While the current behavior is necessary, it is not sufficient if the
false dependency in this example is actually device-A -> device-C. When
this is the case, device-C will correctly probe defer waiting for
device-B to be added, but device-A will be incorrectly probe deferred by
fw_devlink waiting on device-C to probe successfully. Due to this, none
of the devices in the cycle will end up probing.

To fix this, we need to go relax all the dependencies in the cycle like
we already do in the other instances where fw_devlink detects cycles.
A real world example of this was reported[1] and analyzed[2].

[1] - https://lore.kernel.org/lkml/0a2c4106-7f48-2bb5-048e-8c001a7c3fda@samsung.com/
[2] - https://lore.kernel.org/lkml/CAGETcx8peaew90SWiux=TyvuGgvTQOmO4BFALz7aj0Za5QdNFQ@mail.gmail.com/
Fixes: f9aa460672c9 ("driver core: Refactor fw_devlink feature")
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Saravana Kannan <saravanak@google.com>
Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 drivers/base/core.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index e65dd803a453..316df6027093 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1772,14 +1772,21 @@ static int fw_devlink_create_devlink(struct device *con,
 	 * be broken by applying logic. Check for these types of cycles and
 	 * break them so that devices in the cycle probe properly.
 	 *
-	 * If the supplier's parent is dependent on the consumer, then
-	 * the consumer-supplier dependency is a false dependency. So,
-	 * treat it as an invalid link.
+	 * If the supplier's parent is dependent on the consumer, then the
+	 * consumer and supplier have a cyclic dependency. Since fw_devlink
+	 * can't tell which of the inferred dependencies are incorrect, don't
+	 * enforce probe ordering between any of the devices in this cyclic
+	 * dependency. Do this by relaxing all the fw_devlink device links in
+	 * this cycle and by treating the fwnode link between the consumer and
+	 * the supplier as an invalid dependency.
 	 */
 	sup_dev = fwnode_get_next_parent_dev(sup_handle);
 	if (sup_dev && device_is_dependent(con, sup_dev)) {
-		dev_dbg(con, "Not linking to %pfwP - False link\n",
-			sup_handle);
+		dev_info(con, "Fixing up cyclic dependency with %pfwP (%s)\n",
+			 sup_handle, dev_name(sup_dev));
+		device_links_write_lock();
+		fw_devlink_relax_cycle(con, sup_dev);
+		device_links_write_unlock();
 		ret = -EINVAL;
 	} else {
 		/*
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-15 17:09 [PATCH v3 0/3] fw_devlink bug fixes Saravana Kannan
  2021-09-15 17:09 ` [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies Saravana Kannan
@ 2021-09-15 17:09 ` Saravana Kannan
  2021-09-18 15:03   ` Rafael J. Wysocki
  2021-09-15 17:09 ` [PATCH v3 3/3] net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents Saravana Kannan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 26+ messages in thread
From: Saravana Kannan @ 2021-09-15 17:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Saravana Kannan
  Cc: Geert Uytterhoeven, Vladimir Oltean, kernel-team, linux-kernel,
	netdev, linux-acpi

If a parent device is also a supplier to a child device, fw_devlink=on by
design delays the probe() of the child device until the probe() of the
parent finishes successfully.

However, some drivers of such parent devices (where parent is also a
supplier) expect the child device to finish probing successfully as soon as
they are added using device_add() and before the probe() of the parent
device has completed successfully. One example of such a case is discussed
in the link mentioned below.

Add a flag to make fw_devlink=on not enforce these supplier-consumer
relationships, so these drivers can continue working.

Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 19 +++++++++++++++++++
 include/linux/fwnode.h | 11 ++++++++---
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 316df6027093..21d4cb5d3767 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1722,6 +1722,25 @@ static int fw_devlink_create_devlink(struct device *con,
 	struct device *sup_dev;
 	int ret = 0;
 
+	/*
+	 * In some cases, a device P might also be a supplier to its child node
+	 * C. However, this would defer the probe of C until the probe of P
+	 * completes successfully. This is perfectly fine in the device driver
+	 * model. device_add() doesn't guarantee probe completion of the device
+	 * by the time it returns.
+	 *
+	 * However, there are a few drivers that assume C will finish probing
+	 * as soon as it's added and before P finishes probing. So, we provide
+	 * a flag to let fw_devlink know not to delay the probe of C until the
+	 * probe of P completes successfully.
+	 *
+	 * When such a flag is set, we can't create device links where P is the
+	 * supplier of C as that would delay the probe of C.
+	 */
+	if (sup_handle->flags & FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD &&
+	    fwnode_is_ancestor_of(sup_handle, con->fwnode))
+		return -EINVAL;
+
 	sup_dev = get_dev_from_fwnode(sup_handle);
 	if (sup_dev) {
 		/*
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 59828516ebaf..9f4ad719bfe3 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -22,10 +22,15 @@ struct device;
  * LINKS_ADDED:	The fwnode has already be parsed to add fwnode links.
  * NOT_DEVICE:	The fwnode will never be populated as a struct device.
  * INITIALIZED: The hardware corresponding to fwnode has been initialized.
+ * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
+ *			     driver needs its child devices to be bound with
+ *			     their respective drivers as soon as they are
+ *			     added.
  */
-#define FWNODE_FLAG_LINKS_ADDED		BIT(0)
-#define FWNODE_FLAG_NOT_DEVICE		BIT(1)
-#define FWNODE_FLAG_INITIALIZED		BIT(2)
+#define FWNODE_FLAG_LINKS_ADDED			BIT(0)
+#define FWNODE_FLAG_NOT_DEVICE			BIT(1)
+#define FWNODE_FLAG_INITIALIZED			BIT(2)
+#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD	BIT(3)
 
 struct fwnode_handle {
 	struct fwnode_handle *secondary;
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 3/3] net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents
  2021-09-15 17:09 [PATCH v3 0/3] fw_devlink bug fixes Saravana Kannan
  2021-09-15 17:09 ` [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies Saravana Kannan
  2021-09-15 17:09 ` [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD Saravana Kannan
@ 2021-09-15 17:09 ` Saravana Kannan
  2021-09-23 17:30 ` [PATCH v3 0/3] fw_devlink bug fixes Greg Kroah-Hartman
  2021-09-23 22:28 ` Florian Fainelli
  4 siblings, 0 replies; 26+ messages in thread
From: Saravana Kannan @ 2021-09-15 17:09 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Saravana Kannan
  Cc: Geert Uytterhoeven, Vladimir Oltean, kernel-team, linux-kernel,
	netdev, linux-acpi

There are many instances of PHYs that depend on a switch to supply a
resource (Eg: interrupts). Switches also expects the PHYs to be probed
by their specific drivers as soon as they are added. If that doesn't
happen, then the switch would force the use of generic PHY drivers for
the PHY even if the PHY might have specific driver available.

fw_devlink=on by design can cause delayed probes of PHY. To avoid, this
we need to set the FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for the switch's
fwnode before the PHYs are added. The most generic way to do this is to
set this flag for the parent of MDIO busses which is typically the
switch.

For more context:
https://lore.kernel.org/lkml/YTll0i6Rz3WAAYzs@lunn.ch/#t

Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
Suggested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/net/phy/mdio_bus.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index 53f034fc2ef7..ee8313a4ac71 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -525,6 +525,10 @@ int __mdiobus_register(struct mii_bus *bus, struct module *owner)
 	    NULL == bus->read || NULL == bus->write)
 		return -EINVAL;
 
+	if (bus->parent && bus->parent->of_node)
+		bus->parent->of_node->fwnode.flags |=
+					FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD;
+
 	BUG_ON(bus->state != MDIOBUS_ALLOCATED &&
 	       bus->state != MDIOBUS_UNREGISTERED);
 
-- 
2.33.0.309.g3052b89438-goog


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies
  2021-09-15 17:09 ` [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies Saravana Kannan
@ 2021-09-18 14:38   ` Rafael J. Wysocki
  0 siblings, 0 replies; 26+ messages in thread
From: Rafael J. Wysocki @ 2021-09-18 14:38 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Geert Uytterhoeven, Vladimir Oltean,
	Cc: Android Kernel, Linux Kernel Mailing List, netdev,
	ACPI Devel Maling List, Marek Szyprowski

On Wed, Sep 15, 2021 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> When we have a dependency of the form:
>
> Device-A -> Device-C
>         Device-B
>
> Device-C -> Device-B
>
> Where,
> * Indentation denotes "child of" parent in previous line.
> * X -> Y denotes X is consumer of Y based on firmware (Eg: DT).
>
> We have cyclic dependency: device-A -> device-C -> device-B -> device-A
>
> fw_devlink current treats device-C -> device-B dependency as an invalid
> dependency and doesn't enforce it but leaves the rest of the
> dependencies as is.
>
> While the current behavior is necessary, it is not sufficient if the
> false dependency in this example is actually device-A -> device-C. When
> this is the case, device-C will correctly probe defer waiting for
> device-B to be added, but device-A will be incorrectly probe deferred by
> fw_devlink waiting on device-C to probe successfully. Due to this, none
> of the devices in the cycle will end up probing.
>
> To fix this, we need to go relax all the dependencies in the cycle like
> we already do in the other instances where fw_devlink detects cycles.
> A real world example of this was reported[1] and analyzed[2].
>
> [1] - https://lore.kernel.org/lkml/0a2c4106-7f48-2bb5-048e-8c001a7c3fda@samsung.com/
> [2] - https://lore.kernel.org/lkml/CAGETcx8peaew90SWiux=TyvuGgvTQOmO4BFALz7aj0Za5QdNFQ@mail.gmail.com/
> Fixes: f9aa460672c9 ("driver core: Refactor fw_devlink feature")
> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---
>  drivers/base/core.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index e65dd803a453..316df6027093 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1772,14 +1772,21 @@ static int fw_devlink_create_devlink(struct device *con,
>          * be broken by applying logic. Check for these types of cycles and
>          * break them so that devices in the cycle probe properly.
>          *
> -        * If the supplier's parent is dependent on the consumer, then
> -        * the consumer-supplier dependency is a false dependency. So,
> -        * treat it as an invalid link.
> +        * If the supplier's parent is dependent on the consumer, then the
> +        * consumer and supplier have a cyclic dependency. Since fw_devlink
> +        * can't tell which of the inferred dependencies are incorrect, don't
> +        * enforce probe ordering between any of the devices in this cyclic
> +        * dependency. Do this by relaxing all the fw_devlink device links in
> +        * this cycle and by treating the fwnode link between the consumer and
> +        * the supplier as an invalid dependency.
>          */
>         sup_dev = fwnode_get_next_parent_dev(sup_handle);
>         if (sup_dev && device_is_dependent(con, sup_dev)) {
> -               dev_dbg(con, "Not linking to %pfwP - False link\n",
> -                       sup_handle);
> +               dev_info(con, "Fixing up cyclic dependency with %pfwP (%s)\n",
> +                        sup_handle, dev_name(sup_dev));

Why not dev_dbg()?

Other than this, the change makes sense to me.

> +               device_links_write_lock();
> +               fw_devlink_relax_cycle(con, sup_dev);
> +               device_links_write_unlock();
>                 ret = -EINVAL;
>         } else {
>                 /*
> --
> 2.33.0.309.g3052b89438-goog
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-15 17:09 ` [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD Saravana Kannan
@ 2021-09-18 15:03   ` Rafael J. Wysocki
  2021-09-19  1:16     ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Rafael J. Wysocki @ 2021-09-18 15:03 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Geert Uytterhoeven, Vladimir Oltean,
	Cc: Android Kernel, Linux Kernel Mailing List, netdev,
	ACPI Devel Maling List

On Wed, Sep 15, 2021 at 7:09 PM Saravana Kannan <saravanak@google.com> wrote:
>
> If a parent device is also a supplier to a child device, fw_devlink=on by
> design delays the probe() of the child device until the probe() of the
> parent finishes successfully.
>
> However, some drivers of such parent devices (where parent is also a
> supplier) expect the child device to finish probing successfully as soon as
> they are added using device_add() and before the probe() of the parent
> device has completed successfully. One example of such a case is discussed
> in the link mentioned below.
>
> Add a flag to make fw_devlink=on not enforce these supplier-consumer
> relationships, so these drivers can continue working.
>
> Link: https://lore.kernel.org/netdev/CAGETcx_uj0V4DChME-gy5HGKTYnxLBX=TH2rag29f_p=UcG+Tg@mail.gmail.com/
> Fixes: ea718c699055 ("Revert "Revert "driver core: Set fw_devlink=on by default""")
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c    | 19 +++++++++++++++++++
>  include/linux/fwnode.h | 11 ++++++++---
>  2 files changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 316df6027093..21d4cb5d3767 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1722,6 +1722,25 @@ static int fw_devlink_create_devlink(struct device *con,
>         struct device *sup_dev;
>         int ret = 0;
>
> +       /*
> +        * In some cases, a device P might also be a supplier to its child node
> +        * C. However, this would defer the probe of C until the probe of P
> +        * completes successfully. This is perfectly fine in the device driver
> +        * model. device_add() doesn't guarantee probe completion of the device
> +        * by the time it returns.

That's right.

However, I don't quite see a point in device links where the supplier
is a direct ancestor of the consumer.  From the PM perspective they
are simply redundant and I'm not sure about any other use cases where
they aren't.

So what's the reason to add them in the first place?

> +        *
> +        * However, there are a few drivers that assume C will finish probing
> +        * as soon as it's added and before P finishes probing. So, we provide
> +        * a flag to let fw_devlink know not to delay the probe of C until the
> +        * probe of P completes successfully.

Well, who's expected to set that flag and when?  This needs to be
mentioned here IMO.

> +        *
> +        * When such a flag is set, we can't create device links where P is the
> +        * supplier of C as that would delay the probe of C.
> +        */
> +       if (sup_handle->flags & FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD &&
> +           fwnode_is_ancestor_of(sup_handle, con->fwnode))
> +               return -EINVAL;
> +
>         sup_dev = get_dev_from_fwnode(sup_handle);
>         if (sup_dev) {
>                 /*
> diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> index 59828516ebaf..9f4ad719bfe3 100644
> --- a/include/linux/fwnode.h
> +++ b/include/linux/fwnode.h
> @@ -22,10 +22,15 @@ struct device;
>   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
>   * NOT_DEVICE: The fwnode will never be populated as a struct device.
>   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> + *                          driver needs its child devices to be bound with
> + *                          their respective drivers as soon as they are
> + *                          added.

The fact that this requires so much comment text here is a clear
band-aid indication to me.

>   */
> -#define FWNODE_FLAG_LINKS_ADDED                BIT(0)
> -#define FWNODE_FLAG_NOT_DEVICE         BIT(1)
> -#define FWNODE_FLAG_INITIALIZED                BIT(2)
> +#define FWNODE_FLAG_LINKS_ADDED                        BIT(0)
> +#define FWNODE_FLAG_NOT_DEVICE                 BIT(1)
> +#define FWNODE_FLAG_INITIALIZED                        BIT(2)
> +#define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD   BIT(3)
>
>  struct fwnode_handle {
>         struct fwnode_handle *secondary;
> --
> 2.33.0.309.g3052b89438-goog
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-18 15:03   ` Rafael J. Wysocki
@ 2021-09-19  1:16     ` Andrew Lunn
  2021-09-19  6:41       ` Greg Kroah-Hartman
  2021-09-21 16:15       ` Greg Kroah-Hartman
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew Lunn @ 2021-09-19  1:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Saravana Kannan, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

> > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > index 59828516ebaf..9f4ad719bfe3 100644
> > --- a/include/linux/fwnode.h
> > +++ b/include/linux/fwnode.h
> > @@ -22,10 +22,15 @@ struct device;
> >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > + *                          driver needs its child devices to be bound with
> > + *                          their respective drivers as soon as they are
> > + *                          added.
> 
> The fact that this requires so much comment text here is a clear
> band-aid indication to me.

This whole patchset is a band aid, but it is for stable, to fix things
which are currently broken. So we need to answer the question, is a
bad aid good enough for stable, with the assumption a real fix will
come along later?

     Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-19  1:16     ` Andrew Lunn
@ 2021-09-19  6:41       ` Greg Kroah-Hartman
  2021-09-21 16:15       ` Greg Kroah-Hartman
  1 sibling, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-19  6:41 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Saravana Kannan, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Sun, Sep 19, 2021 at 03:16:34AM +0200, Andrew Lunn wrote:
> > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > > index 59828516ebaf..9f4ad719bfe3 100644
> > > --- a/include/linux/fwnode.h
> > > +++ b/include/linux/fwnode.h
> > > @@ -22,10 +22,15 @@ struct device;
> > >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> > >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> > >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > > + *                          driver needs its child devices to be bound with
> > > + *                          their respective drivers as soon as they are
> > > + *                          added.
> > 
> > The fact that this requires so much comment text here is a clear
> > band-aid indication to me.
> 
> This whole patchset is a band aid, but it is for stable, to fix things
> which are currently broken. So we need to answer the question, is a
> bad aid good enough for stable, with the assumption a real fix will
> come along later?

Fix it properly first and worry about stable later.

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-19  1:16     ` Andrew Lunn
  2021-09-19  6:41       ` Greg Kroah-Hartman
@ 2021-09-21 16:15       ` Greg Kroah-Hartman
  2021-09-21 16:35         ` Rafael J. Wysocki
  1 sibling, 1 reply; 26+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-21 16:15 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Saravana Kannan, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Sun, Sep 19, 2021 at 03:16:34AM +0200, Andrew Lunn wrote:
> > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > > index 59828516ebaf..9f4ad719bfe3 100644
> > > --- a/include/linux/fwnode.h
> > > +++ b/include/linux/fwnode.h
> > > @@ -22,10 +22,15 @@ struct device;
> > >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> > >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> > >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > > + *                          driver needs its child devices to be bound with
> > > + *                          their respective drivers as soon as they are
> > > + *                          added.
> > 
> > The fact that this requires so much comment text here is a clear
> > band-aid indication to me.
> 
> This whole patchset is a band aid, but it is for stable, to fix things
> which are currently broken. So we need to answer the question, is a
> bad aid good enough for stable, with the assumption a real fix will
> come along later?

Fix it properly first, don't worry about stable.

But what is wrong with this as-is?  What needs to be done that is not
happening here that you feels still needs to be addressed?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 16:15       ` Greg Kroah-Hartman
@ 2021-09-21 16:35         ` Rafael J. Wysocki
  2021-09-21 17:56           ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Rafael J. Wysocki @ 2021-09-21 16:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Andrew Lunn, Rafael J. Wysocki, Saravana Kannan, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Tue, Sep 21, 2021 at 6:15 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Sun, Sep 19, 2021 at 03:16:34AM +0200, Andrew Lunn wrote:
> > > > diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
> > > > index 59828516ebaf..9f4ad719bfe3 100644
> > > > --- a/include/linux/fwnode.h
> > > > +++ b/include/linux/fwnode.h
> > > > @@ -22,10 +22,15 @@ struct device;
> > > >   * LINKS_ADDED:        The fwnode has already be parsed to add fwnode links.
> > > >   * NOT_DEVICE: The fwnode will never be populated as a struct device.
> > > >   * INITIALIZED: The hardware corresponding to fwnode has been initialized.
> > > > + * NEEDS_CHILD_BOUND_ON_ADD: For this fwnode/device to probe successfully, its
> > > > + *                          driver needs its child devices to be bound with
> > > > + *                          their respective drivers as soon as they are
> > > > + *                          added.
> > >
> > > The fact that this requires so much comment text here is a clear
> > > band-aid indication to me.
> >
> > This whole patchset is a band aid, but it is for stable, to fix things
> > which are currently broken. So we need to answer the question, is a
> > bad aid good enough for stable, with the assumption a real fix will
> > come along later?
>
> Fix it properly first, don't worry about stable.
>
> But what is wrong with this as-is?  What needs to be done that is not
> happening here that you feels still needs to be addressed?

The existing code attempts to "enforce" device links where the
supplier is a direct ancestor of the consumer (e.g. its parent), which
is questionable by itself (why do that?) and that's the source of the
underlying issue (self-inflicted circular dependencies that cause
devices to wait for a deferred probe forever) which this patchest
attempts to avoid by adding an extra flag to the driver core and
expecting specific drivers to mark their devices as "special".  And
that's "until we have a real fix".

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 16:35         ` Rafael J. Wysocki
@ 2021-09-21 17:56           ` Andrew Lunn
  2021-09-21 18:56             ` Rafael J. Wysocki
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2021-09-21 17:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Saravana Kannan, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

> The existing code attempts to "enforce" device links where the
> supplier is a direct ancestor of the consumer (e.g. its parent), which
> is questionable by itself (why do that?)

In this case, we have an Ethernet switch as the parent device. It
registers an interrupt controller, to the interrupt subsystem. It also
registers an MDIO controller to the MDIO subsystem. The MDIO subsystem
finds the Ethernet PHYs on the MDIO bus, and registers the PHYs to the
PHY subsystem.

Device tree is then used to glue all the parts together. The PHY has
an interrupt output which is connected to the interrupt controller,
and a standard DT property is used to connect the two. The MACs in the
switch are connected to the PHYs, and standard DT properties are used
to connect them together. So we have a loop. But the driver model does
not have a problem with this, at least not until fw_devlink came
along. As soon as a resource is registered with a subsystem, it can be
used. Where as fw_devlink seems to assume a resource cannot be used
until the driver providing it completes probe.

Now, we could ignore all these subsystems, re-invent the wheels inside
the switch driver, and then not have suppliers and consumers at all,
it is all internal. But that seems like a bad idea, more wheels, more
bugs.

So for me, the real fix is that fw_devlink learns that resources are
available as soon as they are registered, not when the provider device
completes probe.

	  Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 17:56           ` Andrew Lunn
@ 2021-09-21 18:56             ` Rafael J. Wysocki
  2021-09-21 19:50               ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Rafael J. Wysocki @ 2021-09-21 18:56 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Saravana Kannan,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Geert Uytterhoeven, Vladimir Oltean,
	Cc: Android Kernel, Linux Kernel Mailing List, netdev,
	ACPI Devel Maling List

On Tue, Sep 21, 2021 at 7:56 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > The existing code attempts to "enforce" device links where the
> > supplier is a direct ancestor of the consumer (e.g. its parent), which
> > is questionable by itself (why do that?)
>
> In this case, we have an Ethernet switch as the parent device. It
> registers an interrupt controller, to the interrupt subsystem. It also
> registers an MDIO controller to the MDIO subsystem. The MDIO subsystem
> finds the Ethernet PHYs on the MDIO bus, and registers the PHYs to the
> PHY subsystem.
>
> Device tree is then used to glue all the parts together. The PHY has
> an interrupt output which is connected to the interrupt controller,
> and a standard DT property is used to connect the two. The MACs in the
> switch are connected to the PHYs, and standard DT properties are used
> to connect them together. So we have a loop. But the driver model does
> not have a problem with this, at least not until fw_devlink came
> along. As soon as a resource is registered with a subsystem, it can be
> used. Where as fw_devlink seems to assume a resource cannot be used
> until the driver providing it completes probe.

It works at a device level, so it doesn't know about resources.  The
only information it has is of the "this device may depend on that
other device" type and it uses that information to figure out a usable
probe ordering for drivers.

> Now, we could ignore all these subsystems, re-invent the wheels inside
> the switch driver, and then not have suppliers and consumers at all,
> it is all internal. But that seems like a bad idea, more wheels, more
> bugs.
>
> So for me, the real fix is that fw_devlink learns that resources are
> available as soon as they are registered, not when the provider device
> completes probe.

Because it doesn't know about individual resources, it cannot really do that.

Also if the probe has already started, it may still return
-EPROBE_DEFER at any time in theory, so as a rule the dependency is
actually known to be satisfied when the probe has successfully
completed.

However, making children wait for their parents to complete probing is
generally artificial, especially in the cases when the children are
registered by the parent's driver.  So waiting should be an exception
in these cases, not a rule.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 18:56             ` Rafael J. Wysocki
@ 2021-09-21 19:50               ` Andrew Lunn
  2021-09-21 20:07                 ` Saravana Kannan
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2021-09-21 19:50 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg Kroah-Hartman, Saravana Kannan, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

> It works at a device level, so it doesn't know about resources.  The
> only information it has is of the "this device may depend on that
> other device" type and it uses that information to figure out a usable
> probe ordering for drivers.

And that simplification is the problem. A phandle does not point to a
device, it points to a resource of a device. It should really be doing
what the driver would do, follow the phandle to the resource and see
if it exists yet. If it does not exist then yes it can defer the
probe. If the resource does exist, allow the driver to probe.

> Also if the probe has already started, it may still return
> -EPROBE_DEFER at any time in theory

Sure it can, and does. And any driver which is not broken will
unregister its resources on the error path. And that causes users of
the resources to release them. It all nicely unravels, and then tries
again later. This all works, it is what these drivers do.

> However, making children wait for their parents to complete probing is
> generally artificial, especially in the cases when the children are
> registered by the parent's driver.  So waiting should be an exception
> in these cases, not a rule.

So are you suggesting that fw_devlink core needs to change, recognise
the dependency on a parent, and allow the probe? Works for me. Gets us
back to before fw_devlink.

    Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 19:50               ` Andrew Lunn
@ 2021-09-21 20:07                 ` Saravana Kannan
  2021-09-21 21:02                   ` Andrew Lunn
  2021-09-23 12:48                   ` Rafael J. Wysocki
  0 siblings, 2 replies; 26+ messages in thread
From: Saravana Kannan @ 2021-09-21 20:07 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

Sorry I've been busy with LPC and some other stuff and could respond earlier.

On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > It works at a device level, so it doesn't know about resources.  The
> > only information it has is of the "this device may depend on that
> > other device" type and it uses that information to figure out a usable
> > probe ordering for drivers.
>
> And that simplification is the problem. A phandle does not point to a
> device, it points to a resource of a device. It should really be doing
> what the driver would do, follow the phandle to the resource and see
> if it exists yet. If it does not exist then yes it can defer the
> probe. If the resource does exist, allow the driver to probe.
>
> > Also if the probe has already started, it may still return
> > -EPROBE_DEFER at any time in theory
>
> Sure it can, and does. And any driver which is not broken will
> unregister its resources on the error path. And that causes users of
> the resources to release them. It all nicely unravels, and then tries
> again later. This all works, it is what these drivers do.

One of the points of fw_devlink=on is to avoid the pointless deferred
probes that'd happen in this situation. So saying "let this happen"
when fw_devlink=on kinda beats the point of it. See further below.

>
> > However, making children wait for their parents to complete probing is
> > generally artificial, especially in the cases when the children are
> > registered by the parent's driver.  So waiting should be an exception
> > in these cases, not a rule.

Rafael,

There are cases where the children try to probe too quickly (before
the parent has had time to set up all the resources it's setting up)
and the child defers the probe. Even Andrew had an example of that
with some ethernet driver where the deferred probe is attempted
multiple times wasting time and then it eventually succeeds.

Considering there's no guarantee that a device_add() will result in
the device being bound immediately, why shouldn't we make the child
device wait until the parent has completely probed and we know all the
resources from the parent are guaranteed to be available? Why can't we
treat drivers that assume a device will get bound as soon as it's
added as the exception (because we don't guarantee that anyway)?

Also, this assumption that the child will be bound successfully upon
addition forces the parent/child drivers to play initcall chicken --
the child's driver has to be registered before the parent's driver. We
should be getting away from those by fixing the parent driver that's
making these assumptions (I'll be glad to help with that). We need to
be moving towards reducing pointless deferred probes and initcall
ordering requirements instead of saying "this bad assumption used to
work, so allow me to continue doing that".

-Saravana

> So are you suggesting that fw_devlink core needs to change, recognise
> the dependency on a parent, and allow the probe? Works for me. Gets us
> back to before fw_devlink.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 20:07                 ` Saravana Kannan
@ 2021-09-21 21:02                   ` Andrew Lunn
  2021-09-21 21:54                     ` Saravana Kannan
  2021-09-23 12:48                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2021-09-21 21:02 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

> There are cases where the children try to probe too quickly (before
> the parent has had time to set up all the resources it's setting up)
> and the child defers the probe. Even Andrew had an example of that
> with some ethernet driver where the deferred probe is attempted
> multiple times wasting time and then it eventually succeeds.

And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
which is the current state. I'm happy to see optimisations, but not at
the expense of breaking working stuff.

> Also, this assumption that the child will be bound successfully upon
> addition forces the parent/child drivers to play initcall chicken

We have never had any initcall chicken problems. The switch drivers
all are standard mdio_module_driver, module_platform_driver,
module_i2c_driver, module_pci_driver. Nothing special here. Things
load in whatever order they load, and it all works out, maybe with an
EPROBE_DEFER cycle. Which is good, we get our error paths tested, and
sometimes find bugs that way.

     Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 21:02                   ` Andrew Lunn
@ 2021-09-21 21:54                     ` Saravana Kannan
  2021-09-21 22:04                       ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Saravana Kannan @ 2021-09-21 21:54 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Tue, Sep 21, 2021 at 2:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > There are cases where the children try to probe too quickly (before
> > the parent has had time to set up all the resources it's setting up)
> > and the child defers the probe. Even Andrew had an example of that
> > with some ethernet driver where the deferred probe is attempted
> > multiple times wasting time and then it eventually succeeds.
>
> And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
> which is the current state. I'm happy to see optimisations, but not at
> the expense of breaking working stuff.

Right, but in that case, the long term solution should be to make
changes so we don't expect the child to be bound as soon as it's
added. Not disable the optimization. Agree?

>
> > Also, this assumption that the child will be bound successfully upon
> > addition forces the parent/child drivers to play initcall chicken
>
> We have never had any initcall chicken problems. The switch drivers
> all are standard mdio_module_driver, module_platform_driver,
> module_i2c_driver, module_pci_driver. Nothing special here. Things
> load in whatever order they load, and it all works out, maybe with an
> EPROBE_DEFER cycle. Which is good, we get our error paths tested, and
> sometimes find bugs that way.

My comment was a general comment about parent drives that expect the
child drivers to be bound on addition -- not specific to DSA.

But even in the DSA case, not playing initcall chicken means if you
change the order of driver registration, things should still work.
However, as it stands, if you register the switch driver before the
PHY drivers are registered, you are going to force bind the PHYs to
the generic driver.

-Saravana

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 21:54                     ` Saravana Kannan
@ 2021-09-21 22:04                       ` Andrew Lunn
  2021-09-21 22:26                         ` Saravana Kannan
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2021-09-21 22:04 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Tue, Sep 21, 2021 at 02:54:47PM -0700, Saravana Kannan wrote:
> On Tue, Sep 21, 2021 at 2:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > There are cases where the children try to probe too quickly (before
> > > the parent has had time to set up all the resources it's setting up)
> > > and the child defers the probe. Even Andrew had an example of that
> > > with some ethernet driver where the deferred probe is attempted
> > > multiple times wasting time and then it eventually succeeds.
> >
> > And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
> > which is the current state. I'm happy to see optimisations, but not at
> > the expense of breaking working stuff.
> 
> Right, but in that case, the long term solution should be to make
> changes so we don't expect the child to be bound as soon as it's
> added. Not disable the optimization. Agree?

Maybe. Lets see how you fix what is currently broken. At the moment, i
don't care too much about the long term solution. The current quick
fix for stable does not seem to be making any progress. So we need the
real fix now, to unbreak what is currently broken, then we can think
about the long term.

    Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 22:04                       ` Andrew Lunn
@ 2021-09-21 22:26                         ` Saravana Kannan
  2021-09-22  0:45                           ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Saravana Kannan @ 2021-09-21 22:26 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Tue, Sep 21, 2021 at 3:04 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> On Tue, Sep 21, 2021 at 02:54:47PM -0700, Saravana Kannan wrote:
> > On Tue, Sep 21, 2021 at 2:03 PM Andrew Lunn <andrew@lunn.ch> wrote:
> > >
> > > > There are cases where the children try to probe too quickly (before
> > > > the parent has had time to set up all the resources it's setting up)
> > > > and the child defers the probe. Even Andrew had an example of that
> > > > with some ethernet driver where the deferred probe is attempted
> > > > multiple times wasting time and then it eventually succeeds.
> > >
> > > And i prefer an occasional EPROBE_DEFER over a broken Ethernet switch,
> > > which is the current state. I'm happy to see optimisations, but not at
> > > the expense of breaking working stuff.
> >
> > Right, but in that case, the long term solution should be to make
> > changes so we don't expect the child to be bound as soon as it's
> > added. Not disable the optimization. Agree?
>
> Maybe. Lets see how you fix what is currently broken. At the moment, i
> don't care too much about the long term solution. The current quick
> fix for stable does not seem to be making any progress. So we need the
> real fix now, to unbreak what is currently broken, then we can think
> about the long term.

Wait, what's the difference between a real fix vs a long term fix? To
me those are the same.

I agree that having DSA be broken till we have the final fix isn't
good. Especially because DSA is complicated and the over eager gen PHY
driver makes it even harder to fix it quickly.

Merging this patch gives an exception to DSA, while we figure out a
good fix. It also makes sure more drivers don't get merged with the
same assumptions (because fw_devlink=on won't give them the
exception).

Greg/Rafael, can we merge this please while we figure out a fix for
DSA/PHY. It's a non-trivial problem to solve because PHYs need some
kind of driver "match rating".

-Saravana

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 22:26                         ` Saravana Kannan
@ 2021-09-22  0:45                           ` Andrew Lunn
  2021-09-22  1:00                             ` Saravana Kannan
  0 siblings, 1 reply; 26+ messages in thread
From: Andrew Lunn @ 2021-09-22  0:45 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

> Wait, what's the difference between a real fix vs a long term fix? To
> me those are the same.

Maybe the long term fix is you follow the phandle to the actual
resources, see it is present, and allow the probe? That brings you in
line with how things actually work with devices probing against
resources.

I don't know how much work that is, since there is no uniform API to
follow a phandle to a resource. I think each phandle type has its own
helper. For an interrupt phandle you need to use of_irq_get(), for a
gpio phandle maybe of_get_named_gpio_flags(), for a reset phandle
__of_reset_control_get(), etc.

Because this does not sounds too simple, maybe you can find something
simpler which is a real fix for now, good enough that it will get
merged, and then you can implement this phandle following for the long
term fix?

	 Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-22  0:45                           ` Andrew Lunn
@ 2021-09-22  1:00                             ` Saravana Kannan
  2021-09-22 12:52                               ` Andrew Lunn
  0 siblings, 1 reply; 26+ messages in thread
From: Saravana Kannan @ 2021-09-22  1:00 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

On Tue, Sep 21, 2021 at 5:45 PM Andrew Lunn <andrew@lunn.ch> wrote:
>
> > Wait, what's the difference between a real fix vs a long term fix? To
> > me those are the same.
>
> Maybe the long term fix is you follow the phandle to the actual
> resources, see it is present, and allow the probe? That brings you in
> line with how things actually work with devices probing against
> resources.
>
> I don't know how much work that is, since there is no uniform API to
> follow a phandle to a resource. I think each phandle type has its own
> helper. For an interrupt phandle you need to use of_irq_get(), for a
> gpio phandle maybe of_get_named_gpio_flags(), for a reset phandle
> __of_reset_control_get(), etc.

That goes back to Rafael's reply (and I agree):

"Also if the probe has already started, it may still return
-EPROBE_DEFER at any time in theory, so as a rule the dependency is
actually known to be satisfied when the probe has successfully
completed."

So waiting for the probe to finish is the right behavior/intentional
for fw_devlink.

> Because this does not sounds too simple, maybe you can find something
> simpler which is a real fix for now, good enough that it will get
> merged, and then you can implement this phandle following for the long
> term fix?

The simpler fix is really just this patch. I'm hoping Greg/Rafael see
my point about doing the exception this way prevents things from
getting worse will we address existing cases that need the flag.

The long/proper fix is to the DSA framework. I have some ideas that I
think will work but I've had time to get to (but on the top of my
upstream work list). We can judge that after I send out the patches :)

-Saravana

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-22  1:00                             ` Saravana Kannan
@ 2021-09-22 12:52                               ` Andrew Lunn
  0 siblings, 0 replies; 26+ messages in thread
From: Andrew Lunn @ 2021-09-22 12:52 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Greg Kroah-Hartman, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, Vladimir Oltean, Cc: Android Kernel,
	Linux Kernel Mailing List, netdev, ACPI Devel Maling List

> That goes back to Rafael's reply (and I agree):
> 
> "Also if the probe has already started, it may still return
> -EPROBE_DEFER at any time in theory, so as a rule the dependency is
> actually known to be satisfied when the probe has successfully
> completed."
> 
> So waiting for the probe to finish is the right behavior/intentional
> for fw_devlink.

But differs to how things actually work in the driver model. The
driver model does not care if a driver has finished probing, you can
use a resource as soon as it is registered. Hence this whole
problem/discussion.

	Andrew

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
  2021-09-21 20:07                 ` Saravana Kannan
  2021-09-21 21:02                   ` Andrew Lunn
@ 2021-09-23 12:48                   ` Rafael J. Wysocki
  1 sibling, 0 replies; 26+ messages in thread
From: Rafael J. Wysocki @ 2021-09-23 12:48 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Andrew Lunn, Rafael J. Wysocki, Greg Kroah-Hartman,
	Heiner Kallweit, Russell King, David S. Miller, Jakub Kicinski,
	Len Brown, Geert Uytterhoeven, Vladimir Oltean,
	Cc: Android Kernel, Linux Kernel Mailing List, netdev,
	ACPI Devel Maling List

On Tue, Sep 21, 2021 at 10:07 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Sorry I've been busy with LPC and some other stuff and could respond earlier.
>
> On Tue, Sep 21, 2021 at 12:50 PM Andrew Lunn <andrew@lunn.ch> wrote:
> >
> > > It works at a device level, so it doesn't know about resources.  The
> > > only information it has is of the "this device may depend on that
> > > other device" type and it uses that information to figure out a usable
> > > probe ordering for drivers.
> >
> > And that simplification is the problem. A phandle does not point to a
> > device, it points to a resource of a device. It should really be doing
> > what the driver would do, follow the phandle to the resource and see
> > if it exists yet. If it does not exist then yes it can defer the
> > probe. If the resource does exist, allow the driver to probe.
> >
> > > Also if the probe has already started, it may still return
> > > -EPROBE_DEFER at any time in theory
> >
> > Sure it can, and does. And any driver which is not broken will
> > unregister its resources on the error path. And that causes users of
> > the resources to release them. It all nicely unravels, and then tries
> > again later. This all works, it is what these drivers do.
>
> One of the points of fw_devlink=on is to avoid the pointless deferred
> probes that'd happen in this situation. So saying "let this happen"
> when fw_devlink=on kinda beats the point of it. See further below.

Well, you need to define "pointless deferred probes" in the first
place.  fw_devlink adds deferred probes by itself, so why are those
not pointless whereas the others are?

> >
> > > However, making children wait for their parents to complete probing is
> > > generally artificial, especially in the cases when the children are
> > > registered by the parent's driver.  So waiting should be an exception
> > > in these cases, not a rule.
>
> Rafael,
>
> There are cases where the children try to probe too quickly (before
> the parent has had time to set up all the resources it's setting up)
> and the child defers the probe. Even Andrew had an example of that
> with some ethernet driver where the deferred probe is attempted
> multiple times wasting time and then it eventually succeeds.

You seem to be arguing that it may be possible to replace multiple
probe attempts that each are deferred with one probe deferral which
then is beneficial from the performance perspective.

Yes, there are cases like that, but when this is used as a general
rule, it introduces a problem if it does a deferred probe when there
is no need for a probe deferral at all (like in the specific problem
case at hand).  Also if the probing of the child is deferred just
once, adding an extra dependency on the parent to it doesn't really
help.

> Considering there's no guarantee that a device_add() will result in
> the device being bound immediately, why shouldn't we make the child
> device wait until the parent has completely probed and we know all the
> resources from the parent are guaranteed to be available? Why can't we
> treat drivers that assume a device will get bound as soon as it's
> added as the exception (because we don't guarantee that anyway)?

Because this adds artificial constraints that otherwise aren't there
in some cases to the picture and asking drivers to mark themselves as
"please don't add these artificial constraints for me" is not
particularly straightforward.  Moreover, it does that retroactively
causing things that are entirely correct and previously worked just
fine to now have to paint themselves red to continue working as
before.

The fact that immediate probe completion cannot be guaranteed in
general doesn't mean that it cannot be assumed in certain situations.
For example, a parent driver registering a child may know what the
child driver is and so it may know that the child will either probe
successfully right away or the probing of it will fail and your extra
constraint breaks that assumption.  You can't really know how many of
such cases there are and trying to cover them with a new flag is a
retroactive whack-a-mole game.

> Also, this assumption that the child will be bound successfully upon
> addition forces the parent/child drivers to play initcall chicken --
> the child's driver has to be registered before the parent's driver.

That's true, but why is this a general problem?  For example, they
both may be registered by the same function in the right order.
What's wrong with that?

> We should be getting away from those by fixing the parent driver that's
> making these assumptions (I'll be glad to help with that). We need to
> be moving towards reducing pointless deferred probes and initcall
> ordering requirements instead of saying "this bad assumption used to
> work, so allow me to continue doing that".

It is not always a bad assumption.  It may be code designed this way.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/3] fw_devlink bug fixes
  2021-09-15 17:09 [PATCH v3 0/3] fw_devlink bug fixes Saravana Kannan
                   ` (2 preceding siblings ...)
  2021-09-15 17:09 ` [PATCH v3 3/3] net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents Saravana Kannan
@ 2021-09-23 17:30 ` Greg Kroah-Hartman
  2021-09-23 19:44   ` Vladimir Oltean
  2021-09-23 22:28 ` Florian Fainelli
  4 siblings, 1 reply; 26+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-23 17:30 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Jakub Kicinski, Len Brown, Geert Uytterhoeven,
	Vladimir Oltean, kernel-team, linux-kernel, netdev, linux-acpi

On Wed, Sep 15, 2021 at 10:09:36AM -0700, Saravana Kannan wrote:
> Intended for 5.15.
> 
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Vladimir Oltean <olteanv@gmail.com>
> 
> v1->v2:
> - Added a few Reviewed-by and Tested-by tags
> - Addressed Geert's comments in patches 3 and 5
> - Dropped the fw_devlink.debug patch
> - Added 2 more patches to the series to address other fw_devlink issues
> 
> v2->v3:
> - Split the logging/debug changes into a separate series

I have taken this now into my tree.

It fixes the real problem where drivers were making the wrong assumption
that if they registered a device, it would be instantly bound to a
driver.  Drivers that did this were getting lucky, as this was never a
guarantee of the driver core (think about if you enabled async
probing, and the mess with the bus specific locks that should be
preventing much of this)

With this new flag, we can mark these drivers/busses that have this
assumption and work to solve correctly over time.  The issue with using
a "generic vs. specific" driver is a bit related, I'm amazed that a
subsystem actually implemented it this way, others of us have been
avoiding this for a very long time due to the complexity involved when
things are built as modules.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/3] fw_devlink bug fixes
  2021-09-23 17:30 ` [PATCH v3 0/3] fw_devlink bug fixes Greg Kroah-Hartman
@ 2021-09-23 19:44   ` Vladimir Oltean
  2021-09-24  6:09     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 26+ messages in thread
From: Vladimir Oltean @ 2021-09-23 19:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Saravana Kannan, Rafael J. Wysocki, Andrew Lunn, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, kernel-team, linux-kernel, netdev,
	linux-acpi

On Thu, Sep 23, 2021 at 07:30:04PM +0200, Greg Kroah-Hartman wrote:
> It fixes the real problem where drivers were making the wrong assumption
> that if they registered a device, it would be instantly bound to a
> driver.  Drivers that did this were getting lucky, as this was never a
> guarantee of the driver core (think about if you enabled async
> probing, and the mess with the bus specific locks that should be
> preventing much of this)

Since commit d173a137c5bd ("driver-core: enable drivers to opt-out of
async probe") it is possible to opt out of async probing, and PHY
drivers do opt out of it, at the time of writing.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/3] fw_devlink bug fixes
  2021-09-15 17:09 [PATCH v3 0/3] fw_devlink bug fixes Saravana Kannan
                   ` (3 preceding siblings ...)
  2021-09-23 17:30 ` [PATCH v3 0/3] fw_devlink bug fixes Greg Kroah-Hartman
@ 2021-09-23 22:28 ` Florian Fainelli
  4 siblings, 0 replies; 26+ messages in thread
From: Florian Fainelli @ 2021-09-23 22:28 UTC (permalink / raw)
  To: Saravana Kannan, Greg Kroah-Hartman, Rafael J. Wysocki,
	Andrew Lunn, Heiner Kallweit, Russell King, David S. Miller,
	Jakub Kicinski, Len Brown
  Cc: Geert Uytterhoeven, Vladimir Oltean, kernel-team, linux-kernel,
	netdev, linux-acpi

On 9/15/21 10:09 AM, Saravana Kannan wrote:
> Intended for 5.15.
> 
> Cc: Geert Uytterhoeven <geert@linux-m68k.org>
> Cc: Andrew Lunn <andrew@lunn.ch>
> Cc: Vladimir Oltean <olteanv@gmail.com>
> 
> v1->v2:
> - Added a few Reviewed-by and Tested-by tags
> - Addressed Geert's comments in patches 3 and 5
> - Dropped the fw_devlink.debug patch
> - Added 2 more patches to the series to address other fw_devlink issues
> 
> v2->v3:
> - Split the logging/debug changes into a separate series
> 
> Thanks,
> Saravana
> 
> Saravana Kannan (3):
>   driver core: fw_devlink: Improve handling of cyclic dependencies
>   driver core: fw_devlink: Add support for
>     FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD
>   net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus
>     parents

Andrew, did you get a chance to test this patch set on a ZII development
board rev B or C by any chance?
-- 
Florian

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 0/3] fw_devlink bug fixes
  2021-09-23 19:44   ` Vladimir Oltean
@ 2021-09-24  6:09     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 26+ messages in thread
From: Greg Kroah-Hartman @ 2021-09-24  6:09 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Saravana Kannan, Rafael J. Wysocki, Andrew Lunn, Heiner Kallweit,
	Russell King, David S. Miller, Jakub Kicinski, Len Brown,
	Geert Uytterhoeven, kernel-team, linux-kernel, netdev,
	linux-acpi

On Thu, Sep 23, 2021 at 10:44:48PM +0300, Vladimir Oltean wrote:
> On Thu, Sep 23, 2021 at 07:30:04PM +0200, Greg Kroah-Hartman wrote:
> > It fixes the real problem where drivers were making the wrong assumption
> > that if they registered a device, it would be instantly bound to a
> > driver.  Drivers that did this were getting lucky, as this was never a
> > guarantee of the driver core (think about if you enabled async
> > probing, and the mess with the bus specific locks that should be
> > preventing much of this)
> 
> Since commit d173a137c5bd ("driver-core: enable drivers to opt-out of
> async probe") it is possible to opt out of async probing, and PHY
> drivers do opt out of it, at the time of writing.

That's good, but we are talking about system-wide enabling of async
probing in the future, which might cause problems here :)

Anyway, let's go with this option for now and Saravana has assured me
that he will work on fixing up these drivers/bus to work properly going
forward.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-09-24  6:09 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-15 17:09 [PATCH v3 0/3] fw_devlink bug fixes Saravana Kannan
2021-09-15 17:09 ` [PATCH v3 1/3] driver core: fw_devlink: Improve handling of cyclic dependencies Saravana Kannan
2021-09-18 14:38   ` Rafael J. Wysocki
2021-09-15 17:09 ` [PATCH v3 2/3] driver core: fw_devlink: Add support for FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD Saravana Kannan
2021-09-18 15:03   ` Rafael J. Wysocki
2021-09-19  1:16     ` Andrew Lunn
2021-09-19  6:41       ` Greg Kroah-Hartman
2021-09-21 16:15       ` Greg Kroah-Hartman
2021-09-21 16:35         ` Rafael J. Wysocki
2021-09-21 17:56           ` Andrew Lunn
2021-09-21 18:56             ` Rafael J. Wysocki
2021-09-21 19:50               ` Andrew Lunn
2021-09-21 20:07                 ` Saravana Kannan
2021-09-21 21:02                   ` Andrew Lunn
2021-09-21 21:54                     ` Saravana Kannan
2021-09-21 22:04                       ` Andrew Lunn
2021-09-21 22:26                         ` Saravana Kannan
2021-09-22  0:45                           ` Andrew Lunn
2021-09-22  1:00                             ` Saravana Kannan
2021-09-22 12:52                               ` Andrew Lunn
2021-09-23 12:48                   ` Rafael J. Wysocki
2021-09-15 17:09 ` [PATCH v3 3/3] net: mdiobus: Set FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD for mdiobus parents Saravana Kannan
2021-09-23 17:30 ` [PATCH v3 0/3] fw_devlink bug fixes Greg Kroah-Hartman
2021-09-23 19:44   ` Vladimir Oltean
2021-09-24  6:09     ` Greg Kroah-Hartman
2021-09-23 22:28 ` Florian Fainelli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).