netdev.vger.kernel.org archive mirror
* [PATCH v2 0/9] deferred_probe_timeout logic clean up
@ 2022-06-01  7:06 Saravana Kannan
  2022-06-01  7:06 ` [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
                   ` (9 more replies)
  0 siblings, 10 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

This series is based on linux-next with these 2 small patches applied on top:
https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/

A lot of the deferred_probe_timeout logic is redundant with
fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
a few cases.

This series tries to delete the redundant logic, simplify the frameworks
that use driver_deferred_probe_check_state(), enable
deferred_probe_timeout=10 by default, and fix the nfsroot failure
case.

The overall idea of this series is to replace the global behavior of
driver_deferred_probe_check_state(), where all devices give up waiting on
their suppliers at the same time, with a more granular behavior:

1. Devices with all their suppliers successfully probed by late_initcall
   probe as usual and avoid unnecessary deferred probe attempts.

2. At or after late_initcall, in cases where boot would break because of
   fw_devlink=on being strict about the ordering, we

   a. Temporarily relax the enforcement to probe any unprobed devices
      that can probe successfully in the current state of the system.
      For example, when we boot with an NFS rootfs and no network device
      has probed.
   b. Go back to enforcing the ordering for any devices that haven't
      probed.

3. After deferred probe timeout expires, we permanently give up waiting
   on supplier devices without drivers. At this point, whatever devices
   can probe without some of their optional suppliers end up probing.
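
The three cases above can be sketched as a small userspace model. This is
only an illustration and not code from the series: the names model_phase,
model_dev and model_may_probe() are hypothetical, and the model assumes a
device is described by just two booleans, which is far coarser than the
real device link state.

```c
#include <stdbool.h>

/* Hypothetical phase names for the three cases listed above. */
enum model_phase {
	MODEL_BEFORE_LATE_INITCALL, /* case 1: normal ordering */
	MODEL_BEST_EFFORT,          /* case 2a: temporary relaxation */
	MODEL_ENFORCING,            /* case 2b: ordering enforced again */
	MODEL_TIMEOUT_EXPIRED,      /* case 3: permanent relaxation */
};

/* Hypothetical, simplified view of one consumer device. */
struct model_dev {
	bool all_suppliers_probed;
	/* true if some unprobed supplier has a driver that could still bind */
	bool missing_supplier_has_driver;
};

/* Returns true if the model lets the device attempt to probe in this phase. */
bool model_may_probe(const struct model_dev *dev, enum model_phase phase)
{
	if (dev->all_suppliers_probed)
		return true;

	switch (phase) {
	case MODEL_BEST_EFFORT:
	case MODEL_TIMEOUT_EXPIRED:
		/* Only suppliers without drivers are ignored. */
		return !dev->missing_supplier_has_driver;
	default:
		return false;
	}
}
```

In this model the best-effort phase and the expired timeout allow the same
probe attempts. The difference described above is that the first is
temporary and the second is permanent.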

In the case where module support is disabled, it's fairly
straightforward and all device probes are completed before the initcalls
are done.

Patches 1 to 3 are fairly straightforward and can probably be applied
right away.

Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
default deferred_probe_timeout back to 10 seconds when modules are
enabled.

Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
so that no framework has to know/care about deferred_probe_timeout.

Yoshihiro/Geert,

If you can test this patch series and confirm that the NFS root case
works, I'd really appreciate that.

Thanks,
Saravana

v1 -> v2:
Rewrote the NFS rootfs fix to be a lot less destructive on the
fw_devlink ordering for devices that don't end up probing during the
"best effort" attempt at probing all devices needed for a network rootfs.

Saravana Kannan (9):
  PM: domains: Delete usage of driver_deferred_probe_check_state()
  pinctrl: devicetree: Delete usage of
    driver_deferred_probe_check_state()
  net: mdio: Delete usage of driver_deferred_probe_check_state()
  driver core: Add wait_for_init_devices_probe helper function
  net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
  Revert "driver core: Set default deferred_probe_timeout back to 0."
  driver core: Set fw_devlink.strict=1 by default
  iommu/of: Delete usage of driver_deferred_probe_check_state()
  driver core: Delete driver_deferred_probe_check_state()

 drivers/base/base.h            |   1 +
 drivers/base/core.c            | 102 ++++++++++++++++++++++++++++++---
 drivers/base/dd.c              |  54 ++++++-----------
 drivers/base/power/domain.c    |   2 +-
 drivers/iommu/of_iommu.c       |   2 +-
 drivers/net/mdio/fwnode_mdio.c |   4 +-
 drivers/pinctrl/devicetree.c   |   2 +-
 include/linux/device/driver.h  |   2 +-
 net/ipv4/ipconfig.c            |   6 ++
 9 files changed, 126 insertions(+), 49 deletions(-)

-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
@ 2022-06-01  7:06 ` Saravana Kannan
  2022-06-09 11:44   ` Ulf Hansson
  2022-06-21  7:28   ` Tony Lindgren
  2022-06-01  7:06 ` [PATCH v2 2/9] pinctrl: devicetree: " Saravana Kannan
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

Now that fw_devlink=on by default and fw_devlink supports
"power-domains" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 739e52cd4aba..3e86772d5fac 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
 		mutex_unlock(&gpd_list_lock);
 		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
 			__func__, PTR_ERR(pd));
-		return driver_deferred_probe_check_state(base_dev);
+		return -ENODEV;
 	}
 
 	dev_dbg(dev, "adding to PM domain %s\n", pd->name);
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 2/9] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
  2022-06-01  7:06 ` [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
@ 2022-06-01  7:06 ` Saravana Kannan
  2022-06-01  7:06 ` [PATCH v2 3/9] net: mdio: " Saravana Kannan
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

Now that fw_devlink=on by default and fw_devlink supports
"pinctrl-[0-8]" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/pinctrl/devicetree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index 3fb238714718..ef898ee8ca6b 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -129,7 +129,7 @@ static int dt_to_map_one_config(struct pinctrl *p,
 		np_pctldev = of_get_next_parent(np_pctldev);
 		if (!np_pctldev || of_node_is_root(np_pctldev)) {
 			of_node_put(np_pctldev);
-			ret = driver_deferred_probe_check_state(p->dev);
+			ret = -ENODEV;
 			/* keep deferring if modules are enabled */
 			if (IS_ENABLED(CONFIG_MODULES) && !allow_default && ret < 0)
 				ret = -EPROBE_DEFER;
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
  2022-06-01  7:06 ` [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
  2022-06-01  7:06 ` [PATCH v2 2/9] pinctrl: devicetree: " Saravana Kannan
@ 2022-06-01  7:06 ` Saravana Kannan
  2022-07-05  9:11   ` Geert Uytterhoeven
  2022-06-01  7:07 ` [PATCH v2 4/9] driver core: Add wait_for_init_devices_probe helper function Saravana Kannan
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:06 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

Now that fw_devlink=on by default and fw_devlink supports interrupt
properties, the execution will never get to the point where
driver_deferred_probe_check_state() is called before the supplier has
probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.
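
The effect on IRQ selection can be sketched in userspace as follows. This is
a rough model and not the driver code: model_select_phy_irq() and
MODEL_IRQ_POLL are hypothetical names, and the polling fallback for
non-positive values is assumed from the "fall back to poll mode" comment in
the hunk below. EPROBE_DEFER is kernel-internal, so its value is defined
locally.

```c
#include <errno.h>

/* Kernel-internal errno value; userspace <errno.h> does not define it. */
#define MODEL_EPROBE_DEFER 517

/* Hypothetical marker meaning "no IRQ, poll the PHY instead". */
#define MODEL_IRQ_POLL (-1)

/*
 * rc is the result of the IRQ lookup: a positive IRQ number on success
 * or a negative errno on failure.
 */
int model_select_phy_irq(int rc)
{
	/* After this patch a deferred lookup is treated as "no IRQ". */
	if (rc == -MODEL_EPROBE_DEFER)
		rc = -ENODEV;

	if (rc > 0)
		return rc;

	/* Assumed fallback: any failure ends up in poll mode. */
	return MODEL_IRQ_POLL;
}
```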

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/net/mdio/fwnode_mdio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 1c1584fca632..3e79c2c51929 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
 	 * just fall back to poll mode
 	 */
 	if (rc == -EPROBE_DEFER)
-		rc = driver_deferred_probe_check_state(&phy->mdio.dev);
-	if (rc == -EPROBE_DEFER)
-		return rc;
+		rc = -ENODEV;
 
 	if (rc > 0) {
 		phy->irq = rc;
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 4/9] driver core: Add wait_for_init_devices_probe helper function
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (2 preceding siblings ...)
  2022-06-01  7:06 ` [PATCH v2 3/9] net: mdio: " Saravana Kannan
@ 2022-06-01  7:07 ` Saravana Kannan
  2022-06-01  7:07 ` [PATCH v2 5/9] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs Saravana Kannan
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

Some devices might need to be probed and bound successfully before the
kernel boot sequence can finish and move on to init/userspace. For
example, a network interface might need to be bound to be able to mount
an NFS rootfs.

With fw_devlink=on by default, some of these devices might be blocked
from probing because they are waiting on an optional supplier that
doesn't have a driver. While fw_devlink will eventually identify such
devices and unblock the probing automatically, it might be too late by
the time it unblocks the probing of devices. For example, the IPv4
autoconfig might time out before fw_devlink unblocks probing of the
network interface.

This function is available to temporarily try and probe all devices that
have a driver even if some of their suppliers haven't been added or
don't have drivers.

The drivers can then decide which of the suppliers are optional vs
mandatory and probe the device if possible. By the time this function
returns, all such "best effort" probes are guaranteed to be completed.
If a device successfully probes in this mode, we delete all fw_devlink
discovered dependencies of that device where the supplier hasn't yet
probed successfully because they have to be optional dependencies.

This also means that some devices that aren't needed for init and could
have waited for their optional supplier to probe (when the supplier's
module is loaded later on) would end up probing prematurely with limited
functionality.  So call this function only when boot would fail without
it.
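
The return-value handling described above can be sketched as a userspace
model. This is a simplification and not the kernel code: struct
model_consumer and both functions are hypothetical, a consumer is reduced to
a single unprobed supplier, and the real code's convention of returning
driver probe errors as positive values is left out. EPROBE_DEFER is
kernel-internal, so its value is defined locally.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value; userspace <errno.h> does not define it. */
#define MODEL_EPROBE_DEFER 517

/* Hypothetical, simplified view of one consumer device. */
struct model_consumer {
	bool has_unprobed_supplier; /* some supplier has not probed yet */
	bool supplier_can_match;    /* that supplier has a driver that could bind */
};

/*
 * Supplier check: 0 means no missing suppliers, -EAGAIN means "try anyway,
 * best effort", and -EPROBE_DEFER means "keep waiting".
 */
int model_check_suppliers(const struct model_consumer *c, bool best_effort)
{
	if (!c->has_unprobed_supplier)
		return 0;
	if (best_effort && !c->supplier_can_match)
		return -EAGAIN;
	return -MODEL_EPROBE_DEFER;
}

/* driver_ret stands in for what the driver's probe callback would return. */
int model_really_probe(const struct model_consumer *c, bool best_effort,
		       int driver_ret)
{
	int link_ret = model_check_suppliers(c, best_effort);

	if (link_ret == -MODEL_EPROBE_DEFER)
		return link_ret;

	/*
	 * A failed best-effort probe is treated as a deferral, since the
	 * device may still probe once its missing suppliers show up.
	 */
	if (driver_ret && link_ret == -EAGAIN)
		return -MODEL_EPROBE_DEFER;

	return driver_ret;
}
```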

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/base.h           |   1 +
 drivers/base/core.c           | 100 ++++++++++++++++++++++++++++++++--
 drivers/base/dd.c             |  19 +++++--
 include/linux/device/driver.h |   1 +
 4 files changed, 110 insertions(+), 11 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index ab71403d102f..b3a43a164dcd 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -160,6 +160,7 @@ extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
 extern void deferred_probe_extend_timeout(void);
+extern void driver_deferred_probe_trigger(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7cd789c4985d..61fdfe99b348 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -54,6 +54,7 @@ static unsigned int defer_sync_state_count = 1;
 static DEFINE_MUTEX(fwnode_link_lock);
 static bool fw_devlink_is_permissive(void);
 static bool fw_devlink_drv_reg_done;
+static bool fw_devlink_best_effort;
 
 /**
  * fwnode_link_add - Create a link between two fwnode_handles.
@@ -965,6 +966,11 @@ static void device_links_missing_supplier(struct device *dev)
 	}
 }
 
+static bool dev_is_best_effort(struct device *dev)
+{
+	return fw_devlink_best_effort && dev->can_match;
+}
+
 /**
  * device_links_check_suppliers - Check presence of supplier drivers.
  * @dev: Consumer device.
@@ -984,7 +990,7 @@ static void device_links_missing_supplier(struct device *dev)
 int device_links_check_suppliers(struct device *dev)
 {
 	struct device_link *link;
-	int ret = 0;
+	int ret = 0, fwnode_ret = 0;
 	struct fwnode_handle *sup_fw;
 
 	/*
@@ -997,12 +1003,17 @@ int device_links_check_suppliers(struct device *dev)
 		sup_fw = list_first_entry(&dev->fwnode->suppliers,
 					  struct fwnode_link,
 					  c_hook)->supplier;
-		dev_err_probe(dev, -EPROBE_DEFER, "wait for supplier %pfwP\n",
-			      sup_fw);
-		mutex_unlock(&fwnode_link_lock);
-		return -EPROBE_DEFER;
+		if (!dev_is_best_effort(dev)) {
+			fwnode_ret = -EPROBE_DEFER;
+			dev_err_probe(dev, -EPROBE_DEFER,
+				    "wait for supplier %pfwP\n", sup_fw);
+		} else {
+			fwnode_ret = -EAGAIN;
+		}
 	}
 	mutex_unlock(&fwnode_link_lock);
+	if (fwnode_ret == -EPROBE_DEFER)
+		return fwnode_ret;
 
 	device_links_write_lock();
 
@@ -1012,6 +1023,14 @@ int device_links_check_suppliers(struct device *dev)
 
 		if (link->status != DL_STATE_AVAILABLE &&
 		    !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) {
+
+			if (dev_is_best_effort(dev) &&
+			    link->flags & DL_FLAG_INFERRED &&
+			    !link->supplier->can_match) {
+				ret = -EAGAIN;
+				continue;
+			}
+
 			device_links_missing_supplier(dev);
 			dev_err_probe(dev, -EPROBE_DEFER,
 				      "supplier %s not ready\n",
@@ -1024,7 +1043,8 @@ int device_links_check_suppliers(struct device *dev)
 	dev->links.status = DL_DEV_PROBING;
 
 	device_links_write_unlock();
-	return ret;
+
+	return ret ? ret : fwnode_ret;
 }
 
 /**
@@ -1289,6 +1309,18 @@ void device_links_driver_bound(struct device *dev)
 			 * save to drop the managed link completely.
 			 */
 			device_link_drop_managed(link);
+		} else if (dev_is_best_effort(dev) &&
+			   link->flags & DL_FLAG_INFERRED &&
+			   link->status != DL_STATE_CONSUMER_PROBE &&
+			   !link->supplier->can_match) {
+			/*
+			 * When dev_is_best_effort() is true, we ignore device
+			 * links to suppliers that don't have a driver.  If the
+			 * consumer device still managed to probe, there's no
+			 * point in maintaining a device link in a weird state
+			 * (consumer probed before supplier). So delete it.
+			 */
+			device_link_drop_managed(link);
 		} else {
 			WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
 			WRITE_ONCE(link->status, DL_STATE_ACTIVE);
@@ -1655,6 +1687,62 @@ void fw_devlink_drivers_done(void)
 	device_links_write_unlock();
 }
 
+/**
+ * wait_for_init_devices_probe - Try to probe any device needed for init
+ *
+ * Some devices might need to be probed and bound successfully before the kernel
+ * boot sequence can finish and move on to init/userspace. For example, a
+ * network interface might need to be bound to be able to mount a NFS rootfs.
+ *
+ * With fw_devlink=on by default, some of these devices might be blocked from
+ * probing because they are waiting on a optional supplier that doesn't have a
+ * driver. While fw_devlink will eventually identify such devices and unblock
+ * the probing automatically, it might be too late by the time it unblocks the
+ * probing of devices. For example, the IP4 autoconfig might timeout before
+ * fw_devlink unblocks probing of the network interface.
+ *
+ * This function is available to temporarily try and probe all devices that have
+ * a driver even if some of their suppliers haven't been added or don't have
+ * drivers.
+ *
+ * The drivers can then decide which of the suppliers are optional vs mandatory
+ * and probe the device if possible. By the time this function returns, all such
+ * "best effort" probes are guaranteed to be completed. If a device successfully
+ * probes in this mode, we delete all fw_devlink discovered dependencies of that
+ * device where the supplier hasn't yet probed successfully because they have to
+ * be optional dependencies.
+ *
+ * Any devices that didn't successfully probe go back to being treated as if
+ * this function was never called.
+ *
+ * This also means that some devices that aren't needed for init and could have
+ * waited for their optional supplier to probe (when the supplier's module is
+ * loaded later on) would end up probing prematurely with limited functionality.
+ * So call this function only when boot would fail without it.
+ */
+void __init wait_for_init_devices_probe(void)
+{
+	if (!fw_devlink_flags || fw_devlink_is_permissive())
+		return;
+
+	/*
+	 * Wait for all ongoing probes to finish so that the "best effort" is
+	 * only applied to devices that can't probe otherwise.
+	 */
+	wait_for_device_probe();
+
+	pr_info("Trying to probe devices needed for running init ...\n");
+	fw_devlink_best_effort = true;
+	driver_deferred_probe_trigger();
+
+	/*
+	 * Wait for all "best effort" probes to finish before going back to
+	 * normal enforcement.
+	 */
+	wait_for_device_probe();
+	fw_devlink_best_effort = false;
+}
+
 static void fw_devlink_unblock_consumers(struct device *dev)
 {
 	struct device_link *link;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 11b0fb6414d3..4a55fbb7e0da 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -172,7 +172,7 @@ static bool driver_deferred_probe_enable;
  * changes in the midst of a probe, then deferred processing should be triggered
  * again.
  */
-static void driver_deferred_probe_trigger(void)
+void driver_deferred_probe_trigger(void)
 {
 	if (!driver_deferred_probe_enable)
 		return;
@@ -580,7 +580,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 {
 	bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) &&
 			   !drv->suppress_bind_attrs;
-	int ret;
+	int ret, link_ret;
 
 	if (defer_all_probes) {
 		/*
@@ -592,9 +592,9 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 		return -EPROBE_DEFER;
 	}
 
-	ret = device_links_check_suppliers(dev);
-	if (ret)
-		return ret;
+	link_ret = device_links_check_suppliers(dev);
+	if (link_ret == -EPROBE_DEFER)
+		return link_ret;
 
 	pr_debug("bus: '%s': %s: probing driver %s with device %s\n",
 		 drv->bus->name, __func__, drv->name, dev_name(dev));
@@ -633,6 +633,15 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 
 	ret = call_driver_probe(dev, drv);
 	if (ret) {
+		/*
+		 * If fw_devlink_best_effort is active (denoted by -EAGAIN), the
+		 * device might actually probe properly once some of its missing
+		 * suppliers have probed. So, treat this as if the driver
+		 * returned -EPROBE_DEFER.
+		 */
+		if (link_ret == -EAGAIN)
+			ret = -EPROBE_DEFER;
+
 		/*
 		 * Return probe errors as positive values so that the callers
 		 * can distinguish them from other errors.
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 700453017e1c..2114d65b862f 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -129,6 +129,7 @@ extern struct device_driver *driver_find(const char *name,
 					 struct bus_type *bus);
 extern int driver_probe_done(void);
 extern void wait_for_device_probe(void);
+void __init wait_for_init_devices_probe(void);
 
 /* sysfs interface for exporting driver attributes */
 
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 5/9] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (3 preceding siblings ...)
  2022-06-01  7:07 ` [PATCH v2 4/9] driver core: Add wait_for_init_devices_probe helper function Saravana Kannan
@ 2022-06-01  7:07 ` Saravana Kannan
  2022-06-01  7:07 ` [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0." Saravana Kannan
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

If there are network devices that could probe without some of their
suppliers probing and those network devices are needed to mount a
network rootfs, then fw_devlink=on might break that use case by blocking
the network devices from probing by the time IP auto config starts.

So, if no network devices are available when IP auto config is enabled
and we have a network rootfs, make sure fw_devlink doesn't block the
probing of any device that has a driver and then retry finding a network
device.
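
The retry logic can be sketched as a userspace model. It is only an
illustration: struct model_state and model_wait_for_devices() are
hypothetical, the relax step stands in for wait_for_init_devices_probe(),
and the one second sleep between polls is left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state of the simulated system. */
struct model_state {
	bool device_present; /* a network device is available */
	bool relax_unblocks; /* best-effort probing would bind one */
	int relax_calls;     /* how often the relaxation was requested */
};

/* Stand-in for wait_for_init_devices_probe(). */
static void model_relax(struct model_state *s)
{
	s->relax_calls++;
	if (s->relax_unblocks)
		s->device_present = true;
}

/* Poll for a network device, relaxing fw_devlink at most once. */
int model_wait_for_devices(struct model_state *s, int max_tries,
			   bool network_rootfs)
{
	bool try_init_devs = true;
	int i;

	for (i = 0; i < max_tries; i++) {
		if (s->device_present)
			return 0;
		if (try_init_devs && network_rootfs) {
			try_init_devs = false;
			model_relax(s);
		}
		/* The kernel code sleeps for one second here. */
	}
	return -ENODEV;
}
```

The one-shot flag matters because the relaxation can make unrelated devices
probe early with limited functionality, so it is requested once and only
when the rootfs is on the network.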

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 net/ipv4/ipconfig.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 9d41d5d5cd1e..2342debd7066 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1434,6 +1434,7 @@ __be32 __init root_nfs_parse_addr(char *name)
 static int __init wait_for_devices(void)
 {
 	int i;
+	bool try_init_devs = true;
 
 	for (i = 0; i < DEVICE_WAIT_MAX; i++) {
 		struct net_device *dev;
@@ -1452,6 +1453,11 @@ static int __init wait_for_devices(void)
 		rtnl_unlock();
 		if (found)
 			return 0;
+		if (try_init_devs &&
+		    (ROOT_DEV == Root_NFS || ROOT_DEV == Root_CIFS)) {
+			try_init_devs = false;
+			wait_for_init_devices_probe();
+		}
 		ssleep(1);
 	}
 	return -ENODEV;
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (4 preceding siblings ...)
  2022-06-01  7:07 ` [PATCH v2 5/9] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs Saravana Kannan
@ 2022-06-01  7:07 ` Saravana Kannan
  2022-07-20 17:31   ` Geert Uytterhoeven
  2022-06-01  7:07 ` [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default Saravana Kannan
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

Let's take another shot at getting deferred_probe_timeout=10 to work.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/dd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 4a55fbb7e0da..335e71d3a618 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 
 static int __init deferred_probe_timeout_setup(char *str)
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (5 preceding siblings ...)
  2022-06-01  7:07 ` [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0." Saravana Kannan
@ 2022-06-01  7:07 ` Saravana Kannan
  2022-06-22  7:47   ` Sascha Hauer
  2022-06-01  7:07 ` [PATCH v2 8/9] iommu/of: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

Now that deferred_probe_timeout is non-zero by default, fw_devlink will
never permanently block the probing of devices. It'll try its best to
probe the devices in the right order and then finally let devices probe
even if their suppliers don't have any drivers.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 61fdfe99b348..977b379a495b 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1613,7 +1613,7 @@ static int __init fw_devlink_setup(char *arg)
 }
 early_param("fw_devlink", fw_devlink_setup);
 
-static bool fw_devlink_strict;
+static bool fw_devlink_strict = true;
 static int __init fw_devlink_strict_setup(char *arg)
 {
 	return strtobool(arg, &fw_devlink_strict);
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 8/9] iommu/of: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (6 preceding siblings ...)
  2022-06-01  7:07 ` [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default Saravana Kannan
@ 2022-06-01  7:07 ` Saravana Kannan
  2022-06-01  7:07 ` [PATCH v2 9/9] driver core: Delete driver_deferred_probe_check_state() Saravana Kannan
  2022-06-07 18:07 ` [PATCH v2 0/9] deferred_probe_timeout logic clean up Geert Uytterhoeven
  9 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

Now that fw_devlink=on and fw_devlink.strict=1 by default and fw_devlink
supports iommu DT properties, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/iommu/of_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5696314ae69e..41f4eb005219 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -40,7 +40,7 @@ static int of_iommu_xlate(struct device *dev,
 	 * a proper probe-ordering dependency mechanism in future.
 	 */
 	if (!ops)
-		return driver_deferred_probe_check_state(dev);
+		return -ENODEV;
 
 	if (!try_module_get(ops->owner))
 		return -ENODEV;
-- 
2.36.1.255.ge46751e96f-goog



* [PATCH v2 9/9] driver core: Delete driver_deferred_probe_check_state()
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (7 preceding siblings ...)
  2022-06-01  7:07 ` [PATCH v2 8/9] iommu/of: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
@ 2022-06-01  7:07 ` Saravana Kannan
  2022-06-07 18:07 ` [PATCH v2 0/9] deferred_probe_timeout logic clean up Geert Uytterhoeven
  9 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-01  7:07 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern
  Cc: Saravana Kannan, kernel-team, linux-kernel, linux-pm, iommu,
	netdev, linux-gpio

The function is no longer used. So delete it.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/dd.c             | 30 ------------------------------
 include/linux/device/driver.h |  1 -
 2 files changed, 31 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 335e71d3a618..e600dd2afc35 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -274,42 +274,12 @@ static int __init deferred_probe_timeout_setup(char *str)
 }
 __setup("deferred_probe_timeout=", deferred_probe_timeout_setup);
 
-/**
- * driver_deferred_probe_check_state() - Check deferred probe state
- * @dev: device to check
- *
- * Return:
- * * -ENODEV if initcalls have completed and modules are disabled.
- * * -ETIMEDOUT if the deferred probe timeout was set and has expired
- *   and modules are enabled.
- * * -EPROBE_DEFER in other cases.
- *
- * Drivers or subsystems can opt-in to calling this function instead of directly
- * returning -EPROBE_DEFER.
- */
-int driver_deferred_probe_check_state(struct device *dev)
-{
-	if (!IS_ENABLED(CONFIG_MODULES) && initcalls_done) {
-		dev_warn(dev, "ignoring dependency for device, assuming no driver\n");
-		return -ENODEV;
-	}
-
-	if (!driver_deferred_probe_timeout && initcalls_done) {
-		dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
-		return -ETIMEDOUT;
-	}
-
-	return -EPROBE_DEFER;
-}
-EXPORT_SYMBOL_GPL(driver_deferred_probe_check_state);
-
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
 	struct device_private *p;
 
 	fw_devlink_drivers_done();
 
-	driver_deferred_probe_timeout = 0;
 	driver_deferred_probe_trigger();
 	flush_work(&deferred_probe_work);
 
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 2114d65b862f..7acaabde5396 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -242,7 +242,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev)
 
 extern int driver_deferred_probe_timeout;
 void driver_deferred_probe_add(struct device *dev);
-int driver_deferred_probe_check_state(struct device *dev);
 void driver_init(void);
 
 /**
-- 
2.36.1.255.ge46751e96f-goog



* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
                   ` (8 preceding siblings ...)
  2022-06-01  7:07 ` [PATCH v2 9/9] driver core: Delete driver_deferred_probe_check_state() Saravana Kannan
@ 2022-06-07 18:07 ` Geert Uytterhoeven
  2022-06-08  0:55   ` Saravana Kannan
  9 siblings, 1 reply; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-06-07 18:07 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas

Hi Saravana,

On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> This series is based on linux-next + these 2 small patches applied on top:
> https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
>
> A lot of the deferred_probe_timeout logic is redundant with
> fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> a few cases.
>
> This series tries to delete the redundant logic, simplify the frameworks
> that use driver_deferred_probe_check_state(), enable
> deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> case.
>
> The overall idea of this series is to replace the global behavior of
> driver_deferred_probe_check_state() where all devices give up waiting on
> supplier at the same time with a more granular behavior:
>
> 1. Devices with all their suppliers successfully probed by late_initcall
>    probe as usual and avoid unnecessary deferred probe attempts.
>
> 2. At or after late_initcall, in cases where boot would break because of
>    fw_devlink=on being strict about the ordering, we
>
>    a. Temporarily relax the enforcement to probe any unprobed devices
>       that can probe successfully in the current state of the system.
>       For example, when we boot with a NFS rootfs and no network device
>       has probed.
>    b. Go back to enforcing the ordering for any devices that haven't
>       probed.
>
> 3. After deferred probe timeout expires, we permanently give up waiting
>    on supplier devices without drivers. At this point, whatever devices
>    can probe without some of their optional suppliers end up probing.
>
> In the case where module support is disabled, it's fairly
> straightforward and all device probes are completed before the initcalls
> are done.
>
> Patches 1 to 3 are fairly straightforward and can probably be applied
> right away.
>
> Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> default deferred_probe_timeout back to 10 seconds when modules are
> enabled.
>
> Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> so that no framework has to know/care about deferred_probe_timeout.
>
> Yoshihiro/Geert,
>
> If you can test this patch series and confirm that the NFS root case
> works, I'd really appreciate that.

Thanks, I gave this a try on various boards I have access to.
The results were quite positive. E.g. the compile error I saw on v1
(implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
 used in v2) is gone.

However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
starting:

    [  OK  ] Started D-Bus System Message Bus.
    Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
    Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
    Mem abort info:
      ESR = 0x0000000096000004
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EC = 0x25: DABT (current EL), IL = 32 bits
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
      SET = 0, FnV = 0
    Data abort info:
      ISV = 0, ISS = 0x00000004
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
      CM = 0, WnR = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec45000
    [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
    Data abort info:
    Internal error: Oops: 96000004 [#1] PREEMPT SMP
    CPU: 0 PID: 374 Comm: v4l_id Tainted: G        W
5.19.0-rc1-arm64-renesas-00799-gc13c3e49e8bd #1660
      ISV = 0, ISS = 0x00000004
    Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
    pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      CM = 0, WnR = 0
    pc : subdev_open+0x8c/0x128
    lr : subdev_open+0x78/0x128
    sp : ffff80000aadba60
    x29: ffff80000aadba60 x28: 0000000000000000 x27: ffff80000aadbc58
    x26: 0000000000020000 x25: ffff00000b3aaf00 x24: 0000000000000000
    x23: ffff00000c331c00 x22: ffff000009aa61b8 x21: ffff000009aa6000
    x20: ffff000008bae3e8 x19: ffff00000c3fe200 x18: 0000000000000000
    x17: ffff800076945000 x16: ffff800008004000 x15: 00008cc6bf550c7c
    x14: 000000000000038f x13: 000000000000001a x12: ffff00007fba8618
    x11: 0000000000000001 x10: 0000000000000000 x9 : ffff800009253954
    x8 : ffff00000b3aaf00 x7 : 0000000000000004 x6 : 000000000000001a
    x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
    x2 : 0000000100000001 x1 : 0000000000000000 x0 : 0000000000000000
    Call trace:
     subdev_open+0x8c/0x128
     v4l2_open+0xa4/0x120
     chrdev_open+0x78/0x178
     do_dentry_open+0xfc/0x398
     vfs_open+0x28/0x30
     path_openat+0x584/0x9c8
     do_filp_open+0x80/0x108
     do_sys_openat2+0x20c/0x2d8
    user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec53000
     do_sys_open+0x54/0xa0
     __arm64_sys_openat+0x20/0x28
     invoke_syscall+0x40/0xf8
     el0_svc_common.constprop.0+0xf0/0x110
     do_el0_svc+0x20/0x78
     el0_svc+0x48/0xd0
     el0t_64_sync_handler+0xb0/0xb8
     el0t_64_sync+0x148/0x14c
    Code: f9405280 f9400400 b40000e0 f9400280 (f9400000)
    ---[ end trace 0000000000000000 ]---
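
The ESR value in the log above can be decoded by hand. The following is a
minimal userspace sketch, assuming the standard Armv8-A ESR_ELx layout (EC in
bits [31:26], IL in bit 25, ISS in bits [24:0], and for data aborts the fault
status code in the low 6 bits of the ISS). The helper names are made up for
illustration and are not kernel APIs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; field positions follow the Armv8-A ESR_ELx layout. */
static inline unsigned int esr_ec(uint64_t esr)   { return (esr >> 26) & 0x3f; }
static inline unsigned int esr_il(uint64_t esr)   { return (esr >> 25) & 0x1; }
static inline uint32_t     esr_iss(uint64_t esr)  { return esr & 0x1ffffff; }
static inline unsigned int esr_dfsc(uint64_t esr) { return esr & 0x3f; }

/* Cross-check the decode against the fields printed in the oops. */
void check_logged_esr(void)
{
	const uint64_t esr = 0x96000004;

	assert(esr_ec(esr) == 0x25);	/* DABT (current EL) */
	assert(esr_il(esr) == 1);	/* 32-bit instruction */
	assert(esr_iss(esr) == 0x4);
	assert(esr_dfsc(esr) == 0x04);	/* level 0 translation fault */
}
```

The decoded fields agree with the "EC = 0x25", "IL = 32 bits", "ISS = 0x00000004"
and "FSC = 0x04" lines printed by the kernel.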

This only happens on the Ebisu-4D board (r8a77990-ebisu.dts).
I do not see this on the Salvator-X(S) boards.

Bisection shows this starts to happen with "[PATCH v2 7/9] driver core:
Set fw_devlink.strict=1 by default".

Adding more debug info:

    subdev_open:54: file v4l-subdev1
    Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000
    subdev_open:54: file v4l-subdev2
    Unable to handle kernel NULL pointer dereference at virtual
address 0000000000000000

Matching the subdev using sysfs gives:

    /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev1
    /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev2

The i2c device is the adi,adv7482 at address 0x70.
But now I'm lost...

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-07 18:07 ` [PATCH v2 0/9] deferred_probe_timeout logic clean up Geert Uytterhoeven
@ 2022-06-08  0:55   ` Saravana Kannan
  2022-06-08  4:17     ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-08  0:55 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM list,
	Linux IOMMU, netdev, open list:GPIO SUBSYSTEM, Linux-Renesas

-Hideaki -- their email keeps bouncing.

On Tue, Jun 7, 2022 at 11:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > This series is based on linux-next + these 2 small patches applied on top:
> > https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
> >
> > A lot of the deferred_probe_timeout logic is redundant with
> > fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> > a few cases.
> >
> > This series tries to delete the redundant logic, simplify the frameworks
> > that use driver_deferred_probe_check_state(), enable
> > deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> > case.
> >
> > The overall idea of this series is to replace the global behavior of
> > driver_deferred_probe_check_state() where all devices give up waiting on
> > supplier at the same time with a more granular behavior:
> >
> > 1. Devices with all their suppliers successfully probed by late_initcall
> >    probe as usual and avoid unnecessary deferred probe attempts.
> >
> > 2. At or after late_initcall, in cases where boot would break because of
> >    fw_devlink=on being strict about the ordering, we
> >
> >    a. Temporarily relax the enforcement to probe any unprobed devices
> >       that can probe successfully in the current state of the system.
> >       For example, when we boot with a NFS rootfs and no network device
> >       has probed.
> >    b. Go back to enforcing the ordering for any devices that haven't
> >       probed.
> >
> > 3. After deferred probe timeout expires, we permanently give up waiting
> >    on supplier devices without drivers. At this point, whatever devices
> >    can probe without some of their optional suppliers end up probing.
> >
> > In the case where module support is disabled, it's fairly
> > straightforward and all device probes are completed before the initcalls
> > are done.
> >
> > Patches 1 to 3 are fairly straightforward and can probably be applied
> > right away.
> >
> > Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> > default deferred_probe_timeout back to 10 seconds when modules are
> > enabled.
> >
> > Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> > so that no framework has to know/care about deferred_probe_timeout.
> >
> > Yoshihiro/Geert,
> >
> > If you can test this patch series and confirm that the NFS root case
> > works, I'd really appreciate that.
>
> Thanks, I gave this a try on various boards I have access to.
> The results were quite positive. E.g. the compile error I saw on v1
> (implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
>  used in v2) is gone.

Thanks a lot for testing these.

> However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
> starting:
>
>     [  OK  ] Started D-Bus System Message Bus.
>     Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000000
>     Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000000
>     Mem abort info:
>       ESR = 0x0000000096000004
>     Mem abort info:
>       ESR = 0x0000000096000004
>       EC = 0x25: DABT (current EL), IL = 32 bits
>       SET = 0, FnV = 0
>       EC = 0x25: DABT (current EL), IL = 32 bits
>       EA = 0, S1PTW = 0
>       FSC = 0x04: level 0 translation fault
>       SET = 0, FnV = 0
>     Data abort info:
>       ISV = 0, ISS = 0x00000004
>       EA = 0, S1PTW = 0
>       FSC = 0x04: level 0 translation fault
>       CM = 0, WnR = 0
>     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec45000
>     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>     Data abort info:
>     Internal error: Oops: 96000004 [#1] PREEMPT SMP
>     CPU: 0 PID: 374 Comm: v4l_id Tainted: G        W
> 5.19.0-rc1-arm64-renesas-00799-gc13c3e49e8bd #1660
>       ISV = 0, ISS = 0x00000004
>     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
>     pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>       CM = 0, WnR = 0
>     pc : subdev_open+0x8c/0x128
>     lr : subdev_open+0x78/0x128
>     sp : ffff80000aadba60
>     x29: ffff80000aadba60 x28: 0000000000000000 x27: ffff80000aadbc58
>     x26: 0000000000020000 x25: ffff00000b3aaf00 x24: 0000000000000000
>     x23: ffff00000c331c00 x22: ffff000009aa61b8 x21: ffff000009aa6000
>     x20: ffff000008bae3e8 x19: ffff00000c3fe200 x18: 0000000000000000
>     x17: ffff800076945000 x16: ffff800008004000 x15: 00008cc6bf550c7c
>     x14: 000000000000038f x13: 000000000000001a x12: ffff00007fba8618
>     x11: 0000000000000001 x10: 0000000000000000 x9 : ffff800009253954
>     x8 : ffff00000b3aaf00 x7 : 0000000000000004 x6 : 000000000000001a
>     x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
>     x2 : 0000000100000001 x1 : 0000000000000000 x0 : 0000000000000000
>     Call trace:
>      subdev_open+0x8c/0x128
>      v4l2_open+0xa4/0x120
>      chrdev_open+0x78/0x178
>      do_dentry_open+0xfc/0x398
>      vfs_open+0x28/0x30
>      path_openat+0x584/0x9c8
>      do_filp_open+0x80/0x108
>      do_sys_openat2+0x20c/0x2d8
>     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec53000
>      do_sys_open+0x54/0xa0
>      __arm64_sys_openat+0x20/0x28
>      invoke_syscall+0x40/0xf8
>      el0_svc_common.constprop.0+0xf0/0x110
>      do_el0_svc+0x20/0x78
>      el0_svc+0x48/0xd0
>      el0t_64_sync_handler+0xb0/0xb8
>      el0t_64_sync+0x148/0x14c
>     Code: f9405280 f9400400 b40000e0 f9400280 (f9400000)
>     ---[ end trace 0000000000000000 ]---
>
> This only happens on the Ebisu-4D board (r8a77990-ebisu.dts).
> I do not see this on the Salvator-X(S) boards.

Ok. I don't know much about either of these boards. Are they supposed
to be very similar?

> Bisection shows this starts to happen with "[PATCH v2 7/9] driver core:
> Set fw_devlink.strict=1 by default".

So in the series, by this point, the previous patches would have set
the deferred probe timeout to 10s (it can get extended when new drivers
are added, of course), and once the timer expires, suppliers without
drivers will no longer block any consumers. The only difference
fw_devlink.strict=1 should make is that iommus and dmas dependencies
are treated as mandatory until the timeout expires.

In this instance, do you have iommu drivers and dma drivers compiled
in, loaded as modules, or not available at all? In all these cases,
the list of devices that would end up probing eventually should be the
same with or without fw_devlink.strict=1. The only difference would be
some reordering of probes.

So this looks to me like improper error handling/assumption in the
driver for this subdev device. I'm guessing one of the suppliers to
this subdev has a direct/indirect dependency on iommus and this subdev
driver is assuming that the supplier would have probed by the time
it's probed.

>
> Adding more debug info:
>
>     subdev_open:54: file v4l-subdev1
>     Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000000
>     subdev_open:54: file v4l-subdev2
>     Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000000
>
> Matching the subdev using sysfs gives:
>
>     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev1
>     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev2
>
> The i2c device is the adi,adv7482 at address 0x70.

I'm guessing the fix would be somewhere in this driver, but I haven't
dug into it. Any guesses on which of its suppliers might have a
direct/indirect dependency on an iommu/dma? You could also enable the
debug log in fw_devlink_relax_link() and see if it relaxes any link
where the supplier is an iommu/dma device. That might give us some
hints.

-Saravana

> But now I'm lost...
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08  0:55   ` Saravana Kannan
@ 2022-06-08  4:17     ` Saravana Kannan
  2022-06-08 10:25       ` Geert Uytterhoeven
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-08  4:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM list,
	Linux IOMMU, netdev, open list:GPIO SUBSYSTEM, Linux-Renesas

On Tue, Jun 7, 2022 at 5:55 PM Saravana Kannan <saravanak@google.com> wrote:
>
> -Hideaki -- their email keeps bouncing.
>
> On Tue, Jun 7, 2022 at 11:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> >
> > Hi Saravana,
> >
> > On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > > This series is based on linux-next + these 2 small patches applied on top:
> > > https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
> > >
> > > A lot of the deferred_probe_timeout logic is redundant with
> > > fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> > > a few cases.
> > >
> > > This series tries to delete the redundant logic, simplify the frameworks
> > > that use driver_deferred_probe_check_state(), enable
> > > deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> > > case.
> > >
> > > The overall idea of this series is to replace the global behavior of
> > > driver_deferred_probe_check_state() where all devices give up waiting on
> > > supplier at the same time with a more granular behavior:
> > >
> > > 1. Devices with all their suppliers successfully probed by late_initcall
> > >    probe as usual and avoid unnecessary deferred probe attempts.
> > >
> > > 2. At or after late_initcall, in cases where boot would break because of
> > >    fw_devlink=on being strict about the ordering, we
> > >
> > >    a. Temporarily relax the enforcement to probe any unprobed devices
> > >       that can probe successfully in the current state of the system.
> > >       For example, when we boot with a NFS rootfs and no network device
> > >       has probed.
> > >    b. Go back to enforcing the ordering for any devices that haven't
> > >       probed.
> > >
> > > 3. After deferred probe timeout expires, we permanently give up waiting
> > >    on supplier devices without drivers. At this point, whatever devices
> > >    can probe without some of their optional suppliers end up probing.
> > >
> > > In the case where module support is disabled, it's fairly
> > > straightforward and all device probes are completed before the initcalls
> > > are done.
> > >
> > > Patches 1 to 3 are fairly straightforward and can probably be applied
> > > right away.
> > >
> > > Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> > > default deferred_probe_timeout back to 10 seconds when modules are
> > > enabled.
> > >
> > > Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> > > so that no framework has to know/care about deferred_probe_timeout.
> > >
> > > Yoshihiro/Geert,
> > >
> > > If you can test this patch series and confirm that the NFS root case
> > > works, I'd really appreciate that.
> >
> > Thanks, I gave this a try on various boards I have access to.
> > The results were quite positive. E.g. the compile error I saw on v1
> > (implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
> >  used in v2) is gone.
>
> Thanks a lot for testing these.
>
> > However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
> > starting:
> >
> >     [  OK  ] Started D-Bus System Message Bus.
> >     Unable to handle kernel NULL pointer dereference at virtual
> > address 0000000000000000
> >     Unable to handle kernel NULL pointer dereference at virtual
> > address 0000000000000000
> >     Mem abort info:
> >       ESR = 0x0000000096000004
> >     Mem abort info:
> >       ESR = 0x0000000096000004
> >       EC = 0x25: DABT (current EL), IL = 32 bits
> >       SET = 0, FnV = 0
> >       EC = 0x25: DABT (current EL), IL = 32 bits
> >       EA = 0, S1PTW = 0
> >       FSC = 0x04: level 0 translation fault
> >       SET = 0, FnV = 0
> >     Data abort info:
> >       ISV = 0, ISS = 0x00000004
> >       EA = 0, S1PTW = 0
> >       FSC = 0x04: level 0 translation fault
> >       CM = 0, WnR = 0
> >     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec45000
> >     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> >     Data abort info:
> >     Internal error: Oops: 96000004 [#1] PREEMPT SMP
> >     CPU: 0 PID: 374 Comm: v4l_id Tainted: G        W
> > 5.19.0-rc1-arm64-renesas-00799-gc13c3e49e8bd #1660
> >       ISV = 0, ISS = 0x00000004
> >     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
> >     pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >       CM = 0, WnR = 0
> >     pc : subdev_open+0x8c/0x128
> >     lr : subdev_open+0x78/0x128
> >     sp : ffff80000aadba60
> >     x29: ffff80000aadba60 x28: 0000000000000000 x27: ffff80000aadbc58
> >     x26: 0000000000020000 x25: ffff00000b3aaf00 x24: 0000000000000000
> >     x23: ffff00000c331c00 x22: ffff000009aa61b8 x21: ffff000009aa6000
> >     x20: ffff000008bae3e8 x19: ffff00000c3fe200 x18: 0000000000000000
> >     x17: ffff800076945000 x16: ffff800008004000 x15: 00008cc6bf550c7c
> >     x14: 000000000000038f x13: 000000000000001a x12: ffff00007fba8618
> >     x11: 0000000000000001 x10: 0000000000000000 x9 : ffff800009253954
> >     x8 : ffff00000b3aaf00 x7 : 0000000000000004 x6 : 000000000000001a
> >     x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
> >     x2 : 0000000100000001 x1 : 0000000000000000 x0 : 0000000000000000
> >     Call trace:
> >      subdev_open+0x8c/0x128

After disassembling the code on my end (with a slightly different
config) and looking at offset 0x8c from the start of the function, I'm
pretty sure the NULL deref is happening here inside subdev_open():

        if (sd->v4l2_dev->mdev && sd->entity.graph_obj.mdev->dev) {

i.e. sd->entity.graph_obj.mdev == NULL.

And going by the field names, I'm guessing these are suppliers pointed
to by "remote-endpoint". Sadly fw_devlink can't extract any dependency
info from remote-endpoint because the devices generally point to each
other so a cycle is detected and the probe ordering isn't enforced
between the endpoints. We still need to parse remote-endpoint to
detect cycles created by a combination of endpoints/other properties
(there's a real world case in upstream).
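
To illustrate the suspected failure mode: the quoted condition tests one mdev
pointer (sd->v4l2_dev->mdev) but then dereferences another
(sd->entity.graph_obj.mdev). Below is a minimal userspace sketch, using
cut-down mock structs rather than the real V4L2/media definitions, and a
hypothetical helper name. It is only meant to show the shape of the problem,
not a proposed patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock types: only the fields used by the condition are modelled. */
struct device { int unused; };
struct media_device { struct device *dev; };
struct media_gobj { struct media_device *mdev; };
struct media_entity { struct media_gobj graph_obj; };
struct v4l2_device { struct media_device *mdev; };
struct v4l2_subdev {
	struct v4l2_device *v4l2_dev;
	struct media_entity entity;
};

/*
 * Hypothetical helper: true only if the entity's own mdev pointer is set
 * and has a backing device. Unlike the quoted condition, it checks the
 * pointer it goes on to dereference.
 */
bool subdev_entity_has_mdev_dev(const struct v4l2_subdev *sd)
{
	assert(sd != NULL && sd->v4l2_dev != NULL);

	return sd->v4l2_dev->mdev &&
	       sd->entity.graph_obj.mdev &&
	       sd->entity.graph_obj.mdev->dev;
}
```

With the entity's mdev left NULL while v4l2_dev->mdev is set (the state guessed
at above), this helper returns false instead of faulting.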

> >      v4l2_open+0xa4/0x120
> >      chrdev_open+0x78/0x178
> >      do_dentry_open+0xfc/0x398
> >      vfs_open+0x28/0x30
> >      path_openat+0x584/0x9c8
> >      do_filp_open+0x80/0x108
> >      do_sys_openat2+0x20c/0x2d8
> >     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec53000
> >      do_sys_open+0x54/0xa0
> >      __arm64_sys_openat+0x20/0x28
> >      invoke_syscall+0x40/0xf8
> >      el0_svc_common.constprop.0+0xf0/0x110
> >      do_el0_svc+0x20/0x78
> >      el0_svc+0x48/0xd0
> >      el0t_64_sync_handler+0xb0/0xb8
> >      el0t_64_sync+0x148/0x14c
> >     Code: f9405280 f9400400 b40000e0 f9400280 (f9400000)
> >     ---[ end trace 0000000000000000 ]---
> >
> > This only happens on the Ebisu-4D board (r8a77990-ebisu.dts).
> > I do not see this on the Salvator-X(S) boards.
>
> Ok. I don't know much about either of these boards. Are they supposed
> to be very similar?
>
> > Bisection shows this starts to happen with "[PATCH v2 7/9] driver core:
> > Set fw_devlink.strict=1 by default".
>
> So in the series, by this point, the previous patches would have set
> the deferred probe timeout to 10s (it can get extended when new drivers
> are added, of course), and once the timer expires, suppliers without
> drivers will no longer block any consumers. The only difference
> fw_devlink.strict=1 should make is that iommus and dmas dependencies
> are treated as mandatory until the timeout expires.
>
> In this instance, do you have iommu drivers and dma drivers compiled
> in, loaded as modules, or not available at all? In all these cases,
> the list of devices that would end up probing eventually should be the
> same with or without fw_devlink.strict=1. The only difference would be
> some reordering of probes.
>
> So this looks to me like improper error handling/assumption in the
> driver for this subdev device. I'm guessing one of the suppliers to
> this subdev has a direct/indirect dependency on iommus and this subdev
> driver is assuming that the supplier would have probed by the time
> it's probed.
>
> >
> > Adding more debug info:
> >
> >     subdev_open:54: file v4l-subdev1
> >     Unable to handle kernel NULL pointer dereference at virtual
> > address 0000000000000000
> >     subdev_open:54: file v4l-subdev2
> >     Unable to handle kernel NULL pointer dereference at virtual
> > address 0000000000000000

How did you get these two "subdev_open" lines? And how/why is there a
NULL deref at that point?

> >
> > Matching the subdev using sysfs gives:
> >
> >     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev1
> >     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev2
> >
> > The i2c device is the adi,adv7482 at address 0x70.
>
> I'm guessing the fix would be somewhere in this driver, but I haven't
> dug into it. Any guesses on which of its suppliers might have a
> direct/indirect dependency on an iommu/dma? You could also enable the
> debug log in fw_devlink_relax_link() and see if it relaxes any link
> where the supplier is an iommu/dma device. That might give us some
> hints.

After spending way too much time on this looking at
drivers/media/v4l2-core, drivers/media/mc and
drivers/media/i2c/adv748x/ code, I'm guessing the ordering issue is
probably between "csi40:" device and the video-receiver@70 (the
"adi,adv7482") device.

Based on your sysfs paths, I was initially digging into
drivers/media/i2c/adv748x/adv748x-core.c. But then the parent of
video-receiver@70 is i2c0, which has dmas dependencies. The csi40:
device (referred to from video-controller) doesn't seem to have any
iommu or dmas dependency. So my guess is that csi40 gets probed first
and then assumes the video-controller is already available.

Can you use this info to take a stab at debugging this further?

TL;DR is that I think this is some driver issue where it's not
checking for one of its suppliers to be ready yet.
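
For reference, the usual pattern for a consumer whose supplier may not be ready
yet is to return -EPROBE_DEFER from probe so that the driver core retries
later. A minimal userspace sketch with mock types and hypothetical names
(mock_supplier, mock_consumer, mock_consumer_probe); EPROBE_DEFER is defined
locally with the kernel's value of 517, and nothing here is taken from the
drivers discussed in this thread:

```c
#include <assert.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* same value as in the kernel's linux/errno.h */

/* Mock types standing in for a supplier device and its consumer. */
struct mock_supplier { int probed; };
struct mock_consumer { const struct mock_supplier *supplier; };

/*
 * Hypothetical probe routine: bind to the supplier only once it has
 * probed; otherwise ask to be retried later instead of assuming it exists.
 */
int mock_consumer_probe(struct mock_consumer *c, const struct mock_supplier *s)
{
	assert(c != NULL);

	if (!s || !s->probed)
		return -EPROBE_DEFER;

	c->supplier = s;
	return 0;
}
```

The point of the sketch is that the consumer never stores or dereferences the
supplier until the readiness check has passed.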

-Saravana

>
> -Saravana
>
> > But now I'm lost...
> >
> > Gr{oetje,eeting}s,
> >
> >                         Geert
> >
> > --
> > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> >
> > In personal conversations with technical people, I call myself a hacker. But
> > when I'm talking to journalists I just say "programmer" or something like that.
> >                                 -- Linus Torvalds


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08  4:17     ` Saravana Kannan
@ 2022-06-08 10:25       ` Geert Uytterhoeven
  2022-06-08 18:12         ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08 10:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM list,
	Linux IOMMU, netdev, open list:GPIO SUBSYSTEM, Linux-Renesas,
	Niklas Söderlund, Laurent Pinchart

Hi Saravana,

On Wed, Jun 8, 2022 at 6:17 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Jun 7, 2022 at 5:55 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Tue, Jun 7, 2022 at 11:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > This series is based on linux-next + these 2 small patches applied on top:
> > > > https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
> > > >
> > > > A lot of the deferred_probe_timeout logic is redundant with
> > > > fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> > > > a few cases.
> > > >
> > > > This series tries to delete the redundant logic, simplify the frameworks
> > > > that use driver_deferred_probe_check_state(), enable
> > > > deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> > > > case.
> > > >
> > > > The overall idea of this series is to replace the global behavior of
> > > > driver_deferred_probe_check_state() where all devices give up waiting on
> > > > supplier at the same time with a more granular behavior:
> > > >
> > > > 1. Devices with all their suppliers successfully probed by late_initcall
> > > >    probe as usual and avoid unnecessary deferred probe attempts.
> > > >
> > > > 2. At or after late_initcall, in cases where boot would break because of
> > > >    fw_devlink=on being strict about the ordering, we
> > > >
> > > >    a. Temporarily relax the enforcement to probe any unprobed devices
> > > >       that can probe successfully in the current state of the system.
> > > >       For example, when we boot with a NFS rootfs and no network device
> > > >       has probed.
> > > >    b. Go back to enforcing the ordering for any devices that haven't
> > > >       probed.
> > > >
> > > > 3. After deferred probe timeout expires, we permanently give up waiting
> > > >    on supplier devices without drivers. At this point, whatever devices
> > > >    can probe without some of their optional suppliers end up probing.
> > > >
> > > > In the case where module support is disabled, it's fairly
> > > > straightforward and all device probes are completed before the initcalls
> > > > are done.
> > > >
> > > > Patches 1 to 3 are fairly straightforward and can probably be applied
> > > > right away.
> > > >
> > > > Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> > > > default deferred_probe_timeout back to 10 seconds when modules are
> > > > enabled.
> > > >
> > > > Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> > > > so that no framework has to know/care about deferred_probe_timeout.
> > > >
> > > > Yoshihiro/Geert,
> > > >
> > > > If you can test this patch series and confirm that the NFS root case
> > > > works, I'd really appreciate that.
> > >
> > > Thanks, I gave this a try on various boards I have access to.
> > > The results were quite positive. E.g. the compile error I saw on v1
> > > > (implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
> > >  used in v2) is gone.
> >
> > Thanks a lot for testing these.
> >
> > > However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
> > > starting:
> > >
> > >     [  OK  ] Started D-Bus System Message Bus.
> > >     Unable to handle kernel NULL pointer dereference at virtual
> > > address 0000000000000000
> > >     Unable to handle kernel NULL pointer dereference at virtual
> > > address 0000000000000000
> > >     Mem abort info:
> > >       ESR = 0x0000000096000004
> > >     Mem abort info:
> > >       ESR = 0x0000000096000004
> > >       EC = 0x25: DABT (current EL), IL = 32 bits
> > >       SET = 0, FnV = 0
> > >       EC = 0x25: DABT (current EL), IL = 32 bits
> > >       EA = 0, S1PTW = 0
> > >       FSC = 0x04: level 0 translation fault
> > >       SET = 0, FnV = 0
> > >     Data abort info:
> > >       ISV = 0, ISS = 0x00000004
> > >       EA = 0, S1PTW = 0
> > >       FSC = 0x04: level 0 translation fault
> > >       CM = 0, WnR = 0
> > >     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec45000
> > >     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> > >     Data abort info:
> > >     Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > >     CPU: 0 PID: 374 Comm: v4l_id Tainted: G        W
> > > 5.19.0-rc1-arm64-renesas-00799-gc13c3e49e8bd #1660
> > >       ISV = 0, ISS = 0x00000004
> > >     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
> > >     pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > >       CM = 0, WnR = 0
> > >     pc : subdev_open+0x8c/0x128
> > >     lr : subdev_open+0x78/0x128
> > >     sp : ffff80000aadba60
> > >     x29: ffff80000aadba60 x28: 0000000000000000 x27: ffff80000aadbc58
> > >     x26: 0000000000020000 x25: ffff00000b3aaf00 x24: 0000000000000000
> > >     x23: ffff00000c331c00 x22: ffff000009aa61b8 x21: ffff000009aa6000
> > >     x20: ffff000008bae3e8 x19: ffff00000c3fe200 x18: 0000000000000000
> > >     x17: ffff800076945000 x16: ffff800008004000 x15: 00008cc6bf550c7c
> > >     x14: 000000000000038f x13: 000000000000001a x12: ffff00007fba8618
> > >     x11: 0000000000000001 x10: 0000000000000000 x9 : ffff800009253954
> > >     x8 : ffff00000b3aaf00 x7 : 0000000000000004 x6 : 000000000000001a
> > >     x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
> > >     x2 : 0000000100000001 x1 : 0000000000000000 x0 : 0000000000000000
> > >     Call trace:
> > >      subdev_open+0x8c/0x128
>
> After disassembling the code on my end (with slightly different
> config) and looking at 0x8c from the start of the function, I'm pretty
> sure the NULL deref is happening here inside subdev_open()
>
>         if (sd->v4l2_dev->mdev && sd->entity.graph_obj.mdev->dev) {
>
> sd->entity.graph_obj.mdev == NULL.
>
> And going by the field names, I'm guessing these are suppliers pointed
> to by "remote-endpoint". Sadly fw_devlink can't extract any dependency
> info from remote-endpoint because the devices generally point to each
> other so a cycle is detected and the probe ordering isn't enforced
> between the endpoints. We still need to parse remote-endpoint to
> detect cycles created by a combination of endpoints/other properties
> (there's a real world case in upstream).
>
> > >      v4l2_open+0xa4/0x120
> > >      chrdev_open+0x78/0x178
> > >      do_dentry_open+0xfc/0x398
> > >      vfs_open+0x28/0x30
> > >      path_openat+0x584/0x9c8
> > >      do_filp_open+0x80/0x108
> > >      do_sys_openat2+0x20c/0x2d8
> > >     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec53000
> > >      do_sys_open+0x54/0xa0
> > >      __arm64_sys_openat+0x20/0x28
> > >      invoke_syscall+0x40/0xf8
> > >      el0_svc_common.constprop.0+0xf0/0x110
> > >      do_el0_svc+0x20/0x78
> > >      el0_svc+0x48/0xd0
> > >      el0t_64_sync_handler+0xb0/0xb8
> > >      el0t_64_sync+0x148/0x14c
> > >     Code: f9405280 f9400400 b40000e0 f9400280 (f9400000)
> > >     ---[ end trace 0000000000000000 ]---
> > >
> > > This only happens on the Ebisu-4D board (r8a77990-ebisu.dts).
> > > I do not see this on the Salvator-X(S) boards.
> >
> > Ok. I don't know much about either of these boards. Are they supposed
> > to be very similar?
> >
> > > Bisection shows this starts to happen with "[PATCH v2 7/9] driver core:
> > > Set fw_devlink.strict=1 by default".
> >
> > So in the series, by this point, the previous patches would have
> > deferred probe timeout set to 10s (it can get extended on new driver
> > additions of course) and once the timer expires suppliers without
> > drivers will no longer block any consumers. The only difference
> > fw_devlink.strict=1 should cause is iommus and dmas dependency being
> > treated as mandatory till the timeout expires.
> >
> > In this instance, do you have iommu drivers and dma drivers compiled
> > in or loaded as modules or not available at all? In all these cases,
> > the list of devices that would end up probing eventually should be the
> > same with or without fw_devlink.strict=1. The only difference would be
> > some reordering of probes.
> >
> > So this looks to me like improper error handling/assumption in the
> > driver for this subdev device. I'm guessing one of the suppliers to
> > this subdev has a direct/indirect dependency on iommus and this subdev
> > driver is assuming that the supplier would have probed by the time
> > it's probed.
> >
> > >
> > > Adding more debug info:
> > >
> > >     subdev_open:54: file v4l-subdev1
> > >     Unable to handle kernel NULL pointer dereference at virtual
> > > address 0000000000000000
> > >     subdev_open:54: file v4l-subdev2
> > >     Unable to handle kernel NULL pointer dereference at virtual
> > > address 0000000000000000
>
> How did you get these two "subdev_open" strings? And how/why the NULL
> deref there?

I added a debug print at the top of subdev_open():

    pr_info("%s:%u: file %pD\n", __func__, __LINE__, file);

The NULL deref is the actual issue.
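
For illustration, the failing condition can be modeled in user space.
This is only a sketch: the structs below are simplified stand-ins
(assumptions, not the real V4L2/media definitions), and the helper name
is made up. The extra test on graph_obj.mdev shows why the quoted
condition faults; it is not a proposed fix for the driver:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (hypothetical layout). */
struct device { int unused; };
struct media_device { struct device *dev; };
struct media_gobj { struct media_device *mdev; };
struct media_entity { struct media_gobj graph_obj; };
struct v4l2_device { struct media_device *mdev; };
struct v4l2_subdev {
	struct v4l2_device *v4l2_dev;
	struct media_entity entity;
};

/*
 * The quoted condition tests v4l2_dev->mdev but then dereferences
 * entity.graph_obj.mdev, which is a different pointer. This variant
 * also tests graph_obj.mdev before reading ->dev.
 */
static bool subdev_has_media_dev(const struct v4l2_subdev *sd)
{
	return sd->v4l2_dev->mdev &&
	       sd->entity.graph_obj.mdev &&
	       sd->entity.graph_obj.mdev->dev;
}
```

With v4l2_dev->mdev set but the entity never registered (graph_obj.mdev
still NULL), the helper returns false instead of faulting.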

> > > Matching the subdev using sysfs gives:
> > >
> > >     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev1
> > >     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev2
> > >
> > > The i2c device is the adi,adv7482 at address 0x70.
> >
> > I'm guessing the fix would be somewhere in this driver, but I haven't
> > dug into it. Any guesses on which of its suppliers might have a
> > direct/indirect dependency on an iommu/dma? You could also enable the
> > debug log in fw_devlink_relax_link() and see if it relaxes any link
> > where the supplier is an iommu/dma device. That might give us some
> > hints.
>
> After spending way too much time on this looking at
> drivers/media/v4l2-core, drivers/media/mc and
> drivers/media/i2c/adv748x/ code, I'm guessing the ordering issue is
> probably between "csi40:" device and the video-receiver@70 (the
> "adi,adv7482") device.
>
> Based on your points about the sysfs, I was initially digging into
> drivers/media/i2c/adv748x/adv748x-core.c. But then the parent of
> video-receiver@70 is an i2c0 that has dmas dependencies. The csi40:
> (referred to from video-controller) doesn't seem to have any iommu or
> dmas dependency. So my guess is the csi40 gets probed first and then
> assumes the video-controller is already available.
>
> Can you use this info to take a stab at debugging this further?

Thanks for looking into this, there is indeed a cyclic dependency:

    i2c 0-0070: Fixing up cyclic dependency with feaa0000.csi2
    i2c 0-0070: Fixing up cyclic dependency with hdmi-in
    i2c 0-0070: Fixing up cyclic dependency with cvbs-in

> TL;DR is that I think this is some driver issue where it's not
> checking for one of its suppliers to be ready yet.

Setting fw_devlink_strict to true vs. false seems to influence which of
two different failures will happen:
  - rcar-csi2: probe of feaa0000.csi2 failed with error -22
  - rcar-vin: probe of e6ef5000.video failed with error -22
The former causes the NULL pointer dereference later.
The latter existed before, but I hadn't noticed it, and bisection
led to the real culprit (commit 3e52419ec04f9769 ("media: rcar-{csi2,vin}:
Move to full Virtual Channel routing per CSI-2 IP")).
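
For reference, the -22 in both probe failures is -EINVAL. A minimal
user-space sketch (the helper name is made up for illustration, and it
assumes the Linux errno numbering) that decodes such a kernel-style
negative return value:

```c
#include <errno.h>
#include <string.h>

/*
 * Kernel probe functions return negative errno values, so negate the
 * value before looking up its description.
 */
static const char *probe_err_str(int ret)
{
	return ret < 0 ? strerror(-ret) : "success";
}

/* True if a probe return value is the -22 seen in the logs above. */
static int probe_err_is_einval(int ret)
{
	return ret == -EINVAL;
}
```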

I am bringing it up with the multi-media guys in
https://lore.kernel.org/all/20220124124858.571363-4-niklas.soderlund+renesas@ragnatech.se...

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08 10:25       ` Geert Uytterhoeven
@ 2022-06-08 18:12         ` Saravana Kannan
  2022-06-08 18:47           ` Geert Uytterhoeven
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-08 18:12 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM list,
	Linux IOMMU, netdev, open list:GPIO SUBSYSTEM, Linux-Renesas,
	Niklas Söderlund, Laurent Pinchart

On Wed, Jun 8, 2022 at 3:26 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 8, 2022 at 6:17 AM Saravana Kannan <saravanak@google.com> wrote:
> > On Tue, Jun 7, 2022 at 5:55 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Jun 7, 2022 at 11:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > This series is based on linux-next + these 2 small patches applied on top:
> > > > > https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
> > > > >
> > > > > A lot of the deferred_probe_timeout logic is redundant with
> > > > > fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> > > > > a few cases.
> > > > >
> > > > > This series tries to delete the redundant logic, simplify the frameworks
> > > > > that use driver_deferred_probe_check_state(), enable
> > > > > deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> > > > > case.
> > > > >
> > > > > The overall idea of this series is to replace the global behavior of
> > > > > driver_deferred_probe_check_state() where all devices give up waiting on
> > > > > supplier at the same time with a more granular behavior:
> > > > >
> > > > > 1. Devices with all their suppliers successfully probed by late_initcall
> > > > >    probe as usual and avoid unnecessary deferred probe attempts.
> > > > >
> > > > > 2. At or after late_initcall, in cases where boot would break because of
> > > > >    fw_devlink=on being strict about the ordering, we
> > > > >
> > > > >    a. Temporarily relax the enforcement to probe any unprobed devices
> > > > >       that can probe successfully in the current state of the system.
> > > > >       For example, when we boot with a NFS rootfs and no network device
> > > > >       has probed.
> > > > >    b. Go back to enforcing the ordering for any devices that haven't
> > > > >       probed.
> > > > >
> > > > > 3. After deferred probe timeout expires, we permanently give up waiting
> > > > >    on supplier devices without drivers. At this point, whatever devices
> > > > >    can probe without some of their optional suppliers end up probing.
> > > > >
> > > > > In the case where module support is disabled, it's fairly
> > > > > straightforward and all device probes are completed before the initcalls
> > > > > are done.
> > > > >
> > > > > Patches 1 to 3 are fairly straightforward and can probably be applied
> > > > > right away.
> > > > >
> > > > > Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> > > > > default deferred_probe_timeout back to 10 seconds when modules are
> > > > > enabled.
> > > > >
> > > > > Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> > > > > so that no framework has to know/care about deferred_probe_timeout.
> > > > >
> > > > > Yoshihiro/Geert,
> > > > >
> > > > > If you can test this patch series and confirm that the NFS root case
> > > > > works, I'd really appreciate that.
> > > >
> > > > Thanks, I gave this a try on various boards I have access to.
> > > > The results were quite positive. E.g. the compile error I saw on v1
> > > > > (implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
> > > >  used in v2) is gone.
> > >
> > > Thanks a lot for testing these.
> > >
> > > > However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
> > > > starting:
> > > >
> > > >     [  OK  ] Started D-Bus System Message Bus.
> > > >     Unable to handle kernel NULL pointer dereference at virtual
> > > > address 0000000000000000
> > > >     Unable to handle kernel NULL pointer dereference at virtual
> > > > address 0000000000000000
> > > >     Mem abort info:
> > > >       ESR = 0x0000000096000004
> > > >     Mem abort info:
> > > >       ESR = 0x0000000096000004
> > > >       EC = 0x25: DABT (current EL), IL = 32 bits
> > > >       SET = 0, FnV = 0
> > > >       EC = 0x25: DABT (current EL), IL = 32 bits
> > > >       EA = 0, S1PTW = 0
> > > >       FSC = 0x04: level 0 translation fault
> > > >       SET = 0, FnV = 0
> > > >     Data abort info:
> > > >       ISV = 0, ISS = 0x00000004
> > > >       EA = 0, S1PTW = 0
> > > >       FSC = 0x04: level 0 translation fault
> > > >       CM = 0, WnR = 0
> > > >     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec45000
> > > >     [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
> > > >     Data abort info:
> > > >     Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > > >     CPU: 0 PID: 374 Comm: v4l_id Tainted: G        W
> > > > 5.19.0-rc1-arm64-renesas-00799-gc13c3e49e8bd #1660
> > > >       ISV = 0, ISS = 0x00000004
> > > >     Hardware name: Renesas Ebisu-4D board based on r8a77990 (DT)
> > > >     pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > > >       CM = 0, WnR = 0
> > > >     pc : subdev_open+0x8c/0x128
> > > >     lr : subdev_open+0x78/0x128
> > > >     sp : ffff80000aadba60
> > > >     x29: ffff80000aadba60 x28: 0000000000000000 x27: ffff80000aadbc58
> > > >     x26: 0000000000020000 x25: ffff00000b3aaf00 x24: 0000000000000000
> > > >     x23: ffff00000c331c00 x22: ffff000009aa61b8 x21: ffff000009aa6000
> > > >     x20: ffff000008bae3e8 x19: ffff00000c3fe200 x18: 0000000000000000
> > > >     x17: ffff800076945000 x16: ffff800008004000 x15: 00008cc6bf550c7c
> > > >     x14: 000000000000038f x13: 000000000000001a x12: ffff00007fba8618
> > > >     x11: 0000000000000001 x10: 0000000000000000 x9 : ffff800009253954
> > > >     x8 : ffff00000b3aaf00 x7 : 0000000000000004 x6 : 000000000000001a
> > > >     x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000001
> > > >     x2 : 0000000100000001 x1 : 0000000000000000 x0 : 0000000000000000
> > > >     Call trace:
> > > >      subdev_open+0x8c/0x128
> >
> > After disassembling the code on my end (with slightly different
> > config) and looking at 0x8c from the start of the function, I'm pretty
> > sure the NULL deref is happening here inside subdev_open()
> >
> >         if (sd->v4l2_dev->mdev && sd->entity.graph_obj.mdev->dev) {
> >
> > sd->entity.graph_obj.mdev == NULL.
> >
> > And going by the field names, I'm guessing these are suppliers pointed
> > to by "remote-endpoint". Sadly fw_devlink can't extract any dependency
> > info from remote-endpoint because the devices generally point to each
> > other so a cycle is detected and the probe ordering isn't enforced
> > between the endpoints. We still need to parse remote-endpoint to
> > detect cycles created by a combination of endpoints/other properties
> > (there's a real world case in upstream).
> >
> > > >      v4l2_open+0xa4/0x120
> > > >      chrdev_open+0x78/0x178
> > > >      do_dentry_open+0xfc/0x398
> > > >      vfs_open+0x28/0x30
> > > >      path_openat+0x584/0x9c8
> > > >      do_filp_open+0x80/0x108
> > > >      do_sys_openat2+0x20c/0x2d8
> > > >     user pgtable: 4k pages, 48-bit VAs, pgdp=000000004ec53000
> > > >      do_sys_open+0x54/0xa0
> > > >      __arm64_sys_openat+0x20/0x28
> > > >      invoke_syscall+0x40/0xf8
> > > >      el0_svc_common.constprop.0+0xf0/0x110
> > > >      do_el0_svc+0x20/0x78
> > > >      el0_svc+0x48/0xd0
> > > >      el0t_64_sync_handler+0xb0/0xb8
> > > >      el0t_64_sync+0x148/0x14c
> > > >     Code: f9405280 f9400400 b40000e0 f9400280 (f9400000)
> > > >     ---[ end trace 0000000000000000 ]---
> > > >
> > > > This only happens on the Ebisu-4D board (r8a77990-ebisu.dts).
> > > > I do not see this on the Salvator-X(S) boards.
> > >
> > > Ok. I don't know much about either of these boards. Are they supposed
> > > to be very similar?
> > >
> > > > Bisection shows this starts to happen with "[PATCH v2 7/9] driver core:
> > > > Set fw_devlink.strict=1 by default".
> > >
> > > So in the series, by this point, the previous patches would have
> > > deferred probe timeout set to 10s (it can get extended on new driver
> > > additions of course) and once the timer expires suppliers without
> > > drivers will no longer block any consumers. The only difference
> > > fw_devlink.strict=1 should cause is iommus and dmas dependency being
> > > treated as mandatory till the timeout expires.
> > >
> > > In this instance, do you have iommu drivers and dma drivers compiled
> > > in or loaded as modules or not available at all? In all these cases,
> > > the list of devices that would end up probing eventually should be the
> > > same with or without fw_devlink.strict=1. The only difference would be
> > > some reordering of probes.
> > >
> > > So this looks to me like improper error handling/assumption in the
> > > driver for this subdev device. I'm guessing one of the suppliers to
> > > this subdev has a direct/indirect dependency on iommus and this subdev
> > > driver is assuming that the supplier would have probed by the time
> > > it's probed.
> > >
> > > >
> > > > Adding more debug info:
> > > >
> > > >     subdev_open:54: file v4l-subdev1
> > > >     Unable to handle kernel NULL pointer dereference at virtual
> > > > address 0000000000000000
> > > >     subdev_open:54: file v4l-subdev2
> > > >     Unable to handle kernel NULL pointer dereference at virtual
> > > > address 0000000000000000
> >
> > How did you get these two "subdev_open" strings? And how/why the NULL
> > deref there?
>
> I added a debug print at the top of subdev_open():
>
>     pr_info("%s:%u: file %pD\n", __func__, __LINE__, file);
>
> The NULL deref is the actual issue.
>
> > > > Matching the subdev using sysfs gives:
> > > >
> > > >     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev1
> > > >     /sys/devices/platform/soc/e6500000.i2c/i2c-0/0-0070/video4linux/v4l-subdev2
> > > >
> > > > The i2c device is the adi,adv7482 at address 0x70.
> > >
> > > I'm guessing the fix would be somewhere in this driver, but I haven't
> > > dug into it. Any guesses on which of its suppliers might have a
> > > direct/indirect dependency on an iommu/dma? You could also enable the
> > > debug log in fw_devlink_relax_link() and see if it relaxes any link
> > > where the supplier is an iommu/dma device. That might give us some
> > > hints.
> >
> > After spending way too much time on this looking at
> > drivers/media/v4l2-core, drivers/media/mc and
> > drivers/media/i2c/adv748x/ code, I'm guessing the ordering issue is
> > probably between "csi40:" device and the video-receiver@70 (the
> > "adi,adv7482") device.
> >
> > Based on your points about the sysfs, I was initially digging into
> > drivers/media/i2c/adv748x/adv748x-core.c. But then the parent of
> > video-receiver@70 is an i2c0 that has dmas dependencies. The csi40:
> > (referred to from video-controller) doesn't seem to have any iommu or
> > dmas dependency. So my guess is the csi40 gets probed first and then
> > assumes the video-controller is already available.
> >
> > Can you use this info to take a stab at debugging this further?
>
> Thanks for looking into this, there is indeed a cyclic dependency:
>
>     i2c 0-0070: Fixing up cyclic dependency with feaa0000.csi2
>     i2c 0-0070: Fixing up cyclic dependency with hdmi-in
>     i2c 0-0070: Fixing up cyclic dependency with cvbs-in
>
> > TL;DR is that I think this is some driver issue where it's not
> > checking for one of its suppliers to be ready yet.
>
> Setting fw_devlink_strict to true vs. false seems to influence which of
> two different failures will happen:
>   - rcar-csi2: probe of feaa0000.csi2 failed with error -22
>   - rcar-vin: probe of e6ef5000.video failed with error -22
> The former causes the NULL pointer dereference later.
> The latter existed before, but I hadn't noticed it, and bisection
> led to the real culprit (commit 3e52419ec04f9769 ("media: rcar-{csi2,vin}:
> Move to full Virtual Channel routing per CSI-2 IP").

If you revert that patch, does this series work fine? If yes, are you
happy with giving this a Tested-by?

-Saravana

>
> I am bringing it up with the multi-media guys in
> https://lore.kernel.org/all/20220124124858.571363-4-niklas.soderlund+renesas@ragnatech.se...
>
> Thanks!
>
> Gr{oetje,eeting}s,
>
>                         Geert


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08 18:12         ` Saravana Kannan
@ 2022-06-08 18:47           ` Geert Uytterhoeven
  2022-06-08 21:07             ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-06-08 18:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM list,
	Linux IOMMU, netdev, open list:GPIO SUBSYSTEM, Linux-Renesas,
	Niklas Söderlund, Laurent Pinchart

Hi Saravana,

On Wed, Jun 8, 2022 at 8:13 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jun 8, 2022 at 3:26 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, Jun 8, 2022 at 6:17 AM Saravana Kannan <saravanak@google.com> wrote:
> > > On Tue, Jun 7, 2022 at 5:55 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Tue, Jun 7, 2022 at 11:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > This series is based on linux-next + these 2 small patches applied on top:
> > > > > > https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
> > > > > >
> > > > > > A lot of the deferred_probe_timeout logic is redundant with
> > > > > > fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> > > > > > a few cases.
> > > > > >
> > > > > > This series tries to delete the redundant logic, simplify the frameworks
> > > > > > that use driver_deferred_probe_check_state(), enable
> > > > > > deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> > > > > > case.
> > > > > >
> > > > > > The overall idea of this series is to replace the global behavior of
> > > > > > driver_deferred_probe_check_state() where all devices give up waiting on
> > > > > > supplier at the same time with a more granular behavior:
> > > > > >
> > > > > > 1. Devices with all their suppliers successfully probed by late_initcall
> > > > > >    probe as usual and avoid unnecessary deferred probe attempts.
> > > > > >
> > > > > > 2. At or after late_initcall, in cases where boot would break because of
> > > > > >    fw_devlink=on being strict about the ordering, we
> > > > > >
> > > > > >    a. Temporarily relax the enforcement to probe any unprobed devices
> > > > > >       that can probe successfully in the current state of the system.
> > > > > >       For example, when we boot with a NFS rootfs and no network device
> > > > > >       has probed.
> > > > > >    b. Go back to enforcing the ordering for any devices that haven't
> > > > > >       probed.
> > > > > >
> > > > > > 3. After deferred probe timeout expires, we permanently give up waiting
> > > > > >    on supplier devices without drivers. At this point, whatever devices
> > > > > >    can probe without some of their optional suppliers end up probing.
> > > > > >
> > > > > > In the case where module support is disabled, it's fairly
> > > > > > straightforward and all device probes are completed before the initcalls
> > > > > > are done.
> > > > > >
> > > > > > Patches 1 to 3 are fairly straightforward and can probably be applied
> > > > > > right away.
> > > > > >
> > > > > > Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> > > > > > default deferred_probe_timeout back to 10 seconds when modules are
> > > > > > enabled.
> > > > > >
> > > > > > Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> > > > > > so that no framework has to know/care about deferred_probe_timeout.
> > > > > >
> > > > > > Yoshihiro/Geert,
> > > > > >
> > > > > > If you can test this patch series and confirm that the NFS root case
> > > > > > works, I'd really appreciate that.
> > > > >
> > > > > Thanks, I gave this a try on various boards I have access to.
> > > > > The results were quite positive. E.g. the compile error I saw on v1
> > > > > > (implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
> > > > >  used in v2) is gone.
> > > >
> > > > Thanks a lot for testing these.
> > > >
> > > > > However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
> > > > > starting:

> > Setting fw_devlink_strict to true vs. false seems to influence which of
> > two different failures will happen:
> >   - rcar-csi2: probe of feaa0000.csi2 failed with error -22
> >   - rcar-vin: probe of e6ef5000.video failed with error -22
> > The former causes the NULL pointer dereference later.
> > The latter existed before, but I hadn't noticed it, and bisection
> > led to the real culprit (commit 3e52419ec04f9769 ("media: rcar-{csi2,vin}:
> > Move to full Virtual Channel routing per CSI-2 IP").
>
> If you revert that patch, does this series work fine? If yes, are you
> happy with giving this a Tested-by?

Sure, sorry for forgetting that ;-)

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08 18:47           ` Geert Uytterhoeven
@ 2022-06-08 21:07             ` Saravana Kannan
  2022-06-08 22:49               ` Jakub Kicinski
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-08 21:07 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM list,
	Linux IOMMU, netdev, open list:GPIO SUBSYSTEM, Linux-Renesas,
	Niklas Söderlund, Laurent Pinchart, Rob Herring,
	Vladimir Oltean

On Wed, Jun 8, 2022 at 11:54 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 8, 2022 at 8:13 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Wed, Jun 8, 2022 at 3:26 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Wed, Jun 8, 2022 at 6:17 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > On Tue, Jun 7, 2022 at 5:55 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > On Tue, Jun 7, 2022 at 11:13 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > > > On Wed, Jun 1, 2022 at 12:46 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > > This series is based on linux-next + these 2 small patches applied on top:
> > > > > > > https://lore.kernel.org/lkml/20220526034609.480766-1-saravanak@google.com/
> > > > > > >
> > > > > > > A lot of the deferred_probe_timeout logic is redundant with
> > > > > > > fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
> > > > > > > a few cases.
> > > > > > >
> > > > > > > This series tries to delete the redundant logic, simplify the frameworks
> > > > > > > that use driver_deferred_probe_check_state(), enable
> > > > > > > deferred_probe_timeout=10 by default, and fixes the nfsroot failure
> > > > > > > case.
> > > > > > >
> > > > > > > The overall idea of this series is to replace the global behavior of
> > > > > > > driver_deferred_probe_check_state() where all devices give up waiting on
> > > > > > > supplier at the same time with a more granular behavior:
> > > > > > >
> > > > > > > 1. Devices with all their suppliers successfully probed by late_initcall
> > > > > > >    probe as usual and avoid unnecessary deferred probe attempts.
> > > > > > >
> > > > > > > 2. At or after late_initcall, in cases where boot would break because of
> > > > > > >    fw_devlink=on being strict about the ordering, we
> > > > > > >
> > > > > > >    a. Temporarily relax the enforcement to probe any unprobed devices
> > > > > > >       that can probe successfully in the current state of the system.
> > > > > > >       For example, when we boot with a NFS rootfs and no network device
> > > > > > >       has probed.
> > > > > > >    b. Go back to enforcing the ordering for any devices that haven't
> > > > > > >       probed.
> > > > > > >
> > > > > > > 3. After deferred probe timeout expires, we permanently give up waiting
> > > > > > >    on supplier devices without drivers. At this point, whatever devices
> > > > > > >    can probe without some of their optional suppliers end up probing.
> > > > > > >
> > > > > > > In the case where module support is disabled, it's fairly
> > > > > > > straightforward and all device probes are completed before the initcalls
> > > > > > > are done.
> > > > > > >
> > > > > > > Patches 1 to 3 are fairly straightforward and can probably be applied
> > > > > > > right away.
> > > > > > >
> > > > > > > Patches 4 to 6 are for fixing the NFS rootfs issue and setting the
> > > > > > > default deferred_probe_timeout back to 10 seconds when modules are
> > > > > > > enabled.
> > > > > > >
> > > > > > > Patches 7 to 9 are further clean up of the deferred_probe_timeout logic
> > > > > > > so that no framework has to know/care about deferred_probe_timeout.
> > > > > > >
> > > > > > > Yoshihiro/Geert,
> > > > > > >
> > > > > > > If you can test this patch series and confirm that the NFS root case
> > > > > > > works, I'd really appreciate that.
> > > > > >
> > > > > > Thanks, I gave this a try on various boards I have access to.
> > > > > > The results were quite positive. E.g. the compile error I saw on v1
> > > > > > (implicit declaration of fw_devlink_unblock_may_probe(), which is no longer
> > > > > >  used in v2) is gone.
> > > > >
> > > > > Thanks a lot for testing these.
> > > > >
> > > > > > However, I'm seeing a weird error when userspace (Debian9 nfsroot) is
> > > > > > starting:
>
> > > Setting fw_devlink_strict to true vs. false seems to influence which of
> > > two different failures will happen:
> > >   - rcar-csi2: probe of feaa0000.csi2 failed with error -22
> > >   - rcar-vin: probe of e6ef5000.video failed with error -22
> > > The former causes the NULL pointer dereference later.
> > > The latter existed before, but I hadn't noticed it, and bisection
> > > led to the real culprit (commit 3e52419ec04f9769 ("media: rcar-{csi2,vin}:
> > > Move to full Virtual Channel routing per CSI-2 IP")).
> >
> > If you revert that patch, does this series work fine? If yes, are you
> > happy with giving this a Tested-by?
>
> Sure, sorry for forgetting that ;-)
>
> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

+few folks who I forgot to add.

Geert,

Thanks for the extensive testing!

Linus W, Ulf, Kevin, Will, Rob, Vladimir,

Can I get your reviews for the deletion of
driver_deferred_probe_check_state() please? We can finally remove it
and have frameworks not needing to know about it.

Greg, Rafael,

Can you review the wait_for_init_devices_probe() patch and the other
trivial driver core changes please?

David/Jakub,

Do the IPv4 autoconfig changes look reasonable to you?

Thanks,
Saravana


>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08 21:07             ` Saravana Kannan
@ 2022-06-08 22:49               ` Jakub Kicinski
  2022-06-08 23:15                 ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Jakub Kicinski @ 2022-06-08 22:49 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Geert Uytterhoeven, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Paolo Abeni, Linus Walleij,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas, Niklas Söderlund, Laurent Pinchart,
	Rob Herring, Vladimir Oltean, Florian Fainelli,
	Thomas Bogendoerfer

On Wed, 8 Jun 2022 14:07:44 -0700 Saravana Kannan wrote:
> David/Jakub,
> 
> Do the IPv4 autoconfig changes look reasonable to you?

I'm no expert in this area, I'd trust the opinion of the embedded folks
(adding Florian as well) more than myself. It's unclear to me why we'd
wait_for_init_devices_probe() after the first failed iteration, sleep,
and then allow 11 more iterations with wait_for_device_probe().

Let me also add Thomas since he wrote e2ffe3ff6f5e ("net: ipconfig:
Wait for deferred device probes").


* Re: [PATCH v2 0/9] deferred_probe_timeout logic clean up
  2022-06-08 22:49               ` Jakub Kicinski
@ 2022-06-08 23:15                 ` Saravana Kannan
  0 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-08 23:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Geert Uytterhoeven, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Paolo Abeni, Linus Walleij,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas, Niklas Söderlund, Laurent Pinchart,
	Rob Herring, Vladimir Oltean, Florian Fainelli,
	Thomas Bogendoerfer

On Wed, Jun 8, 2022 at 3:49 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 8 Jun 2022 14:07:44 -0700 Saravana Kannan wrote:
> > David/Jakub,
> >
> > Do the IPv4 autoconfig changes look reasonable to you?
>
> I'm no expert in this area, I'd trust the opinion of the embedded folks
> (adding Florian as well) more than myself.

Thanks.

> It's unclear to me why we'd
> wait_for_init_devices_probe() after the first failed iteration,

wait_for_init_devices_probe() relaxes ordering rules for all devices
and it's not something we want to do unless we really need it. That's
why we are doing that only if we can't find any network device in the
first iteration.

> sleep,
> and then allow 11 more iterations with wait_for_device_probe().
> Let me also add Thomas since he wrote e2ffe3ff6f5e ("net: ipconfig:
> Wait for deferred device probes").

Even without this change, I'm not sure the wait_for_device_probe()
needs to be within the loop. It's probably sufficient to just do it
once in the beginning, but it's already there and I'm not sure if I'm
missing some scenarios, so I left that part as is.

-Saravana
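
For readers following the ordering question, here is a minimal userspace
sketch of the loop being discussed. It is not the ipconfig code: the
struct, the model_* helpers, and the bound of 12 iterations (inferred from
"the first failed iteration" plus "11 more iterations") are assumptions
made for illustration. It only shows the point made above: the relaxed
wait runs at most once, and only after a normal pass has failed to find a
network device.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed bound: one first pass plus the "11 more iterations". */
#define DEVICE_WAIT_MAX 12

/* Hypothetical state standing in for the real driver core. */
struct boot_model {
	bool netdev_exists;        /* the board has a network device at all */
	bool blocked_by_supplier;  /* strict ordering is holding its probe back */
	bool netdev_present;       /* a usable network device has probed */
	int normal_waits;          /* calls to the wait_for_device_probe() stand-in */
	int relaxed_waits;         /* calls to the wait_for_init_devices_probe() stand-in */
};

/* Stand-in for wait_for_device_probe(): ordering stays enforced. */
static void model_wait_for_device_probe(struct boot_model *m)
{
	m->normal_waits++;
	if (m->netdev_exists && !m->blocked_by_supplier)
		m->netdev_present = true;
}

/* Stand-in for wait_for_init_devices_probe(): ordering is relaxed. */
static void model_wait_for_init_devices_probe(struct boot_model *m)
{
	m->relaxed_waits++;
	if (m->netdev_exists) {
		m->blocked_by_supplier = false;
		m->netdev_present = true;
	}
}

/* Returns 0 once a network device is found, -ENODEV after all retries. */
static int model_wait_for_devices(struct boot_model *m, bool root_is_network)
{
	bool try_init_devs = true;

	for (int i = 0; i < DEVICE_WAIT_MAX; i++) {
		model_wait_for_device_probe(m);
		if (m->netdev_present)
			return 0;

		/* Relax ordering only once, and only after a failed pass. */
		if (try_init_devs && root_is_network) {
			try_init_devs = false;
			model_wait_for_init_devices_probe(m);
		}
		/* The real loop would sleep here before retrying. */
	}
	return -ENODEV;
}
```

In this model a board whose network device probes normally never triggers
the relaxed wait, which is the reason given above for not calling it up
front.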


* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 ` [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
@ 2022-06-09 11:44   ` Ulf Hansson
  2022-06-09 19:29     ` Saravana Kannan
  2022-06-21  7:28   ` Tony Lindgren
  1 sibling, 1 reply; 69+ messages in thread
From: Ulf Hansson @ 2022-06-09 11:44 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio

On Wed, 1 Jun 2022 at 09:07, Saravana Kannan <saravanak@google.com> wrote:
>
> Now that fw_devlink=on by default and fw_devlink supports
> "power-domains" property, the execution will never get to the point
> where driver_deferred_probe_check_state() is called before the supplier
> has probed successfully or before deferred probe timeout has expired.
>
> So, delete the call and replace it with -ENODEV.

With fw_devlink=on by default - does that mean that the parameter
can't be changed?

Or perhaps the point is that we don't want to go back, but rather drop
the fw_devlink parameter altogether when moving forward?

>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Just a minor nitpick below. Nevertheless, feel free to add:

Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

> ---
>  drivers/base/power/domain.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 739e52cd4aba..3e86772d5fac 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
>                 mutex_unlock(&gpd_list_lock);
>                 dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
>                         __func__, PTR_ERR(pd));
> -               return driver_deferred_probe_check_state(base_dev);

Adding a brief comment about why -EPROBE_DEFER doesn't make sense
here, would be nice.

> +               return -ENODEV;
>         }
>
>         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> --
> 2.36.1.255.ge46751e96f-goog
>

Kind regards
Uffe


* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-09 11:44   ` Ulf Hansson
@ 2022-06-09 19:29     ` Saravana Kannan
  0 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-09 19:29 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio

On Thu, Jun 9, 2022 at 4:45 AM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> On Wed, 1 Jun 2022 at 09:07, Saravana Kannan <saravanak@google.com> wrote:
> >
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
>
> With fw_devlink=on by default - does that mean that the parameter
> can't be changed?
>
> Or perhaps the point is that we don't want to go back, but rather drop
> the fw_devlink parameter altogether when moving forward?

Good question. For now, keeping fw_devlink=off and
fw_devlink=permissive as debugging options that I can ask people to
try if some probe is getting blocked.

Or maybe if some ultra low memory use case wants to avoid creating
device links, fwnode links, etc and can build everything in and have
init/probe happen in the right order.

But in the long run, I see a strong possibility for
fw_devlink=off/permissive being removed. I'd still want to keep it for
implementing =rpm where it'd also automatically enable PM runtime
tracking, but I don't understand that well enough yet to do it by
default.

> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Just a minor nitpick below. Nevertheless, feel free to add:
>
> Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>

Thanks!

>
> > ---
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >                 mutex_unlock(&gpd_list_lock);
> >                 dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> >                         __func__, PTR_ERR(pd));
> > -               return driver_deferred_probe_check_state(base_dev);
>
> Adding a brief comment about why -EPROBE_DEFER doesn't make sense
> here, would be nice.

Will do once all the reviews come in and I send v3.

I'm thinking something like:
/* fw_devlink will take care of retrying for missing suppliers */

-Saravana

>
> > +               return -ENODEV;
> >         }
> >
> >         dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
>
> Kind regards
> Uffe


* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 ` [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
  2022-06-09 11:44   ` Ulf Hansson
@ 2022-06-21  7:28   ` Tony Lindgren
  2022-06-21 19:34     ` Saravana Kannan
  2022-06-23 12:08     ` Alexander Stein
  1 sibling, 2 replies; 69+ messages in thread
From: Tony Lindgren @ 2022-06-21  7:28 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

Hi,

* Saravana Kannan <saravanak@google.com> [700101 02:00]:
> Now that fw_devlink=on by default and fw_devlink supports
> "power-domains" property, the execution will never get to the point
> where driver_deferred_probe_check_state() is called before the supplier
> has probed successfully or before deferred probe timeout has expired.
> 
> So, delete the call and replace it with -ENODEV.

Looks like this causes omaps to not boot in Linux next. With this,
simple-pm-bus fails to probe initially as the power-domain is not
yet available. On platform_probe() genpd_get_from_provider() returns
-ENOENT.

Seems like other stuff is potentially broken too, any ideas on
how to fix this?

Regards,

Tony



> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/power/domain.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> index 739e52cd4aba..3e86772d5fac 100644
> --- a/drivers/base/power/domain.c
> +++ b/drivers/base/power/domain.c
> @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
>  		mutex_unlock(&gpd_list_lock);
>  		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
>  			__func__, PTR_ERR(pd));
> -		return driver_deferred_probe_check_state(base_dev);
> +		return -ENODEV;
>  	}
>  
>  	dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> -- 
> 2.36.1.255.ge46751e96f-goog
> 


* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-21  7:28   ` Tony Lindgren
@ 2022-06-21 19:34     ` Saravana Kannan
  2022-06-22  4:58       ` Tony Lindgren
  2022-06-23 12:08     ` Alexander Stein
  1 sibling, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-21 19:34 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
>
> Hi,
>
> * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
>
> Looks like this causes omaps to not boot in Linux next.

Can you please point me to an example DTS I could use for debugging
this? I'm assuming you are leaving fw_devlink=on and not turning it
off or putting it in permissive mode.

> With this
> simple-pm-bus fails to probe initially as the power-domain is not
> yet available.

Before we get to late_initcall(), I'd expect this series to not have
any impact because both fw_devlink and
driver_deferred_probe_check_state() should be causing the device's
probe to get deferred until the PM domain device comes up.

To double check this, without this series, can you give me the list of
"supplier:*" symlinks under a simple-pm-bus device's sysfs folder
that's having problems with this series? And for all those symlinks,
cat the "status" file under that directory?

In the system where this is failing, is the PM domain driver loaded as
a module at a later point?

Couple of other things to try (with the patches) to narrow this down:
* Can you set deferred_probe_timeout=0 in the command line and see if that helps?
* Can you set it to something high like 30 or even larger and see if it helps?

> On platform_probe() genpd_get_from_provider() returns
> -ENOENT.

This error is with the series I assume?

> Seems like other stuff is potentially broken too, any ideas on
> how to fix this?

I'll want to understand the issue first. It's not yet clear to me why
fw_devlink isn't blocking the probe of the simple-pm-bus device until
the PM domain device shows up. And if it is not blocking, then why and
at what point in boot it's giving up and letting the probe get to this
point where there's an error.

-Saravana

>
>
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >               mutex_unlock(&gpd_list_lock);
> >               dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> >                       __func__, PTR_ERR(pd));
> > -             return driver_deferred_probe_check_state(base_dev);
> > +             return -ENODEV;
> >       }
> >
> >       dev_dbg(dev, "adding to PM domain %s\n", pd->name);
> > --
> > 2.36.1.255.ge46751e96f-goog
> >
>


* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-21 19:34     ` Saravana Kannan
@ 2022-06-22  4:58       ` Tony Lindgren
  2022-06-22 19:09         ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-06-22  4:58 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

Hi,

* Saravana Kannan <saravanak@google.com> [220621 19:29]:
> On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > Hi,
> >
> > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > Now that fw_devlink=on by default and fw_devlink supports
> > > "power-domains" property, the execution will never get to the point
> > > where driver_deferred_probe_check_state() is called before the supplier
> > > has probed successfully or before deferred probe timeout has expired.
> > >
> > > So, delete the call and replace it with -ENODEV.
> >
> > Looks like this causes omaps to not boot in Linux next.
> 
> Can you please point me to an example DTS I could use for debugging
> this? I'm assuming you are leaving fw_devlink=on and not turning it
> off or putting it in permissive mode.

Sure, this seems to happen at least with simple-pm-bus as the top
level interconnect with a configured power-domains property:

$ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus

This issue is not directly related to fw_devlink. It is a side effect of
removing driver_deferred_probe_check_state(). We no longer return
-EPROBE_DEFER at the end of driver_deferred_probe_check_state().

> > On platform_probe() genpd_get_from_provider() returns
> > -ENOENT.
> 
> This error is with the series I assume?

On the first probe genpd_get_from_provider() will return -ENOENT in
both cases. The list is empty on the first probe and there are no
genpd providers at this point.

Earlier, with driver_deferred_probe_check_state(), the initial -ENOENT
ended up getting changed to -EPROBE_DEFER at the end of
driver_deferred_probe_check_state(). We are now missing that.

Regards,

Tony
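
The change in return value described above can be modelled in a few lines
of userspace C. This is only a sketch: struct probe_state and the model_*
functions are hypothetical, and the three branches in model_check_state()
reflect my understanding of what driver_deferred_probe_check_state() did
at the time, not a copy of the kernel source. EPROBE_DEFER is defined
locally because it is a kernel-internal errno.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal value, absent from userspace errno.h */
#endif

/* Hypothetical snapshot of the global probe state. */
struct probe_state {
	bool modules_enabled;        /* stands in for CONFIG_MODULES */
	bool initcalls_done;         /* late_initcall has run */
	int deferred_probe_timeout;  /* 0 means expired or disabled */
};

/* Assumed behaviour of driver_deferred_probe_check_state(). */
static int model_check_state(const struct probe_state *s)
{
	if (!s->modules_enabled && s->initcalls_done)
		return -ENODEV;      /* no module can ever provide the supplier */
	if (s->deferred_probe_timeout == 0 && s->initcalls_done)
		return -ETIMEDOUT;   /* gave up waiting */
	return -EPROBE_DEFER;        /* still worth retrying later */
}

/* PM domain lookup failed with -ENOENT: result before the patch. */
static int model_attach_before(const struct probe_state *s)
{
	return model_check_state(s);
}

/* Same failure after the patch: a fixed error, no retry requested. */
static int model_attach_after(const struct probe_state *s)
{
	(void)s;
	return -ENODEV;
}
```

Early in boot the "before" path asks for a retry while the "after" path
fails the probe outright, which matches the simple-pm-bus symptom.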


* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-01  7:07 ` [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default Saravana Kannan
@ 2022-06-22  7:47   ` Sascha Hauer
  2022-06-22  8:44     ` Linus Walleij
  0 siblings, 1 reply; 69+ messages in thread
From: Sascha Hauer @ 2022-06-22  7:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, kernel

On Wed, Jun 01, 2022 at 12:07:03AM -0700, Saravana Kannan wrote:
> Now that deferred_probe_timeout is non-zero by default, fw_devlink will
> never permanently block the probing of devices. It'll try its best to
> probe the devices in the right order and then finally let devices probe
> even if their suppliers don't have any drivers.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

As mentioned here:

https://lore.kernel.org/lkml/20220622062027.994614-1-peng.fan@oss.nxp.com/

This patch has the effect that console UART devices which have "dmas"
properties specified in the device tree get deferred for 10 to 20
seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
the DMA channel is only requested at UART startup time and not at probe
time. DMA is not used for the console. Nevertheless, with this patch the
driver probe is deferred until the DMA engine driver is available.

It shouldn't go in as-is.

Sascha

> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 61fdfe99b348..977b379a495b 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -1613,7 +1613,7 @@ static int __init fw_devlink_setup(char *arg)
>  }
>  early_param("fw_devlink", fw_devlink_setup);
>  
> -static bool fw_devlink_strict;
> +static bool fw_devlink_strict = true;
>  static int __init fw_devlink_strict_setup(char *arg)
>  {
>  	return strtobool(arg, &fw_devlink_strict);
> -- 
> 2.36.1.255.ge46751e96f-goog
> 
> 

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22  7:47   ` Sascha Hauer
@ 2022-06-22  8:44     ` Linus Walleij
  2022-06-22 10:52       ` Andy Shevchenko
  2022-06-22 19:40       ` Saravana Kannan
  0 siblings, 2 replies; 69+ messages in thread
From: Linus Walleij @ 2022-06-22  8:44 UTC (permalink / raw)
  To: Sascha Hauer
  Cc: Saravana Kannan, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, kernel-team, linux-kernel,
	linux-pm, iommu, netdev, linux-gpio, kernel

On Wed, Jun 22, 2022 at 9:48 AM Sascha Hauer <sha@pengutronix.de> wrote:

> This patch has the effect that console UART devices which have "dmas"
> properties specified in the device tree get deferred for 10 to 20
> seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
> the dma channel is only requested at UART startup time and not at probe
> time. dma is not used for the console. Nevertheless with this driver probe
> defers until the dma engine driver is available.
>
> It shouldn't go in as-is.

This affects all machines with the PL011 UART and DMAs specified as
well.

It would be best if the console subsystem could be treated special and
not require DMA devlink to be satisfied before probing.

It seems devlink is not quite aware of the concept of resources that are
necessary to probe vs resources that are nice to have and might be
added after probe. We need a strong devlink for the first category
and maybe a weak devlink for the latter category.

I don't know if this is a generic hardware property for all operating
systems so it could be a DT property such as dma-weak-dependency?
Or maybe compromise and add a linux,dma-weak-dependency;
property?

Yours,
Linus Walleij
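
To make the strong/weak distinction concrete, here is a small userspace
sketch. Nothing in it exists in the kernel: enum dep_kind, struct
supplier_dep and can_probe() are hypothetical names, and the per-dependency
"optional" flag is just one way the idea could be represented. The strict
argument models the documented effect of fw_devlink.strict, which treats
all inferred dependencies as mandatory.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of a consumer's dependency on a supplier. */
enum dep_kind {
	DEP_REQUIRED,  /* needed to probe at all, e.g. a clock */
	DEP_OPTIONAL   /* nice to have, may show up after probe, e.g. DMA */
};

struct supplier_dep {
	const char *name;
	enum dep_kind kind;
	bool available;  /* supplier driver has probed */
};

/*
 * Decide whether the consumer may probe now. With strict set, every
 * missing supplier blocks the probe; otherwise only required ones do.
 */
static bool can_probe(const struct supplier_dep *deps, size_t n, bool strict)
{
	for (size_t i = 0; i < n; i++) {
		if (deps[i].available)
			continue;
		if (strict || deps[i].kind == DEP_REQUIRED)
			return false;
	}
	return true;
}
```

For a console UART whose clock is ready but whose DMA engine is not, this
model lets the probe go ahead when strict is off and blocks it when strict
is on, which is the delay reported above.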


* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22  8:44     ` Linus Walleij
@ 2022-06-22 10:52       ` Andy Shevchenko
  2022-06-22 11:18         ` Sascha Hauer
  2022-06-22 19:40       ` Saravana Kannan
  1 sibling, 1 reply; 69+ messages in thread
From: Andy Shevchenko @ 2022-06-22 10:52 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Sascha Hauer, Saravana Kannan, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Hideaki YOSHIFUJI, David Ahern,
	Android Kernel Team, Linux Kernel Mailing List, Linux PM,
	list@263.net:IOMMU DRIVERS, netdev, open list:GPIO SUBSYSTEM,
	Sascha Hauer

On Wed, Jun 22, 2022 at 10:44 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> On Wed, Jun 22, 2022 at 9:48 AM Sascha Hauer <sha@pengutronix.de> wrote:

...

> > This patch has the effect that console UART devices which have "dmas"
> > properties specified in the device tree get deferred for 10 to 20
> > seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
> > the dma channel is only requested at UART startup time and not at probe
> > time. dma is not used for the console. Nevertheless with this driver probe
> > defers until the dma engine driver is available.
> >
> > It shouldn't go in as-is.
>
> This affects all machines with the PL011 UART and DMAs specified as
> well.
>
> It would be best if the console subsystem could be treated special and
> not require DMA devlink to be satisfied before probing.

In 8250 we force disable DMA and PM on kernel consoles, because it's
so-o PITA and has a lot of corner cases we may never chase down.

089b6d365491 serial: 8250_port: Disable DMA operations for kernel console
bedb404e91bb serial: 8250_port: Don't use power management for kernel console


> It seems devlink is not quite aware of the concept of resources that are
> necessary to probe vs resources that are nice to have and might be
> added after probe. We need a strong devlink for the first category
> and maybe a weak devlink for the latter category.
>
> I don't know if this is a generic hardware property for all operating
> systems so it could be a DT property such as dma-weak-dependency?
> > Or maybe compromise and add a linux,dma-weak-dependency;
> property?


-- 
With Best Regards,
Andy Shevchenko


* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22 10:52       ` Andy Shevchenko
@ 2022-06-22 11:18         ` Sascha Hauer
  0 siblings, 0 replies; 69+ messages in thread
From: Sascha Hauer @ 2022-06-22 11:18 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Linus Walleij, Andrew Lunn, Ulf Hansson, Rafael J. Wysocki,
	Eric Dumazet, Pavel Machek, Will Deacon, Saravana Kannan,
	Kevin Hilman, Joerg Roedel, Russell King, Jakub Kicinski,
	Paolo Abeni, Android Kernel Team, Len Brown, Linux PM,
	open list:GPIO SUBSYSTEM, Hideaki YOSHIFUJI, Greg Kroah-Hartman,
	David Ahern, Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS, Sascha Hauer, netdev,
	David S. Miller, Heiner Kallweit

On Wed, Jun 22, 2022 at 12:52:02PM +0200, Andy Shevchenko wrote:
> On Wed, Jun 22, 2022 at 10:44 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> > On Wed, Jun 22, 2022 at 9:48 AM Sascha Hauer <sha@pengutronix.de> wrote:
> 
> ...
> 
> > > This patch has the effect that console UART devices which have "dmas"
> > > properties specified in the device tree get deferred for 10 to 20
> > > seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
> > > the dma channel is only requested at UART startup time and not at probe
> > > time. dma is not used for the console. Nevertheless with this driver probe
> > > defers until the dma engine driver is available.
> > >
> > > It shouldn't go in as-is.
> >
> > This affects all machines with the PL011 UART and DMAs specified as
> > well.
> >
> > It would be best if the console subsystem could be treated special and
> > not require DMA devlink to be satisfied before probing.
> 
> In 8250 we force disable DMA and PM on kernel consoles, because it's
> so-o PITA and has a lot of corner cases we may never chase down.

On i.MX this is done as well, but it doesn't help here. The driver is
not even probed when the device node contains a "dmas" property.

Sascha

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |


* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-22  4:58       ` Tony Lindgren
@ 2022-06-22 19:09         ` Saravana Kannan
  2022-06-23  7:01           ` Tony Lindgren
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-22 19:09 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
>
> Hi,
>
> * Saravana Kannan <saravanak@google.com> [220621 19:29]:
> > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > Hi,
> > >
> > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > "power-domains" property, the execution will never get to the point
> > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > has probed successfully or before deferred probe timeout has expired.
> > > >
> > > > So, delete the call and replace it with -ENODEV.
> > >
> > > Looks like this causes omaps to not boot in Linux next.
> >
> > Can you please point me to an example DTS I could use for debugging
> > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > off or putting it in permissive mode.
>
> Sure, this seems to happen at least with simple-pm-bus as the top
> level interconnect with a configured power-domains property:
>
> $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus

Thanks for the example. I generally start looking from dts (not dtsi)
files in case there are some DT property override/additions after the
dtsi files are included in the dts file. But I'll assume for now
that's not the case. If there's a specific dts file for a board I can
look from that'd be helpful to rule out those kinds of issues.

For now, I looked at arch/arm/boot/dts/omap4.dtsi.

>
> This issue is not directly related to fw_devlink. It is a side effect of
> removing driver_deferred_probe_check_state(). We no longer return
> -EPROBE_DEFER at the end of driver_deferred_probe_check_state().

Yes, I understand the issue. But driver_deferred_probe_check_state()
was deleted because fw_devlink=on should have short circuited the
probe attempt with an -EPROBE_DEFER before reaching the bus/driver
probe function and hitting this -ENOENT failure. That's why I was
asking the other questions.

> > > On platform_probe() genpd_get_from_provider() returns
> > > -ENOENT.
> >
> > This error is with the series I assume?
>
> On the first probe genpd_get_from_provider() will return -ENOENT in
> both cases. The list is empty on the first probe and there are no
> genpd providers at this point.
>
> Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> ends up getting changed to -EPROBE_DEFER at the end of
> driver_deferred_probe_check_state(), we are now missing that.

Right, I was aware -ENOENT would be returned if we got this far. But
the point of this series is that you shouldn't have gotten that far
before your pm domain device is ready. Hence my questions from the
earlier reply.

Can I get answers to rest of my questions in the first reply please?
That should help us figure out why fw_devlink let us get this far.
Summarizing them here to make it easy:
* Are you running with fw_devlink=on?
* Is the "ti,omap4-prm-inst"/"ti,omap-prm-inst" driver built-in in this case?
* If it's not built-in, can you please try deferred_probe_timeout=0
and deferred_probe_timeout=30 and see if either one of them helps?
* Can I get the output of "ls -d supplier:*" and "cat
supplier:*/status" from the sysfs dir for the ocp device
without this series, where it boots properly?

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22  8:44     ` Linus Walleij
  2022-06-22 10:52       ` Andy Shevchenko
@ 2022-06-22 19:40       ` Saravana Kannan
  2022-06-22 20:35         ` Saravana Kannan
  2022-06-28 13:09         ` Linus Walleij
  1 sibling, 2 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-22 19:40 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Sascha Hauer, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, kernel-team, linux-kernel,
	linux-pm, iommu, netdev, linux-gpio, kernel

On Wed, Jun 22, 2022 at 1:44 AM Linus Walleij <linus.walleij@linaro.org> wrote:
>
> On Wed, Jun 22, 2022 at 9:48 AM Sascha Hauer <sha@pengutronix.de> wrote:
>
> > This patch has the effect that console UART devices which have "dmas"
> > properties specified in the device tree get deferred for 10 to 20
> > seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
> > the dma channel is only requested at UART startup time and not at probe
> > time. dma is not used for the console. Nevertheless with this driver probe
> > defers until the dma engine driver is available.

FYI, if most of the drivers are built in, you could set
deferred_probe_timeout=1 to reduce the impact of this (should drop
down to 1 to 2 seconds). Is that an option until we figure out
something better?

Actually, why isn't earlyconsole being used? That doesn't get blocked
on anything and the main point of that is to have console working from
really early on.

> >
> > It shouldn't go in as-is.
>
> This affects all machines with the PL011 UART and DMAs specified as
> well.
>
> It would be best if the console subsystem could be treated special and
> not require DMA devlink to be satisfied before probing.

If we can mark the console devices somehow before their drivers probe
them, I can make fw_devlink give them special treatment. Is there any
way I could identify them before their drivers probe?

> It seems devlink is not quite aware of the concept of resources that are
> necessary to probe vs resources that are nice to have and might be
> added after probe.

Correct, it can't tell them apart. Which is why it tries its best to
enforce them, get most of them ordered properly and then gives up
enforcing the rest after deferred_probe_timeout= expires. There's a
bit more nuance than what I explained here (explained in earlier
commit texts, LPC talks), but that's the gist of it. That's what's
going on in this case Sascha is pointing out.

> We need a strong devlink for the first category
> and maybe a weak devlink for the latter category.
>
> I don't know if this is a generic hardware property for all operating
> systems so it could be a DT property such as dma-weak-dependency?
>
> Or maybe compromise and add a linux,dma-weak-dependency;
> property?

The linux,dma-weak-dependency might be an option, but then if the
kernel version changes and we want to enforce it because we now have a
dma driver (not related to Sascha's example) support, then the
fw_devlink still can't enforce it because of that property. But maybe
that's okay? The consumer can try to use dma and defer probe if it
fails?

Another option is to mark console devices in DT with some property and
we can give special treatment for those without waiting for
deferred_probe_timeout= to expire.

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22 19:40       ` Saravana Kannan
@ 2022-06-22 20:35         ` Saravana Kannan
  2022-06-22 22:30           ` Saravana Kannan
  2022-06-28 13:09         ` Linus Walleij
  1 sibling, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-22 20:35 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Sascha Hauer, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, kernel-team, linux-kernel,
	linux-pm, iommu, netdev, linux-gpio, kernel

On Wed, Jun 22, 2022 at 12:40 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Jun 22, 2022 at 1:44 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> >
> > On Wed, Jun 22, 2022 at 9:48 AM Sascha Hauer <sha@pengutronix.de> wrote:
> >
> > > This patch has the effect that console UART devices which have "dmas"
> > > properties specified in the device tree get deferred for 10 to 20
> > > seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
> > > the dma channel is only requested at UART startup time and not at probe
> > > time. dma is not used for the console. Nevertheless with this driver probe
> > > defers until the dma engine driver is available.
>
> FYI, if most of the drivers are built in, you could set
> deferred_probe_timeout=1 to reduce the impact of this (should drop
> down to 1 to 2 seconds). Is that an option until we figure out
> something better?
>
> Actually, why isn't earlyconsole being used? That doesn't get blocked
> on anything and the main point of that is to have console working from
> really early on.
>
> > >
> > > It shouldn't go in as-is.
> >
> > This affects all machines with the PL011 UART and DMAs specified as
> > well.
> >
> > It would be best if the console subsystem could be treated special and
> > not require DMA devlink to be satisfied before probing.
>
> If we can mark the console devices somehow before their drivers probe
> them, I can make fw_devlink give them special treatment. Is there any
> way I could identify them before their drivers probe?
>
> > It seems devlink is not quite aware of the concept of resources that are
> > necessary to probe vs resources that are nice to have and might be
> > added after probe.
>
> Correct, it can't tell them apart. Which is why it tries its best to
> enforce them, get most of them ordered properly and then gives up
> enforcing the rest after deferred_probe_timeout= expires. There's a
> bit more nuance than what I explained here (explained in earlier
> commit texts, LPC talks), but that's the gist of it. That's what's
> going on in this case Sascha is pointing out.
>
> > We need a strong devlink for the first category
> > and maybe a weak devlink for the latter category.
> >
> > I don't know if this is a generic hardware property for all operating
> > systems so it could be a DT property such as dma-weak-dependency?
> >
> > Or maybe compromise and add a linux,dma-weak-dependency;
> > property?
>
> The linux,dma-weak-dependency might be an option, but then if the
> kernel version changes and we want to enforce it because we now have a
> dma driver (not related to Sascha's example) support, then the
> fw_devlink still can't enforce it because of that property. But maybe
> that's okay? The consumer can try to use dma and defer probe if it
> fails?
>
> Another option is to mark console devices in DT with some property and
> we can give special treatment for those without waiting for
> deferred_probe_timeout= to expire.

Heh, looks like there's already a property for that: stdout-path.

Let me send a series that'll use that to give special treatment to
console devices.

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22 20:35         ` Saravana Kannan
@ 2022-06-22 22:30           ` Saravana Kannan
  0 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-22 22:30 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Sascha Hauer, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, kernel-team, linux-kernel,
	linux-pm, iommu, netdev, linux-gpio, kernel

On Wed, Jun 22, 2022 at 1:35 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Jun 22, 2022 at 12:40 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Wed, Jun 22, 2022 at 1:44 AM Linus Walleij <linus.walleij@linaro.org> wrote:
> > >
> > > On Wed, Jun 22, 2022 at 9:48 AM Sascha Hauer <sha@pengutronix.de> wrote:
> > >
> > > > This patch has the effect that console UART devices which have "dmas"
> > > > properties specified in the device tree get deferred for 10 to 20
> > > > seconds. This happens on i.MX and likely on other SoCs as well. On i.MX
> > > > the dma channel is only requested at UART startup time and not at probe
> > > > time. dma is not used for the console. Nevertheless with this driver probe
> > > > defers until the dma engine driver is available.
> >
> > FYI, if most of the drivers are built in, you could set
> > deferred_probe_timeout=1 to reduce the impact of this (should drop
> > down to 1 to 2 seconds). Is that an option until we figure out
> > something better?
> >
> > Actually, why isn't earlyconsole being used? That doesn't get blocked
> > on anything and the main point of that is to have console working from
> > really early on.
> >
> > > >
> > > > It shouldn't go in as-is.
> > >
> > > This affects all machines with the PL011 UART and DMAs specified as
> > > well.
> > >
> > > It would be best if the console subsystem could be treated special and
> > > not require DMA devlink to be satisfied before probing.
> >
> > If we can mark the console devices somehow before their drivers probe
> > them, I can make fw_devlink give them special treatment. Is there any
> > way I could identify them before their drivers probe?
> >
> > > It seems devlink is not quite aware of the concept of resources that are
> > > necessary to probe vs resources that are nice to have and might be
> > > added after probe.
> >
> > Correct, it can't tell them apart. Which is why it tries its best to
> > enforce them, get most of them ordered properly and then gives up
> > enforcing the rest after deferred_probe_timeout= expires. There's a
> > bit more nuance than what I explained here (explained in earlier
> > commit texts, LPC talks), but that's the gist of it. That's what's
> > going on in this case Sascha is pointing out.
> >
> > > We need a strong devlink for the first category
> > > and maybe a weak devlink for the latter category.
> > >
> > > I don't know if this is a generic hardware property for all operating
> > > systems so it could be a DT property such as dma-weak-dependency?
> > >
> > > Or maybe compromise and add a linux,dma-weak-dependency;
> > > property?
> >
> > The linux,dma-weak-dependency might be an option, but then if the
> > kernel version changes and we want to enforce it because we now have a
> > dma driver (not related to Sascha's example) support, then the
> > fw_devlink still can't enforce it because of that property. But maybe
> > that's okay? The consumer can try to use dma and defer probe if it
> > fails?
> >
> > Another option is to mark console devices in DT with some property and
> > we can give special treatment for those without waiting for
> > deferred_probe_timeout= to expire.
>
> Heh, looks like there's already a property for that: stdout-path.
>
> Let me send a series that'll use that to give special treatment to
> console devices.

Here's the fix.
https://lore.kernel.org/lkml/20220622215912.550419-1-saravanak@google.com/

Sascha, can you give it a shot?

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-22 19:09         ` Saravana Kannan
@ 2022-06-23  7:01           ` Tony Lindgren
  2022-06-23  8:21             ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-06-23  7:01 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

* Saravana Kannan <saravanak@google.com> [220622 19:05]:
> On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> >
> > Hi,
> >
> > * Saravana Kannan <saravanak@google.com> [220621 19:29]:
> > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > "power-domains" property, the execution will never get to the point
> > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > has probed successfully or before deferred probe timeout has expired.
> > > > >
> > > > > So, delete the call and replace it with -ENODEV.
> > > >
> > > > Looks like this causes omaps to not boot in Linux next.
> > >
> > > Can you please point me to an example DTS I could use for debugging
> > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > off or putting it in permissive mode.
> >
> > Sure, this seems to happen at least with simple-pm-bus as the top
> > level interconnect with a configured power-domains property:
> >
> > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus
> 
> Thanks for the example. I generally start looking from dts (not dtsi)
> files in case there are some DT property override/additions after the
> dtsi files are included in the dts file. But I'll assume for now
> that's not the case. If there's a specific dts file for a board I can
> look from that'd be helpful to rule out those kinds of issues.
> 
> For now, I looked at arch/arm/boot/dts/omap4.dtsi.

OK it should be very similar for all the affected SoCs.

> > This issue is not directly related to fw_devlink. It is a side effect of
> > removing driver_deferred_probe_check_state(). We no longer return
> > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> 
> Yes, I understand the issue. But driver_deferred_probe_check_state()
> was deleted because fw_devlink=on should have short circuited the
> probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> probe function and hitting this -ENOENT failure. That's why I was
> asking the other questions.

OK. So where is the -EPROBE_DEFER supposed to happen without
driver_deferred_probe_check_state() then?

> > > > On platform_probe() genpd_get_from_provider() returns
> > > > -ENOENT.
> > >
> > > This error is with the series I assume?
> >
> > On the first probe genpd_get_from_provider() will return -ENOENT in
> > both cases. The list is empty on the first probe and there are no
> > genpd providers at this point.
> >
> > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > ends up getting changed to -EPROBE_DEFER at the end of
> > driver_deferred_probe_check_state(), we are now missing that.
> 
> Right, I was aware -ENOENT would be returned if we got this far. But
> the point of this series is that you shouldn't have gotten that far
> before your pm domain device is ready. Hence my questions from the
> earlier reply.

OK

> Can I get answers to rest of my questions in the first reply please?
> That should help us figure out why fw_devlink let us get this far.
> Summarize them here to make it easy:
> * Are you running with fw_devlink=on?

Yes with the default with no specific kernel params so looks like
FW_DEVLINK_FLAGS_ON.

> * Is the "ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?

Yes

> * If it's not built-in, can you please try deferred_probe_timeout=0
> and deferred_probe_timeout=30 and see if either one of them help?

It's built in so I did not try these.

> * Can I get the output of "ls -d supplier:*" and "cat
> supplier:*/status" output from the sysfs dir for the ocp device
> without this series where it boots properly.

Hmm so I'm not seeing any supplier for the top level ocp device in
the booting case without your patches. I see the suppliers for the
ocp child device instances only.

Without your patches I see simple-pm-bus probe initially with
EPROBE_DEFER like I described earlier, and then simple-pm-bus probes
on the second try.

Regards,

Tony

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-23  7:01           ` Tony Lindgren
@ 2022-06-23  8:21             ` Saravana Kannan
  2022-06-27  9:10               ` Tony Lindgren
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-23  8:21 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > Hi,
> > >
> > > * Saravana Kannan <saravanak@google.com> [220621 19:29]:
> > > > On Tue, Jun 21, 2022 at 12:28 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > "power-domains" property, the execution will never get to the point
> > > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > > has probed successfully or before deferred probe timeout has expired.
> > > > > >
> > > > > > So, delete the call and replace it with -ENODEV.
> > > > >
> > > > > Looks like this causes omaps to not boot in Linux next.
> > > >
> > > > Can you please point me to an example DTS I could use for debugging
> > > > this? I'm assuming you are leaving fw_devlink=on and not turning it
> > > > off or putting it in permissive mode.
> > >
> > > Sure, this seems to happen at least with simple-pm-bus as the top
> > > level interconnect with a configured power-domains property:
> > >
> > > $ git grep -A10 "ocp {" arch/arm/boot/dts/*.dtsi | grep -B3 -A4 simple-pm-bus
> >
> > Thanks for the example. I generally start looking from dts (not dtsi)
> > files in case there are some DT property override/additions after the
> > dtsi files are included in the dts file. But I'll assume for now
> > that's not the case. If there's a specific dts file for a board I can
> > look from that'd be helpful to rule out those kinds of issues.
> >
> > For now, I looked at arch/arm/boot/dts/omap4.dtsi.
>
> OK it should be very similar for all the affected SoCs.
>
> > > This issue is not directly related to fw_devlink. It is a side effect of
> > > removing driver_deferred_probe_check_state(). We no longer return
> > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> >
> > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > was deleted because fw_devlink=on should have short circuited the
> > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > probe function and hitting this -ENOENT failure. That's why I was
> > asking the other questions.
>
> OK. So where is the -EPROBE_DEFER supposed to happen without
> driver_deferred_probe_check_state() then?

device_links_check_suppliers() call inside really_probe() would short
circuit and return an -EPROBE_DEFER if the device links are created as
expected.
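
To make that ordering concrete, here is a minimal user-space sketch, not
the kernel code. The names model_link, model_dev, model_check_suppliers()
and model_really_probe() are hypothetical stand-ins; only the value of
EPROBE_DEFER (517 in include/linux/errno.h) and the idea that the supplier
check runs before the bus/driver probe come from the discussion above.

```c
#include <errno.h>
#include <stddef.h>

/* Kernel-internal errno value; userspace <errno.h> does not define it. */
#define EPROBE_DEFER 517

/* Hypothetical, simplified stand-in for a managed device link. */
struct model_link {
	int supplier_ready;	/* 1 once the supplier has probed */
};

struct model_dev {
	const struct model_link *suppliers;
	size_t n_suppliers;
	int (*probe)(struct model_dev *dev);	/* bus/driver probe callback */
	int probe_calls;			/* how often probe() actually ran */
};

/* Models the supplier gate: defer if any linked supplier has not probed. */
static int model_check_suppliers(const struct model_dev *dev)
{
	for (size_t i = 0; i < dev->n_suppliers; i++)
		if (!dev->suppliers[i].supplier_ready)
			return -EPROBE_DEFER;
	return 0;
}

/* Models the ordering only: gate first, driver probe second. */
static int model_really_probe(struct model_dev *dev)
{
	int ret = model_check_suppliers(dev);

	if (ret)
		return ret;	/* the driver probe is never reached */

	dev->probe_calls++;
	return dev->probe(dev);
}

/* A probe that cannot find its PM domain provider yet. */
static int model_probe_missing_domain(struct model_dev *dev)
{
	(void)dev;
	return -ENOENT;
}
```

In this model, a device whose link to the power domain supplier was never
created passes the gate with nothing to check, so probe() runs and the
-ENOENT surfaces, which is the symptom being reported.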

>
> > > > > On platform_probe() genpd_get_from_provider() returns
> > > > > -ENOENT.
> > > >
> > > > This error is with the series I assume?
> > >
> > > On the first probe genpd_get_from_provider() will return -ENOENT in
> > > both cases. The list is empty on the first probe and there are no
> > > genpd providers at this point.
> > >
> > > Earlier with driver_deferred_probe_check_state(), the initial -ENOENT
> > > ends up getting changed to -EPROBE_DEFER at the end of
> > > driver_deferred_probe_check_state(), we are now missing that.
> >
> > Right, I was aware -ENOENT would be returned if we got this far. But
> > the point of this series is that you shouldn't have gotten that far
> > before your pm domain device is ready. Hence my questions from the
> > earlier reply.
>
> OK
>
> > Can I get answers to rest of my questions in the first reply please?
> > That should help us figure out why fw_devlink let us get this far.
> > Summarize them here to make it easy:
> > * Are you running with fw_devlink=on?
>
> Yes with the default with no specific kernel params so looks like
> FW_DEVLINK_FLAGS_ON.
>
> > * Is the "ti,omap4-prm-inst"/"ti,omap-prm-inst" built-in in this case?
>
> Yes
>
> > * If it's not built-in, can you please try deferred_probe_timeout=0
> > and deferred_probe_timeout=30 and see if either one of them help?
>
> It's built in so I did not try these.
>
> > * Can I get the output of "ls -d supplier:*" and "cat
> > supplier:*/status" output from the sysfs dir for the ocp device
> > without this series where it boots properly.
>
> Hmm so I'm not seeing any supplier for the top level ocp device in
> the booting case without your patches. I see the suppliers for the
> ocp child device instances only.

Hmmm... this is strange (that the device link isn't there), but this
is what I suspected.

Now we need to figure out why it's missing. There are only a few
things that could cause this and I don't see any of those. I already
checked to make sure the power domain in this instance had a proper
driver with a probe() function -- if it didn't, then that's one thing
that could have caused the missing device link. The device does seem
to have a proper driver, so looks like I can rule that out.

Can you point me to the dts file that corresponds to the specific
board you are testing this on? I probably won't find anything, but I
want to rule out some of the possibilities.

All the device link creation logic is inside drivers/base/core.c. So
if you can look at the existing messages or add other stuff to figure
out why the device link isn't getting created, that'd be handy. In
either case, I'll continue staring at the DT and code to see what
might be happening here.

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-21  7:28   ` Tony Lindgren
  2022-06-21 19:34     ` Saravana Kannan
@ 2022-06-23 12:08     ` Alexander Stein
  2022-07-01  0:37       ` Saravana Kannan
  1 sibling, 1 reply; 69+ messages in thread
From: Alexander Stein @ 2022-06-23 12:08 UTC (permalink / raw)
  To: Saravana Kannan, Tony Lindgren
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven

Hi,

Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> Hi,
> 
> * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > Now that fw_devlink=on by default and fw_devlink supports
> > "power-domains" property, the execution will never get to the point
> > where driver_deferred_probe_check_state() is called before the supplier
> > has probed successfully or before deferred probe timeout has expired.
> > 
> > So, delete the call and replace it with -ENODEV.
> 
> Looks like this causes omaps to not boot in Linux next. With this
> simple-pm-bus fails to probe initially as the power-domain is not
> yet available. On platform_probe() genpd_get_from_provider() returns
> -ENOENT.
> 
> Seems like other stuff is potentially broken too, any ideas on
> how to fix this?

I think I'm hit by this as well, although I do not get a lockup.
In my case I'm using arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts 
and probing of 38320000.blk-ctrl fails as the power-domain is not (yet) 
registered. See the (filtered) dmesg output:

[    0.744245] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@0
[    0.744756] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@2
[    0.745012] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@3
[    0.745268] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@4
[    0.746121] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@7
[    0.746400] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@8
[    0.746665] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@9
[    0.746927] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@a
[    0.748870] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach bus power domain
[    1.265279] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@5
[    1.265861] PM: Added domain provider from /soc@0/bus@30000000/gpc@303a0000/pgc/power-domain@6

blk-ctrl@38320000 requires the power-domain 'pgc_vpu', which is power-domain@6 
in pgc.
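
The difference the patch makes for a consumer that probes before its
provider registers can be sketched as follows. This is a simplified
user-space model, assuming the removed helper behaved roughly like the
pre-series driver_deferred_probe_check_state(); struct model_state,
model_check_state() and model_attach_error() are hypothetical names used
only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value; userspace <errno.h> does not define it. */
#define EPROBE_DEFER 517

/* Hypothetical snapshot of the global state the removed helper consulted. */
struct model_state {
	bool modules_enabled;	/* stands in for CONFIG_MODULES */
	bool initcalls_done;	/* late_initcall stage has been reached */
	int timeout;		/* remaining deferred probe timeout, 0 = expired */
};

/* Simplified model of the helper's decision before the series. */
static int model_check_state(const struct model_state *s)
{
	if (!s->modules_enabled && s->initcalls_done)
		return -ENODEV;		/* no module can still bring a driver */
	if (!s->timeout && s->initcalls_done)
		return -ETIMEDOUT;	/* waited long enough, give up */
	return -EPROBE_DEFER;		/* keep the consumer on the deferred list */
}

/* Error path of the PM domain attach when the provider lookup fails. */
static int model_attach_error(const struct model_state *s, bool patched)
{
	return patched ? -ENODEV : model_check_state(s);
}
```

Early in boot the unpatched path yields -EPROBE_DEFER, so the consumer is
retried once power-domain@6 shows up; the patched path yields -ENODEV right
away, which matches the blk-ctrl error in the log above.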

Best regards,
Alexander

> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> > 
> >  drivers/base/power/domain.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
> > index 739e52cd4aba..3e86772d5fac 100644
> > --- a/drivers/base/power/domain.c
> > +++ b/drivers/base/power/domain.c
> > @@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
> >  		mutex_unlock(&gpd_list_lock);
> >  		dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
> >  			__func__, PTR_ERR(pd));
> > -		return driver_deferred_probe_check_state(base_dev);
> > +		return -ENODEV;
> >  	}
> >  
> >  	dev_dbg(dev, "adding to PM domain %s\n", pd->name);

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-23  8:21             ` Saravana Kannan
@ 2022-06-27  9:10               ` Tony Lindgren
  2022-06-30 23:10                 ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-06-27  9:10 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Geert Uytterhoeven, Alexander Stein

* Saravana Kannan <saravanak@google.com> [220623 08:17]:
> On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > This issue is not directly related to fw_devlink. It is a side effect of
> > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > >
> > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > was deleted because fw_devlink=on should have short circuited the
> > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > probe function and hitting this -ENOENT failure. That's why I was
> > > asking the other questions.
> >
> > OK. So where is the -EPROBE_DEFER supposed to happen without
> > driver_deferred_probe_check_state() then?
> 
> device_links_check_suppliers() call inside really_probe() would short
> circuit and return an -EPROBE_DEFER if the device links are created as
> expected.

OK

> > Hmm so I'm not seeing any supplier for the top level ocp device in
> > the booting case without your patches. I see the suppliers for the
> > ocp child device instances only.
> 
> Hmmm... this is strange (that the device link isn't there), but this
> is what I suspected.

Yup, maybe it's because of the supplier being a device in the child
interconnect for the ocp.

> Now we need to figure out why it's missing. There are only a few
> things that could cause this and I don't see any of those. I already
> checked to make sure the power domain in this instance had a proper
> driver with a probe() function -- if it didn't, then that's one thing
> that could have caused the missing device link. The device does seem
> to have a proper driver, so looks like I can rule that out.
> 
> Can you point me to the dts file that corresponds to the specific
> board you are testing this on? I probably won't find anything, but I
> want to rule out some of the possibilities.

You can use the beaglebone black dts for example, that's
arch/arm/boot/dts/am335x-boneblack.dts and uses am33xx.dtsi for
ocp interconnect with simple-pm-bus.

> All the device link creation logic is inside drivers/base/core.c. So
> if you can look at the existing messages or add other stuff to figure
> out why the device link isn't getting created, that'd be handy. In
> either case, I'll continue staring at the DT and code to see what
> might be happening here.

In device_links_check_suppliers() I see these ocp suppliers:

platform ocp: device_links_check_suppliers: 1024: supplier 44e00d00.prm: link->status: 0 link->flags: 000001c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e01000.prm: link->status: 0 link->flags: 000001c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e00c00.prm: link->status: 0 link->flags: 000001c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e00e00.prm: link->status: 0 link->flags: 000001c0
platform ocp: device_links_check_suppliers: 1024: supplier 44e01100.prm: link->status: 0 link->flags: 000001c0
platform ocp: device_links_check_suppliers: 1024: supplier fixedregulator0: link->status: 1 link->flags: 000001c0

device_links_check_suppliers() does not return -EPROBE_DEFER for the
44e00c00.prm supplier on beaglebone black, for example; it returns 0.

Regards,

Tony

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default
  2022-06-22 19:40       ` Saravana Kannan
  2022-06-22 20:35         ` Saravana Kannan
@ 2022-06-28 13:09         ` Linus Walleij
  1 sibling, 0 replies; 69+ messages in thread
From: Linus Walleij @ 2022-06-28 13:09 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Sascha Hauer, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Hideaki YOSHIFUJI, David Ahern, kernel-team, linux-kernel,
	linux-pm, iommu, netdev, linux-gpio, kernel

On Wed, Jun 22, 2022 at 9:40 PM Saravana Kannan <saravanak@google.com> wrote:

> Actually, why isn't earlyconsole being used? That doesn't get blocked
> on anything and the main point of that is to have console working from
> really early on.

For Arm (arch/arm) there is a special debug option called low-level
debug, which you can find in e.g.:
arch/arm/Kconfig.debug
arch/arm/kernel/debug.S

This debug facility can print to the UART FIFO before the MMU is even up, pretty
much from the first instruction the kernel executes.

The versatility of LL-debug means that developers do not use earlyconsole
much on Arm.

I don't know about arm64 though.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-27  9:10               ` Tony Lindgren
@ 2022-06-30 23:10                 ` Saravana Kannan
  2022-06-30 23:26                   ` Rob Herring
  2022-07-01  7:38                   ` Geert Uytterhoeven
  0 siblings, 2 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-06-30 23:10 UTC (permalink / raw)
  To: Tony Lindgren, Rob Herring, Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, kernel-team, linux-kernel, linux-pm, iommu, netdev,
	linux-gpio, Alexander Stein

On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > >
> > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > was deleted because fw_devlink=on should have short circuited the
> > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > asking the other questions.
> > >
> > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > driver_deferred_probe_check_state() then?
> >
> > device_links_check_suppliers() call inside really_probe() would short
> > circuit and return an -EPROBE_DEFER if the device links are created as
> > expected.
>
> OK
>
> > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > the booting case without your patches. I see the suppliers for the
> > > ocp child device instances only.
> >
> > Hmmm... this is strange (that the device link isn't there), but this
> > is what I suspected.
>
> Yup, maybe it's because of the supplier being a device in the child
> interconnect for the ocp.

Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
isn't being created.

So the aggregated view is something like (I had to set tabs = 4 spaces
to fit it within 80 cols):

    ocp: ocp {         <========================= Consumer
        compatible = "simple-pm-bus";
        power-domains = <&prm_per>; <=========== Supplier ref

        l4_wkup: interconnect@44c00000 {
            compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";

            segment@200000 {  /* 0x44e00000 */
                compatible = "simple-pm-bus";

                target-module@0 { /* 0x44e00000, ap 8 58.0 */
                    compatible = "ti,sysc-omap4", "ti,sysc";

                    prcm: prcm@0 {
                        compatible = "ti,am3-prcm", "simple-bus";

                        prm_per: prm@c00 { <========= Actual Supplier
                            compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
                        };
                    };
                };
            };
        };
    };

The power-domain supplier is the great-great-great-grand-child of the
consumer. It's not clear to me how this is valid. What does it even
mean?

Rob, is this considered a valid DT?

Geert, thoughts on whether this is a correct use of simple-pm-bus device?

Also, how is the power domain attach/get working in this case? As far
as I can tell, at least for "simple-pm-bus" devices, the pm domain
attachment is happening under:
really_probe() -> call_driver_probe() -> platform_probe() ->
dev_pm_domain_attach()

So, how is the pm domain attach succeeding in the first place without
my changes?
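
To make the ordering question concrete, here is a minimal user-space
sketch (not kernel code) of the path described above: the supplier check
runs first and is the only place that can defer, and the PM domain attach
only runs afterwards. All toy_* names and the struct are made up for
illustration. EPROBE_DEFER is defined locally with the kernel's value
because userspace <errno.h> does not provide it, and the -ENODEV return
mirrors what the patch substitutes for the deleted call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Kernel-internal errno value; not available in userspace <errno.h>. */
#define EPROBE_DEFER 517

/* Hypothetical stand-in for the state the driver core would look at. */
struct toy_dev {
    bool has_ordering_link; /* a normal (not SYNC_STATE_ONLY) link exists */
    bool supplier_probed;   /* the supplier's driver has bound */
    bool genpd_registered;  /* the provider has registered its PM domain */
};

/* Models the supplier check: defers only if an ordering link exists. */
static int toy_check_suppliers(const struct toy_dev *dev)
{
    if (dev->has_ordering_link && !dev->supplier_probed)
        return -EPROBE_DEFER;
    return 0;
}

/* Models the PM domain attach done from the bus probe path. */
static int toy_pm_domain_attach(const struct toy_dev *dev)
{
    return dev->genpd_registered ? 0 : -ENODEV;
}

/* Models the overall order: supplier check first, then domain attach. */
int toy_really_probe(const struct toy_dev *dev)
{
    int ret;

    assert(dev != NULL);

    ret = toy_check_suppliers(dev);
    if (ret)
        return ret;

    return toy_pm_domain_attach(dev);
}
```

With no ordering link and no registered domain (the ocp case as I
understand it), this model falls straight through to the attach and fails
with -ENODEV instead of deferring.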

> > Now we need to figure out why it's missing. There are only a few
> > things that could cause this and I don't see any of those. I already
> > checked to make sure the power domain in this instance had a proper
> > driver with a probe() function -- if it didn't, then that's one thing
> > that'd could have caused the missing device link. The device does seem
> > to have a proper driver, so looks like I can rule that out.
> >
> > Can you point me to the dts file that corresponds to the specific
> > board you are testing this one? I probably won't find anything, but I
> > want to rule out some of the possibilities.
>
> You can use the beaglebone black dts for example, that's
> arch/arm/boot/dts/am335x-boneblack.dts and uses am33xx.dtsi for
> ocp interconnect with simple-pm-bus.
>
> > All the device link creation logic is inside drivers/base/core.c. So
> > if you can look at the existing messages or add other stuff to figure
> > out why the device link isn't getting created, that'd be handy. In
> > either case, I'll continue staring at the DT and code to see what
> > might be happening here.
>
> In device_links_check_suppliers() I see these ocp suppliers:
>
> platform ocp: device_links_check_suppliers: 1024: supplier 44e00d00.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e01000.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e00c00.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e00e00.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier 44e01100.prm: link->status: 0 link->flags: 000001c0
> platform ocp: device_links_check_suppliers: 1024: supplier fixedregulator0: link->status: 1 link->flags: 000001c0
>
> No -EPROBE_DEFER is returned in device_links_check_suppliers() for
> 44e00c00.prm supplier for beaglebone black for example, 0 gets
> returned.

Yeah, the "1c0" flags are SYNC_STATE_ONLY device links and aren't
relevant to the issue we are seeing. Those links are being created as
a proxy for other descendant devices of ocp that haven't been added
yet, but are consumers of these *.prm devices. They are mainly meant
for correctness of sync_state() callbacks of the supplier and don't
affect probe order. For example: target-module@56000000 is a consumer
of prm_gfx 44e01100.prm.
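
For anyone decoding the log lines, here is a small user-space sketch of
how I read the "1c0" value and why such a link doesn't defer the probe.
The bit positions and state values are my recollection of
include/linux/device.h and should be treated as assumptions to verify
against your tree; the TOY_* names and the helper are made up for
illustration and only model the check, they are not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_BIT(n) (UINT32_C(1) << (n))

/* Assumed to match DL_FLAG_* bit positions; verify against your tree. */
#define TOY_DL_FLAG_MANAGED         TOY_BIT(6)
#define TOY_DL_FLAG_SYNC_STATE_ONLY TOY_BIT(7)
#define TOY_DL_FLAG_INFERRED        TOY_BIT(8)

/* Assumed to match the link->status values printed in the log. */
enum toy_link_state {
    TOY_DL_STATE_DORMANT = 0,
    TOY_DL_STATE_AVAILABLE = 1,
};

/* 0x1c0 decodes to exactly these three flags under the assumptions above. */
_Static_assert((TOY_DL_FLAG_MANAGED | TOY_DL_FLAG_SYNC_STATE_ONLY |
                TOY_DL_FLAG_INFERRED) == 0x1c0,
               "0x1c0 should be MANAGED | SYNC_STATE_ONLY | INFERRED");

/* Returns true if a link with these flags/status would defer the consumer. */
bool toy_link_defers_probe(uint32_t flags, int status)
{
    if (!(flags & TOY_DL_FLAG_MANAGED))
        return false; /* unmanaged links are not considered */
    if (flags & TOY_DL_FLAG_SYNC_STATE_ONLY)
        return false; /* only affects sync_state(), not probe order */
    return status != TOY_DL_STATE_AVAILABLE;
}
```

Under this model, toy_link_defers_probe(0x1c0, 0) is false, which matches
0 being returned for the dormant 44e00c00.prm link in the log.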

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-30 23:10                 ` Saravana Kannan
@ 2022-06-30 23:26                   ` Rob Herring
  2022-06-30 23:30                     ` Saravana Kannan
  2022-07-01  7:38                   ` Geert Uytterhoeven
  1 sibling, 1 reply; 69+ messages in thread
From: Rob Herring @ 2022-06-30 23:26 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Tony Lindgren, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > >
> > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > asking the other questions.
> > > >
> > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > driver_deferred_probe_check_state() then?
> > >
> > > device_links_check_suppliers() call inside really_probe() would short
> > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > expected.
> >
> > OK
> >
> > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > the booting case without your patches. I see the suppliers for the
> > > > ocp child device instances only.
> > >
> > > Hmmm... this is strange (that the device link isn't there), but this
> > > is what I suspected.
> >
> > Yup, maybe it's because of the supplier being a device in the child
> > interconnect for the ocp.
>
> Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> isn't being created.
>
> So the aggregated view is something like (I had to set tabs = 4 space
> to fit it within 80 cols):
>
>     ocp: ocp {         <========================= Consumer
>         compatible = "simple-pm-bus";
>         power-domains = <&prm_per>; <=========== Supplier ref
>
>                 l4_wkup: interconnect@44c00000 {
>             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
>
>             segment@200000 {  /* 0x44e00000 */
>                 compatible = "simple-pm-bus";
>
>                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
>                     compatible = "ti,sysc-omap4", "ti,sysc";
>
>                     prcm: prcm@0 {
>                         compatible = "ti,am3-prcm", "simple-bus";
>
>                         prm_per: prm@c00 { <========= Actual Supplier
>                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
>                         };
>                     };
>                 };
>             };
>         };
>     };
>
> The power-domain supplier is the great-great-great-grand-child of the
> consumer. It's not clear to me how this is valid. What does it even
> mean?
>
> Rob, is this considered a valid DT?

Valid DT for broken h/w.

So the domain must be default on and then simple-pm-bus is going to
hold a reference to the domain preventing it from ever getting powered
off and things seem to work. Except what happens during suspend?

Rob

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-30 23:26                   ` Rob Herring
@ 2022-06-30 23:30                     ` Saravana Kannan
  2022-07-01  5:33                       ` Tony Lindgren
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-06-30 23:30 UTC (permalink / raw)
  To: Rob Herring
  Cc: Tony Lindgren, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
>
> On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > >
> > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > asking the other questions.
> > > > >
> > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > driver_deferred_probe_check_state() then?
> > > >
> > > > device_links_check_suppliers() call inside really_probe() would short
> > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > expected.
> > >
> > > OK
> > >
> > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > the booting case without your patches. I see the suppliers for the
> > > > > ocp child device instances only.
> > > >
> > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > is what I suspected.
> > >
> > > Yup, maybe it's because of the supplier being a device in the child
> > > interconnect for the ocp.
> >
> > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > isn't being created.
> >
> > So the aggregated view is something like (I had to set tabs = 4 space
> > to fit it within 80 cols):
> >
> >     ocp: ocp {         <========================= Consumer
> >         compatible = "simple-pm-bus";
> >         power-domains = <&prm_per>; <=========== Supplier ref
> >
> >                 l4_wkup: interconnect@44c00000 {
> >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> >
> >             segment@200000 {  /* 0x44e00000 */
> >                 compatible = "simple-pm-bus";
> >
> >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> >                     compatible = "ti,sysc-omap4", "ti,sysc";
> >
> >                     prcm: prcm@0 {
> >                         compatible = "ti,am3-prcm", "simple-bus";
> >
> >                         prm_per: prm@c00 { <========= Actual Supplier
> >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> >                         };
> >                     };
> >                 };
> >             };
> >         };
> >     };
> >
> > The power-domain supplier is the great-great-great-grand-child of the
> > consumer. It's not clear to me how this is valid. What does it even
> > mean?
> >
> > Rob, is this considered a valid DT?
>
> Valid DT for broken h/w.

I'm not sure even in that case it's valid. When the parent device is
in reset (when the SoC is coming out of reset), there's no way the
descendant is functional. And if the descendant is not functional, how
is the parent device powered up? This just feels like an incorrect
representation of the real h/w.

> So the domain must be default on and then simple-pm-bus is going to
> hold a reference to the domain preventing it from ever getting powered
> off and things seem to work. Except what happens during suspend?

But how can simple-pm-bus even get a reference? The PM domain can't
get added until we are well into the probe of the simple-pm-bus and
AFAICT the genpd attach is done before the driver probe is even
called.

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-23 12:08     ` Alexander Stein
@ 2022-07-01  0:37       ` Saravana Kannan
  2022-07-01  6:01         ` (EXT) " Alexander Stein
  2022-07-01  7:30         ` Geert Uytterhoeven
  0 siblings, 2 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-07-01  0:37 UTC (permalink / raw)
  To: Alexander Stein
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

[-- Attachment #1: Type: text/plain, Size: 1924 bytes --]

On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
<alexander.stein@ew.tq-group.com> wrote:
>
> Hi,
>
> Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> > Hi,
> >
> > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > Now that fw_devlink=on by default and fw_devlink supports
> > > "power-domains" property, the execution will never get to the point
> > > where driver_deferred_probe_check_state() is called before the supplier
> > > has probed successfully or before deferred probe timeout has expired.
> > >
> > > So, delete the call and replace it with -ENODEV.
> >
> > Looks like this causes omaps to not boot in Linux next. With this
> > simple-pm-bus fails to probe initially as the power-domain is not
> > yet available. On platform_probe() genpd_get_from_provider() returns
> > -ENOENT.
> >
> > Seems like other stuff is potentially broken too, any ideas on
> > how to fix this?
>
> I think I'm hit by this as well, although I do not get a lockup.
> In my case I'm using arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts
> and probing of 38320000.blk-ctrl fails as the power-domain is not (yet)
> registed.

Ok, took a look.

The problem is that there are two drivers for the same device and they
both initialize this device.

    gpc: gpc@303a0000 {
        compatible = "fsl,imx8mq-gpc";
    };

$ git grep -l "fsl,imx7d-gpc" -- drivers/
drivers/irqchip/irq-imx-gpcv2.c
drivers/soc/imx/gpcv2.c

IMHO, this is a bad/broken design.

So what's happening is that fw_devlink will block the probe of
38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
blocking the probe of 38320000.blk-ctrl as soon as the first driver
initializes the device. In this case, it's the irqchip driver.
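
A toy user-space model of that sequence, in case it helps. Everything
here is hypothetical (the struct, the toy_* functions, the single
"initialized" flag standing in for whatever fw_devlink actually tracks);
it only illustrates why the consumer gets unblocked too early when two
drivers share one device node. EPROBE_DEFER is defined locally with the
kernel's value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal errno value; not available in userspace <errno.h>. */
#define EPROBE_DEFER 517

/* Hypothetical view of one device node served by two drivers. */
struct toy_supplier {
    bool initialized;      /* what the consumer's probe is gated on */
    bool irq_domain_ready; /* set by the irqchip driver */
    bool genpd_ready;      /* set by the power domain driver */
};

/* First driver: runs early and marks the node as initialized. */
void toy_irqchip_init(struct toy_supplier *s)
{
    s->irq_domain_ready = true;
    s->initialized = true;
}

/* Second driver: registers the power domains much later. */
void toy_genpd_probe(struct toy_supplier *s)
{
    s->genpd_ready = true;
    s->initialized = true;
}

/* Consumer (think 38320000.blk-ctrl) that needs the power domain. */
int toy_consumer_probe(const struct toy_supplier *s)
{
    if (!s->initialized)
        return -EPROBE_DEFER; /* still blocked on the supplier */
    if (!s->genpd_ready)
        return -ENODEV;       /* unblocked, but the domain is missing */
    return 0;
}
```

In this model the consumer defers before either driver runs, fails with
-ENODEV once only the irqchip half has run, and succeeds only after the
power domain half has run as well.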

I'd recommend combining these drivers into one. Something like the
patch I'm attaching (sorry for the attachment, copy-paste is mangling
the tabs). Can you give it a shot please?

-Saravana

[-- Attachment #2: 0001-combine-drivers.patch --]
[-- Type: application/x-patch, Size: 3528 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-30 23:30                     ` Saravana Kannan
@ 2022-07-01  5:33                       ` Tony Lindgren
  2022-07-01  6:12                         ` Tony Lindgren
  0 siblings, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-07-01  5:33 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

* Saravana Kannan <saravanak@google.com> [220630 23:25]:
> On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> >
> > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > >
> > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > >
> > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > >
> > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > >
> > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > asking the other questions.
> > > > > >
> > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > driver_deferred_probe_check_state() then?
> > > > >
> > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > expected.
> > > >
> > > > OK
> > > >
> > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > ocp child device instances only.
> > > > >
> > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > is what I suspected.
> > > >
> > > > Yup, maybe it's because of the supplier being a device in the child
> > > > interconnect for the ocp.
> > >
> > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > isn't being created.
> > >
> > > So the aggregated view is something like (I had to set tabs = 4 space
> > > to fit it within 80 cols):
> > >
> > >     ocp: ocp {         <========================= Consumer
> > >         compatible = "simple-pm-bus";
> > >         power-domains = <&prm_per>; <=========== Supplier ref
> > >
> > >                 l4_wkup: interconnect@44c00000 {
> > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > >
> > >             segment@200000 {  /* 0x44e00000 */
> > >                 compatible = "simple-pm-bus";
> > >
> > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > >
> > >                     prcm: prcm@0 {
> > >                         compatible = "ti,am3-prcm", "simple-bus";
> > >
> > >                         prm_per: prm@c00 { <========= Actual Supplier
> > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > >                         };
> > >                     };
> > >                 };
> > >             };
> > >         };
> > >     };
> > >
> > > The power-domain supplier is the great-great-great-grand-child of the
> > > consumer. It's not clear to me how this is valid. What does it even
> > > mean?
> > >
> > > Rob, is this considered a valid DT?
> >
> > Valid DT for broken h/w.
> 
> I'm not sure even in that case it's valid. When the parent device is
> in reset (when the SoC is coming out of reset), there's no way the
> descendant is functional. And if the descendant is not functional, how
> is the parent device powered up? This just feels like an incorrect
> representation of the real h/w.

It should be a correct representation, based on scanning the interconnects
and looking at the documentation. Some interconnect parts are wired
always-on and some interconnect instances may be dual-mapped.

We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
Maybe that also now fails in addition to the top level interconnect
probing no longer producing -EPROBE_DEFER.

> > So the domain must be default on and then simple-pm-bus is going to
> > hold a reference to the domain preventing it from ever getting powered
> > off and things seem to work. Except what happens during suspend?
> 
> But how can simple-pm-bus even get a reference? The PM domain can't
> get added until we are well into the probe of the simple-pm-bus and
> AFAICT the genpd attach is done before the driver probe is even
> called.

The prm/prcm gets of_platform_populate() called on it early.

Regards,

Tony

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: (EXT) Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  0:37       ` Saravana Kannan
@ 2022-07-01  6:01         ` Alexander Stein
  2022-07-01  7:02           ` Saravana Kannan
  2022-07-01  7:30         ` Geert Uytterhoeven
  1 sibling, 1 reply; 69+ messages in thread
From: Alexander Stein @ 2022-07-01  6:01 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

Hi Saravana,

Am Freitag, 1. Juli 2022, 02:37:14 CEST schrieb Saravana Kannan:
> On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> 
> <alexander.stein@ew.tq-group.com> wrote:
> > Hi,
> > 
> > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> > > Hi,
> > > 
> > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > "power-domains" property, the execution will never get to the point
> > > > where driver_deferred_probe_check_state() is called before the
> > > > supplier
> > > > has probed successfully or before deferred probe timeout has expired.
> > > > 
> > > > So, delete the call and replace it with -ENODEV.
> > > 
> > > Looks like this causes omaps to not boot in Linux next. With this
> > > simple-pm-bus fails to probe initially as the power-domain is not
> > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > -ENOENT.
> > > 
> > > Seems like other stuff is potentially broken too, any ideas on
> > > how to fix this?
> > 
> > I think I'm hit by this as well, although I do not get a lockup.
> > In my case I'm using
> > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > 38320000.blk-ctrl fails as the power-domain is not (yet) registed.
> 
> Ok, took a look.
> 
> The problem is that there are two drivers for the same device and they
> both initialize this device.
> 
>     gpc: gpc@303a0000 {
>         compatible = "fsl,imx8mq-gpc";
>     }
> 
> $ git grep -l "fsl,imx7d-gpc" -- drivers/
> drivers/irqchip/irq-imx-gpcv2.c
> drivers/soc/imx/gpcv2.c
> 
> IMHO, this is a bad/broken design.
> 
> So what's happening is that fw_devlink will block the probe of
> 38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
> blocking the probe of 38320000.blk-ctrl as soon as the first driver
> initializes the device. In this case, it's the irqchip driver.
> 
> I'd recommend combining these drivers into one. Something like the
> patch I'm attaching (sorry for the attachment, copy-paste is mangling
> the tabs). Can you give it a shot please?

I tried this patch and it delayed the driver initialization (that of the UART
as well, BTW). Unfortunately the driver fails the same way:
> [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"

Beyond that, it even introduced some new errors:
> [    0.008160] irq: no irq domain found for gpc@303a0000 !
> [    0.013251] Failed to map interrupt for /soc@0/bus@30400000/timer@306a0000
> [    0.020152] Failed to initialize '/soc@0/bus@30400000/timer@306a0000': -22

I kept the timestamps to show that these errors happen very early. So now the
"global" interrupt parent, set at line 18,
> interrupt-parent = <&gpc>;
cannot be used at this point during boot.

Best regards,
Alexander




^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  5:33                       ` Tony Lindgren
@ 2022-07-01  6:12                         ` Tony Lindgren
  2022-07-01  8:10                           ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-07-01  6:12 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

* Tony Lindgren <tony@atomide.com> [220701 08:33]:
> * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > >
> > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > >
> > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > >
> > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > >
> > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > This issue is no directly related fw_devlink. It is a side effect of
> > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > >
> > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > asking the other questions.
> > > > > > >
> > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > driver_deferred_probe_check_state() then?
> > > > > >
> > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > expected.
> > > > >
> > > > > OK
> > > > >
> > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > ocp child device instances only.
> > > > > >
> > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > is what I suspected.
> > > > >
> > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > interconnect for the ocp.
> > > >
> > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > isn't being created.
> > > >
> > > > So the aggregated view is something like (I had to set tabs = 4 spaces
> > > > to fit it within 80 cols):
> > > >
> > > >     ocp: ocp {         <========================= Consumer
> > > >         compatible = "simple-pm-bus";
> > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > >
> > > >         l4_wkup: interconnect@44c00000 {
> > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > >
> > > >             segment@200000 {  /* 0x44e00000 */
> > > >                 compatible = "simple-pm-bus";
> > > >
> > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > >
> > > >                     prcm: prcm@0 {
> > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > >
> > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > >                         };
> > > >                     };
> > > >                 };
> > > >             };
> > > >         };
> > > >     };
> > > >
> > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > consumer. It's not clear to me how this is valid. What does it even
> > > > mean?
> > > >
> > > > Rob, is this considered a valid DT?
> > >
> > > Valid DT for broken h/w.
> > 
> > I'm not sure even in that case it's valid. When the parent device is
> > in reset (when the SoC is coming out of reset), there's no way the
> > descendant is functional. And if the descendant is not functional, how
> > is the parent device powered up? This just feels like an incorrect
> > representation of the real h/w.
> 
> It should be correct representation based on scanning the interconnects
> and looking at the documentation. Some interconnect parts are wired
> always-on and some interconnect instances may be dual-mapped.
> 
> We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
> Maybe that also now fails in addition to the top level interconnect
> probing no longer producing -EPROBE_DEFER.
> 
> > > So the domain must be default on and then simple-pm-bus is going to
> > > hold a reference to the domain preventing it from ever getting powered
> > > off and things seem to work. Except what happens during suspend?
> > 
> > But how can simple-pm-bus even get a reference? The PM domain can't
> > get added until we are well into the probe of the simple-pm-bus and
> > AFAICT the genpd attach is done before the driver probe is even
> > called.
> 
> The prm/prcm gets of_platform_populate() called on it early.

The hackish patch below makes things boot for me, but I'm not convinced
this is the preferred fix compared to the earlier deferred probe handling.
Going back to the init level tinkering seems like a step back to me.

Regards,

Tony

8< ----------------
diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
--- a/drivers/soc/ti/omap_prm.c
+++ b/drivers/soc/ti/omap_prm.c
@@ -991,4 +991,9 @@ static struct platform_driver omap_prm_driver = {
 		.of_match_table	= omap_prm_id_table,
 	},
 };
-builtin_platform_driver(omap_prm_driver);
+
+static int __init omap_prm_init(void)
+{
+        return platform_driver_register(&omap_prm_driver);
+}
+subsys_initcall(omap_prm_init);
-- 
2.36.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: (EXT) Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  6:01         ` (EXT) " Alexander Stein
@ 2022-07-01  7:02           ` Saravana Kannan
  2022-07-04  7:07             ` (EXT) " Alexander Stein
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-01  7:02 UTC (permalink / raw)
  To: Alexander Stein
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
<alexander.stein@ew.tq-group.com> wrote:
>
> Hi Saravana,
>
> Am Freitag, 1. Juli 2022, 02:37:14 CEST schrieb Saravana Kannan:
> > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> >
> > <alexander.stein@ew.tq-group.com> wrote:
> > > Hi,
> > >
> > > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> > > > Hi,
> > > >
> > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > "power-domains" property, the execution will never get to the point
> > > > > where driver_deferred_probe_check_state() is called before the
> > > > > supplier
> > > > > has probed successfully or before deferred probe timeout has expired.
> > > > >
> > > > > So, delete the call and replace it with -ENODEV.
> > > >
> > > > Looks like this causes omaps to not boot in Linux next. With this
> > > > simple-pm-bus fails to probe initially as the power-domain is not
> > > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > > -ENOENT.
> > > >
> > > > Seems like other stuff is potentially broken too, any ideas on
> > > > how to fix this?
> > >
> > > I think I'm hit by this as well, although I do not get a lockup.
> > > In my case I'm using
> > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > > 38320000.blk-ctrl fails as the power-domain is not (yet) registered.
> >
> > Ok, took a look.
> >
> > The problem is that there are two drivers for the same device and they
> > both initialize this device.
> >
> >     gpc: gpc@303a0000 {
> >         compatible = "fsl,imx8mq-gpc";
> >     };
> >
> > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > drivers/irqchip/irq-imx-gpcv2.c
> > drivers/soc/imx/gpcv2.c
> >
> > IMHO, this is a bad/broken design.
> >
> > So what's happening is that fw_devlink will block the probe of
> > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
> > blocking the probe of 38320000.blk-ctrl as soon as the first driver
> > initializes the device. In this case, it's the irqchip driver.
> >
> > I'd recommend combining these drivers into one. Something like the
> > patch I'm attaching (sorry for the attachment, copy-paste is mangling
> > the tabs). Can you give it a shot please?
>
> I tried this patch and it delayed the driver initialization (that of the
> UART as well, BTW). Unfortunately the driver fails the same way:

Thanks for testing the patch!

> > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
>
> More than that it even introduced some more errors:
> > [    0.008160] irq: no irq domain found for gpc@303a0000 !

So the idea behind my change was that as long as the irqchip isn't the
root of the irqdomain (might be using the terms incorrectly) like the
gic, you can make it a platform driver. And I was trying to hack up a
patch that's the equivalent of platform_irqchip_probe() (which just
ends up eventually calling the callback you use in IRQCHIP_DECLARE()).
I probably made some mistake in the quick hack that I'm sure is
fixable.

> > [    0.013251] Failed to map interrupt for
> > /soc@0/bus@30400000/timer@306a0000

However, this timer driver also uses TIMER_OF_DECLARE() which can't
handle failure to get the IRQ (because it can't return -EPROBE_DEFER).
So this means the timer driver in turn needs to be converted to a
platform driver if it's supposed to work with the IRQCHIP_DECLARE()
being converted to a platform driver.

But that's a can of worms not worth opening. But then I remembered
this simpler workaround will work and it is pretty much a variant of
the workaround that's already in the gpc's irqchip driver to allow two
drivers to probe the same device (people really should stop doing
that).
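
To illustrate why clearing the flag should matter, here is a minimal
user-space sketch (plain C, not kernel code) of the gating described
above. It assumes a single "initialized" bit on the supplier decides
whether a consumer keeps deferring. struct fake_fwnode and the fake_*
helpers are hypothetical stand-ins for struct fwnode_handle and
fwnode_dev_initialized(); the real fw_devlink logic has more inputs
than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct fwnode_handle; only the bit we care about. */
struct fake_fwnode {
	bool initialized;	/* models the "a driver has initialized me" flag */
};

/* Hypothetical stand-in for fwnode_dev_initialized(fwnode, initialized). */
static void fake_fwnode_dev_initialized(struct fake_fwnode *fwnode,
					bool initialized)
{
	fwnode->initialized = initialized;
}

/* Assumed rule: a consumer keeps deferring until its supplier is initialized. */
static bool fake_consumer_must_defer(const struct fake_fwnode *supplier)
{
	return !supplier->initialized;
}

/*
 * Models the gpc case: the irqchip driver initializes the node first.
 * With the workaround it clears the flag again, so the consumer
 * (blk-ctrl here) keeps deferring until the power domain driver probes.
 */
static bool fake_blk_ctrl_defers_after_irqchip(bool apply_workaround)
{
	struct fake_fwnode gpc = { .initialized = false };

	fake_fwnode_dev_initialized(&gpc, true);	/* irqchip init done */
	if (apply_workaround)
		fake_fwnode_dev_initialized(&gpc, false);

	return fake_consumer_must_defer(&gpc);
}
```

In this model, fake_blk_ctrl_defers_after_irqchip(false) is the failing
case (the consumer is released too early) and passing true is the case
the one-line change is aiming for.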

Can you drop my previous hack patch and try this instead please? I'm
99% sure this will work.

diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
index b9c22f764b4d..8a0e82067924 100644
--- a/drivers/irqchip/irq-imx-gpcv2.c
+++ b/drivers/irqchip/irq-imx-gpcv2.c
@@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
         * later the GPC power domain driver will not be skipped.
         */
        of_node_clear_flag(node, OF_POPULATED);
+       fwnode_dev_initialized(domain->fwnode, false);
        return 0;
 }

-Saravana

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  0:37       ` Saravana Kannan
  2022-07-01  6:01         ` (EXT) " Alexander Stein
@ 2022-07-01  7:30         ` Geert Uytterhoeven
  1 sibling, 0 replies; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-07-01  7:30 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Alexander Stein, Tony Lindgren, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM

Hi Saravana,

On Fri, Jul 1, 2022 at 2:37 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> <alexander.stein@ew.tq-group.com> wrote:
> > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:

> > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > "power-domains" property, the execution will never get to the point
> > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > has probed successfully or before deferred probe timeout has expired.
> > > >
> > > > So, delete the call and replace it with -ENODEV.
> > >
> > > Looks like this causes omaps to not boot in Linux next. With this
> > > simple-pm-bus fails to probe initially as the power-domain is not
> > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > -ENOENT.
> > >
> > > Seems like other stuff is potentially broken too, any ideas on
> > > how to fix this?
> >
> > I think I'm hit by this as well, although I do not get a lockup.
> > In my case I'm using arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts
> > and probing of 38320000.blk-ctrl fails as the power-domain is not (yet)
> > registered.
>
> Ok, took a look.
>
> The problem is that there are two drivers for the same device and they
> both initialize this device.
>
>     gpc: gpc@303a0000 {
>         compatible = "fsl,imx8mq-gpc";
>     };
>
> $ git grep -l "fsl,imx7d-gpc" -- drivers/
> drivers/irqchip/irq-imx-gpcv2.c
> drivers/soc/imx/gpcv2.c

You missed the "driver" in arch/arm/mach-imx/src.c ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-06-30 23:10                 ` Saravana Kannan
  2022-06-30 23:26                   ` Rob Herring
@ 2022-07-01  7:38                   ` Geert Uytterhoeven
  1 sibling, 0 replies; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-07-01  7:38 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Tony Lindgren, Rob Herring, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

Hi Saravana,

On Fri, Jul 1, 2022 at 1:11 AM Saravana Kannan <saravanak@google.com> wrote:
> On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > This issue is not directly related to fw_devlink. It is a side effect of
> > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > >
> > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > asking the other questions.
> > > >
> > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > driver_deferred_probe_check_state() then?
> > >
> > > device_links_check_suppliers() call inside really_probe() would short
> > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > expected.
> >
> > OK
> >
> > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > the booting case without your patches. I see the suppliers for the
> > > > ocp child device instances only.
> > >
> > > Hmmm... this is strange (that the device link isn't there), but this
> > > is what I suspected.
> >
> > Yup, maybe it's because of the supplier being a device in the child
> > interconnect for the ocp.
>
> Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> isn't being created.
>
> So the aggregated view is something like (I had to set tabs = 4 spaces
> to fit it within 80 cols):
>
>     ocp: ocp {         <========================= Consumer
>         compatible = "simple-pm-bus";
>         power-domains = <&prm_per>; <=========== Supplier ref
>
>         l4_wkup: interconnect@44c00000 {
>             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
>
>             segment@200000 {  /* 0x44e00000 */
>                 compatible = "simple-pm-bus";
>
>                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
>                     compatible = "ti,sysc-omap4", "ti,sysc";
>
>                     prcm: prcm@0 {
>                         compatible = "ti,am3-prcm", "simple-bus";
>
>                         prm_per: prm@c00 { <========= Actual Supplier
>                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
>                         };
>                     };
>                 };
>             };
>         };
>     };
>
> The power-domain supplier is the great-great-great-grand-child of the
> consumer. It's not clear to me how this is valid. What does it even
> mean?
>
> Rob, is this considered a valid DT?
>
> Geert, thoughts on whether this is a correct use of simple-pm-bus device?

Well, if the hardware is wired that way...

It's not that dissimilar from CPU cores, and interrupt and GPIO
controllers in power domains and clocked by controllable clocks:
you can cut the branch you're sitting on, and you have to be careful
when going to sleep, and make sure your wake-up sources are still
functional.

> Also, how is the power domain attach/get working in this case? As far
> as I can tell, at least for "simple-pm-bus" devices, the pm domain
> attachment is happening under:
> really_probe() -> call_driver_probe -> platform_probe() ->
> dev_pm_domain_attach()
>
> So, how is the pm domain attach succeeding in the first place without
> my changes?

That's a software thing ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  6:12                         ` Tony Lindgren
@ 2022-07-01  8:10                           ` Saravana Kannan
  2022-07-01  8:26                             ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-01  8:10 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren <tony@atomide.com> wrote:
>
> * Tony Lindgren <tony@atomide.com> [220701 08:33]:
> > * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > > >
> > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > >
> > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > >
> > > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > >
> > > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > > This issue is not directly related to fw_devlink. It is a side effect of
> > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > > >
> > > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > > asking the other questions.
> > > > > > > >
> > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > >
> > > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > > expected.
> > > > > >
> > > > > > OK
> > > > > >
> > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > > ocp child device instances only.
> > > > > > >
> > > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > > is what I suspected.
> > > > > >
> > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > interconnect for the ocp.
> > > > >
> > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > isn't being created.
> > > > >
> > > > > So the aggregated view is something like (I had to set tabs = 4 spaces
> > > > > to fit it within 80 cols):
> > > > >
> > > > >     ocp: ocp {         <========================= Consumer
> > > > >         compatible = "simple-pm-bus";
> > > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > > >
> > > > >         l4_wkup: interconnect@44c00000 {
> > > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > >
> > > > >             segment@200000 {  /* 0x44e00000 */
> > > > >                 compatible = "simple-pm-bus";
> > > > >
> > > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > > >
> > > > >                     prcm: prcm@0 {
> > > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > > >
> > > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > > >                         };
> > > > >                     };
> > > > >                 };
> > > > >             };
> > > > >         };
> > > > >     };
> > > > >
> > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > mean?
> > > > >
> > > > > Rob, is this considered a valid DT?
> > > >
> > > > Valid DT for broken h/w.
> > >
> > > I'm not sure even in that case it's valid. When the parent device is
> > > in reset (when the SoC is coming out of reset), there's no way the
> > > descendant is functional. And if the descendant is not functional, how
> > > is the parent device powered up? This just feels like an incorrect
> > > representation of the real h/w.
> >
> > It should be correct representation based on scanning the interconnects
> > and looking at the documentation. Some interconnect parts are wired
> > always-on and some interconnect instances may be dual-mapped.

Thanks for helping to debug this. Appreciate it.

> >
> > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().

:'(

I checked out the code. These prm devices just get populated with NULL
as the parent. So they are effectively top level devices from the
perspective of driver core.

> > Maybe that also now fails in addition to the top level interconnect
> > probing no longer producing -EPROBE_DEFER.

As far as I can tell pdata_quirks_init_clocks() is just adding these
prm devices (amongst other drivers). So I don't expect that to fail.

> >
> > > > So the domain must be default on and then simple-pm-bus is going to
> > > > hold a reference to the domain preventing it from ever getting powered
> > > > off and things seem to work. Except what happens during suspend?
> > >
> > > But how can simple-pm-bus even get a reference? The PM domain can't
> > > get added until we are well into the probe of the simple-pm-bus and
> > > AFAICT the genpd attach is done before the driver probe is even
> > > called.
> >
> > The prm/prcm gets of_platform_populate() called on it early.

:'(

> The hackish patch below makes things boot for me, not convinced this
> is the preferred fix compared to earlier deferred probe handling though.
> Going back to the init level tinkering seems like a step back to me.

The goal of fw_devlink is to avoid init level tinkering and it does
help with that in general. But these kinds of quirks are going to need
a few exceptions -- with them being quirks and all. And this change
will avoid an unnecessary deferred probe (that used to happen even
before my change).
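
As a rough illustration of what the initcall change buys, here is a
small user-space sketch (plain C, not kernel code). The level numbers
match include/linux/init.h, where subsys_initcall() is level 4 and
device_initcall() (which builtin_platform_driver() ends up using) is
level 6. Everything named fake_* is hypothetical, and listing
simple_pm_bus ahead of omap_prm within the same level is only an
assumption standing in for link order.

```c
#include <stddef.h>

/* Level numbers follow include/linux/init.h: subsys = 4, device = 6. */
enum fake_initcall_level {
	FAKE_SUBSYS_INITCALL = 4,
	FAKE_DEVICE_INITCALL = 6,
};

struct fake_initcall {
	const char *name;
	int level;
};

/*
 * Returns the name of the initcall that runs first: the lowest level
 * wins, and ties keep table order (standing in for link order).
 */
static const char *fake_first_initcall(const struct fake_initcall *calls,
				       size_t n)
{
	const struct fake_initcall *first = NULL;

	for (size_t i = 0; i < n; i++)
		if (!first || calls[i].level < first->level)
			first = &calls[i];

	return first ? first->name : NULL;
}

/* Which driver registers first, given the level omap_prm is placed at? */
static const char *fake_first_driver(int prm_level)
{
	const struct fake_initcall calls[] = {
		{ "simple_pm_bus", FAKE_DEVICE_INITCALL },	/* assumed linked first */
		{ "omap_prm", prm_level },
	};

	return fake_first_initcall(calls, sizeof(calls) / sizeof(calls[0]));
}
```

With FAKE_DEVICE_INITCALL for omap_prm the model registers
simple_pm_bus first; with FAKE_SUBSYS_INITCALL the prm driver is
registered first regardless of table order.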

The other option to handle this quirk is to create the invalid
(consumer is parent of supplier) fwnode_link between the prm device
and its consumers when the prm device is populated. Then fw_devlink
will end up creating a device link when ocp gets added. But I'm not
sure if it's going to be easy to find and add all those consumers.

I'd say, for now, let's go with this patch below. I'll see if I can
get fw_devlink to handle these odd quirks without breaking the normal
cases or making them significantly slower. But that'll take some time
and I'm not sure there'll be a nice solution.

Thanks,
Saravana

> Regards,
>
> Tony
>
> 8< ----------------
> diff --git a/drivers/soc/ti/omap_prm.c b/drivers/soc/ti/omap_prm.c
> --- a/drivers/soc/ti/omap_prm.c
> +++ b/drivers/soc/ti/omap_prm.c
> @@ -991,4 +991,9 @@ static struct platform_driver omap_prm_driver = {
>                 .of_match_table = omap_prm_id_table,
>         },
>  };
> -builtin_platform_driver(omap_prm_driver);
> +
> +static int __init omap_prm_init(void)
> +{
> +        return platform_driver_register(&omap_prm_driver);
> +}
> +subsys_initcall(omap_prm_init);
> --
> 2.36.1
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  8:10                           ` Saravana Kannan
@ 2022-07-01  8:26                             ` Saravana Kannan
  2022-07-01 13:00                               ` Tony Lindgren
  2022-07-01 15:08                               ` Sudeep Holla
  0 siblings, 2 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-07-01  8:26 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Fri, Jul 1, 2022 at 1:10 AM Saravana Kannan <saravanak@google.com> wrote:
>
> On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Tony Lindgren <tony@atomide.com> [220701 08:33]:
> > > * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > >
> > > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > >
> > > > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > >
> > > > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > > > This issue is not directly related to fw_devlink. It is a side effect of
> > > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > > > >
> > > > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > > > asking the other questions.
> > > > > > > > >
> > > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > > >
> > > > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > > > expected.
> > > > > > >
> > > > > > > OK
> > > > > > >
> > > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > > > ocp child device instances only.
> > > > > > > >
> > > > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > > > is what I suspected.
> > > > > > >
> > > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > > interconnect for the ocp.
> > > > > >
> > > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > > isn't being created.
> > > > > >
> > > > > > So the aggregated view is something like (I had to set tabs = 4 spaces
> > > > > > to fit it within 80 cols):
> > > > > >
> > > > > >     ocp: ocp {         <========================= Consumer
> > > > > >         compatible = "simple-pm-bus";
> > > > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > > > >
> > > > > >         l4_wkup: interconnect@44c00000 {
> > > > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > > >
> > > > > >             segment@200000 {  /* 0x44e00000 */
> > > > > >                 compatible = "simple-pm-bus";
> > > > > >
> > > > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > > > >
> > > > > >                     prcm: prcm@0 {
> > > > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > > > >
> > > > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > > > >                         };
> > > > > >                     };
> > > > > >                 };
> > > > > >             };
> > > > > >         };
> > > > > >     };
> > > > > >
> > > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > > mean?
> > > > > >
> > > > > > Rob, is this considered a valid DT?
> > > > >
> > > > > Valid DT for broken h/w.
> > > >
> > > > I'm not sure even in that case it's valid. When the parent device is
> > > > in reset (when the SoC is coming out of reset), there's no way the
> > > > descendant is functional. And if the descendant is not functional, how
> > > > is the parent device powered up? This just feels like an incorrect
> > > > representation of the real h/w.
> > >
> > > It should be correct representation based on scanning the interconnects
> > > and looking at the documentation. Some interconnect parts are wired
> > > always-on and some interconnect instances may be dual-mapped.
>
> Thanks for helping to debug this. Appreciate it.
>
> > >
> > > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
>
> :'(
>
> I checked out the code. These prm devices just get populated with NULL
> as the parent. So they are effectively top level devices from the
> perspective of driver core.
>
> > > Maybe that also now fails in addition to the top level interconnect
> > > probing no longer producing -EPROBE_DEFER.
>
> As far as I can tell pdata_quirks_init_clocks() is just adding these
> prm devices (amongst other drivers). So I don't expect that to fail.
>
> > >
> > > > > So the domain must be default on and then simple-pm-bus is going to
> > > > > hold a reference to the domain preventing it from ever getting powered
> > > > > off and things seem to work. Except what happens during suspend?
> > > >
> > > > But how can simple-pm-bus even get a reference? The PM domain can't
> > > > get added until we are well into the probe of the simple-pm-bus and
> > > > AFAICT the genpd attach is done before the driver probe is even
> > > > called.
> > >
> > > The prm/prcm gets of_platform_populate() called on it early.
>
> :'(
>
> > The hackish patch below makes things boot for me, not convinced this
> > is the preferred fix compared to earlier deferred probe handling though.
> > Going back to the init level tinkering seems like a step back to me.
>
> The goal of fw_devlink is to avoid init level tinkering and it does
> help with that in general. But these kinds of quirks are going to need
> a few exceptions -- with them being quirks and all. And this change
> will avoid an unnecessary deferred probe (that used to happen even
> before my change).
>
> The other option to handle this quirk is to create the invalid
> (consumer is parent of supplier) fwnode_link between the prm device
> and its consumers when the prm device is populated. Then fw_devlink
> will end up creating a device link when ocp gets added. But I'm not
> sure if it's going to be easy to find and add all those consumers.
>
> I'd say, for now, let's go with this patch below. I'll see if I can
> get fw_devlink to handle these odd quirks without breaking the normal
> cases or making them significantly slower. But that'll take some time
> and I'm not sure there'll be a nice solution.

Can you check if this hack helps? If so, then I can think about
whether we can pick it up without breaking everything else. Copy-paste
tab mess up warning.

-Saravana

8< ----------------

diff --git a/drivers/of/property.c b/drivers/of/property.c
index 967f79b59016..f671a7528719 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1138,18 +1138,6 @@ static int of_link_to_phandle(struct device_node *con_np,
                return -ENODEV;
        }

-       /*
-        * Don't allow linking a device node as a consumer of one of its
-        * descendant nodes. By definition, a child node can't be a functional
-        * dependency for the parent node.
-        */
-       if (of_is_ancestor_of(con_np, sup_np)) {
-               pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
-                        con_np, sup_np);
-               of_node_put(sup_np);
-               return -EINVAL;
-       }
-
        /*
         * Don't create links to "early devices" that won't have struct devices
         * created for them.
@@ -1163,6 +1151,25 @@ static int of_link_to_phandle(struct device_node *con_np,
                of_node_put(sup_np);
                return -ENODEV;
        }
+
+       /*
+        * Don't allow linking a device node as a consumer of one of its
+        * descendant nodes. By definition, a child node can't be a functional
+        * dependency for the parent node.
+        *
+        * However, if the child node already has a device while the parent is
+        * in the process of being added, it's probably some weird quirk
+        * handling. So, don't bother checking if the consumer is an ancestor of
+        * the supplier.
+        */
+       if (!sup_dev && of_is_ancestor_of(con_np, sup_np)) {
+               pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
+                        con_np, sup_np);
+               put_device(sup_dev);
+               of_node_put(sup_np);
+               return -EINVAL;
+       }
+
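
To make the effect of the moved check easier to see, here is a stand-alone
sketch of the decision it makes. The toy_* names are made up for illustration
and are not kernel API, and the "early device" -ENODEV case is left out on
purpose.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a DT node; only what the check needs. */
struct toy_node {
	const char *name;
	struct toy_node *parent;	/* NULL for the root */
	bool has_device;		/* a struct device already exists for it */
};

/* Walk up from 'np' and report whether 'anc' is np itself or above it. */
static bool toy_is_ancestor_of(const struct toy_node *anc,
			       const struct toy_node *np)
{
	for (; np; np = np->parent)
		if (np == anc)
			return true;
	return false;
}

/* Refuse the link only when the supplier has no device yet. */
static bool toy_link_allowed(const struct toy_node *con,
			     const struct toy_node *sup)
{
	if (!sup->has_device && toy_is_ancestor_of(con, sup))
		return false;
	return true;
}
```

With ocp as the consumer and prm_per as the supplier, the link is refused while
prm_per has no device and allowed once the quirk code has populated it early.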

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  8:26                             ` Saravana Kannan
@ 2022-07-01 13:00                               ` Tony Lindgren
  2022-07-12  7:12                                 ` Tony Lindgren
  2022-07-01 15:08                               ` Sudeep Holla
  1 sibling, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-07-01 13:00 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

* Saravana Kannan <saravanak@google.com> [220701 08:21]:
> On Fri, Jul 1, 2022 at 1:10 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Thu, Jun 30, 2022 at 11:12 PM Tony Lindgren <tony@atomide.com> wrote:
> > >
> > > * Tony Lindgren <tony@atomide.com> [220701 08:33]:
> > > > * Saravana Kannan <saravanak@google.com> [220630 23:25]:
> > > > > On Thu, Jun 30, 2022 at 4:26 PM Rob Herring <robh@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jun 30, 2022 at 5:11 PM Saravana Kannan <saravanak@google.com> wrote:
> > > > > > >
> > > > > > > On Mon, Jun 27, 2022 at 2:10 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > >
> > > > > > > > * Saravana Kannan <saravanak@google.com> [220623 08:17]:
> > > > > > > > > On Thu, Jun 23, 2022 at 12:01 AM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > >
> > > > > > > > > > * Saravana Kannan <saravanak@google.com> [220622 19:05]:
> > > > > > > > > > > On Tue, Jun 21, 2022 at 9:59 PM Tony Lindgren <tony@atomide.com> wrote:
> > > > > > > > > > > > > This issue is not directly related to fw_devlink. It is a side effect of
> > > > > > > > > > > > removing driver_deferred_probe_check_state(). We no longer return
> > > > > > > > > > > > -EPROBE_DEFER at the end of driver_deferred_probe_check_state().
> > > > > > > > > > >
> > > > > > > > > > > Yes, I understand the issue. But driver_deferred_probe_check_state()
> > > > > > > > > > > was deleted because fw_devlink=on should have short circuited the
> > > > > > > > > > > probe attempt with an  -EPROBE_DEFER before reaching the bus/driver
> > > > > > > > > > > probe function and hitting this -ENOENT failure. That's why I was
> > > > > > > > > > > asking the other questions.
> > > > > > > > > >
> > > > > > > > > > OK. So where is the -EPROBE_DEFER supposed to happen without
> > > > > > > > > > driver_deferred_probe_check_state() then?
> > > > > > > > >
> > > > > > > > > device_links_check_suppliers() call inside really_probe() would short
> > > > > > > > > circuit and return an -EPROBE_DEFER if the device links are created as
> > > > > > > > > expected.
> > > > > > > >
> > > > > > > > OK
> > > > > > > >
> > > > > > > > > > Hmm so I'm not seeing any supplier for the top level ocp device in
> > > > > > > > > > the booting case without your patches. I see the suppliers for the
> > > > > > > > > > ocp child device instances only.
> > > > > > > > >
> > > > > > > > > Hmmm... this is strange (that the device link isn't there), but this
> > > > > > > > > is what I suspected.
> > > > > > > >
> > > > > > > > Yup, maybe it's because of the supplier being a device in the child
> > > > > > > > interconnect for the ocp.
> > > > > > >
> > > > > > > Ugh... yeah, this is why the normal (not SYNC_STATE_ONLY) device link
> > > > > > > isn't being created.
> > > > > > >
> > > > > > > So the aggregated view is something like (I had to set tabs = 4 space
> > > > > > > to fit it within 80 cols):
> > > > > > >
> > > > > > >     ocp: ocp {         <========================= Consumer
> > > > > > >         compatible = "simple-pm-bus";
> > > > > > >         power-domains = <&prm_per>; <=========== Supplier ref
> > > > > > >
> > > > > > > >         l4_wkup: interconnect@44c00000 {
> > > > > > >             compatible = "ti,am33xx-l4-wkup", "simple-pm-bus";
> > > > > > >
> > > > > > >             segment@200000 {  /* 0x44e00000 */
> > > > > > >                 compatible = "simple-pm-bus";
> > > > > > >
> > > > > > >                 target-module@0 { /* 0x44e00000, ap 8 58.0 */
> > > > > > >                     compatible = "ti,sysc-omap4", "ti,sysc";
> > > > > > >
> > > > > > >                     prcm: prcm@0 {
> > > > > > >                         compatible = "ti,am3-prcm", "simple-bus";
> > > > > > >
> > > > > > >                         prm_per: prm@c00 { <========= Actual Supplier
> > > > > > >                             compatible = "ti,am3-prm-inst", "ti,omap-prm-inst";
> > > > > > >                         };
> > > > > > >                     };
> > > > > > >                 };
> > > > > > >             };
> > > > > > >         };
> > > > > > >     };
> > > > > > >
> > > > > > > The power-domain supplier is the great-great-great-grand-child of the
> > > > > > > consumer. It's not clear to me how this is valid. What does it even
> > > > > > > mean?
> > > > > > >
> > > > > > > Rob, is this considered a valid DT?
> > > > > >
> > > > > > Valid DT for broken h/w.
> > > > >
> > > > > I'm not sure even in that case it's valid. When the parent device is
> > > > > in reset (when the SoC is coming out of reset), there's no way the
> > > > > descendant is functional. And if the descendant is not functional, how
> > > > > is the parent device powered up? This just feels like an incorrect
> > > > > representation of the real h/w.
> > > >
> > > > It should be correct representation based on scanning the interconnects
> > > > and looking at the documentation. Some interconnect parts are wired
> > > > always-on and some interconnect instances may be dual-mapped.
> >
> > Thanks for helping to debug this. Appreciate it.
> >
> > > >
> > > > We have a quirk to probe prm/prcm first with pdata_quirks_init_clocks().
> >
> > :'(
> >
> > I checked out the code. These prm devices just get populated with NULL
> > as the parent. So they are effectively top level devices from the
> > perspective of driver core.
> >
> > > > Maybe that also now fails in addition to the top level interconnect
> > > > probing no longer producing -EPROBE_DEFER.
> >
> > As far as I can tell pdata_quirks_init_clocks() is just adding these
> > prm devices (amongst other drivers). So I don't expect that to fail.
> >
> > > >
> > > > > > So the domain must be default on and then simple-pm-bus is going to
> > > > > > hold a reference to the domain preventing it from ever getting powered
> > > > > > off and things seem to work. Except what happens during suspend?
> > > > >
> > > > > But how can simple-pm-bus even get a reference? The PM domain can't
> > > > > get added until we are well into the probe of the simple-pm-bus and
> > > > > AFAICT the genpd attach is done before the driver probe is even
> > > > > called.
> > > >
> > > > The prm/prcm gets of_platform_populate() called on it early.
> >
> > :'(
> >
> > > The hackish patch below makes things boot for me, not convinced this
> > > is the preferred fix compared to earlier deferred probe handling though.
> > > Going back to the init level tinkering seems like a step back to me.
> >
> > The goal of fw_devlink is to avoid init level tinkering and it does
> > help with that in general. But these kinds of quirks are going to need
> > a few exceptions -- with them being quirks and all. And this change
> > will avoid an unnecessary deferred probe (that used to happen even
> > before my change).
> >
> > The other option to handle this quirk is to create the invalid
> > (consumer is parent of supplier) fwnode_link between the prm device
> > and its consumers when the prm device is populated. Then fw_devlink
> > will end up creating a device link when ocp gets added. But I'm not
> > sure if it's going to be easy to find and add all those consumers.
> >
> > I'd say, for now, let's go with this patch below. I'll see if I can
> > get fw_devlink to handle these odd quirks without breaking the normal
> > cases or making them significantly slower. But that'll take some time
> > and I'm not sure there'll be a nice solution.
> 
> Can you check if this hack helps? If so, then I can think about
> whether we can pick it up without breaking everything else. Copy-paste
> tab mess up warning.

Yeah, so manually applying your patch, updated against the next-20220624
kernel, boots for me. I ended up with the following changes, FYI.

Also, with both the initcall change for prm and the patch below, there
seems to be another problem: my test devices no longer idle properly
compared to reverting your two patches in next.

Regards,

Tony

8< -------------------
diff --git a/drivers/of/property.c b/drivers/of/property.c
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1138,18 +1138,6 @@ static int of_link_to_phandle(struct device_node *con_np,
 		return -ENODEV;
 	}
 
-	/*
-	 * Don't allow linking a device node as a consumer of one of its
-	 * descendant nodes. By definition, a child node can't be a functional
-	 * dependency for the parent node.
-	 */
-	if (of_is_ancestor_of(con_np, sup_np)) {
-		pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
-			 con_np, sup_np);
-		of_node_put(sup_np);
-		return -EINVAL;
-	}
-
 	/*
 	 * Don't create links to "early devices" that won't have struct devices
 	 * created for them.
@@ -1163,9 +1151,27 @@ static int of_link_to_phandle(struct device_node *con_np,
 		of_node_put(sup_np);
 		return -ENODEV;
 	}
-	put_device(sup_dev);
+
+	/*
+	 * Don't allow linking a device node as a consumer of one of its
+	 * descendant nodes. By definition, a child node can't be a functional
+	 * dependency for the parent node.
+	 *
+	 * However, if the child node already has a device while the parent is
+	 * in the process of being added, it's probably some weird quirk
+	 * handling. So, don't bother checking if the consumer is an ancestor of
+	 * the supplier.
+	 */
+	if (!sup_dev && of_is_ancestor_of(con_np, sup_np)) {
+		pr_debug("Not linking %pOFP to %pOFP - is descendant\n",
+			 con_np, sup_np);
+		put_device(sup_dev);
+		of_node_put(sup_np);
+		return -EINVAL;
+	}
 
 	fwnode_link_add(of_fwnode_handle(con_np), of_fwnode_handle(sup_np));
+	put_device(sup_dev);
 	of_node_put(sup_np);
 
 	return 0;
-- 
2.36.1

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  8:26                             ` Saravana Kannan
  2022-07-01 13:00                               ` Tony Lindgren
@ 2022-07-01 15:08                               ` Sudeep Holla
  2022-07-01 19:13                                 ` Saravana Kannan
  1 sibling, 1 reply; 69+ messages in thread
From: Sudeep Holla @ 2022-07-01 15:08 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Tony Lindgren, Rob Herring, Geert Uytterhoeven,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Sudeep Holla, Alexander Stein

Hi, Saravana,

On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:

[...]

> Can you check if this hack helps? If so, then I can think about
> whether we can pick it up without breaking everything else. Copy-paste
> tab mess up warning.

Sorry for jumping in late; I'm not even sure this is the right thread.
I have not bisected anything yet, but I am seeing issues on my Juno R2
with SCMI-enabled power domains and Coresight AMBA devices.

OF: amba_device_add() failed (-19) for /etf@20010000
OF: amba_device_add() failed (-19) for /tpiu@20030000
OF: amba_device_add() failed (-19) for /funnel@20040000
OF: amba_device_add() failed (-19) for /etr@20070000
OF: amba_device_add() failed (-19) for /stm@20100000
OF: amba_device_add() failed (-19) for /replicator@20120000
OF: amba_device_add() failed (-19) for /cpu-debug@22010000
OF: amba_device_add() failed (-19) for /etm@22040000
OF: amba_device_add() failed (-19) for /cti@22020000
OF: amba_device_add() failed (-19) for /funnel@220c0000
OF: amba_device_add() failed (-19) for /cpu-debug@22110000
OF: amba_device_add() failed (-19) for /etm@22140000
OF: amba_device_add() failed (-19) for /cti@22120000
OF: amba_device_add() failed (-19) for /cpu-debug@23010000
OF: amba_device_add() failed (-19) for /etm@23040000
OF: amba_device_add() failed (-19) for /cti@23020000
OF: amba_device_add() failed (-19) for /funnel@230c0000
OF: amba_device_add() failed (-19) for /cpu-debug@23110000
OF: amba_device_add() failed (-19) for /etm@23140000
OF: amba_device_add() failed (-19) for /cti@23120000
OF: amba_device_add() failed (-19) for /cpu-debug@23210000
OF: amba_device_add() failed (-19) for /etm@23240000
OF: amba_device_add() failed (-19) for /cti@23220000
OF: amba_device_add() failed (-19) for /cpu-debug@23310000
OF: amba_device_add() failed (-19) for /etm@23340000
OF: amba_device_add() failed (-19) for /cti@23320000
OF: amba_device_add() failed (-19) for /cti@20020000
OF: amba_device_add() failed (-19) for /cti@20110000
OF: amba_device_add() failed (-19) for /funnel@20130000
OF: amba_device_add() failed (-19) for /etf@20140000
OF: amba_device_add() failed (-19) for /funnel@20150000
OF: amba_device_add() failed (-19) for /cti@20160000

These work fine with deferred probe in mainline.
I tried the hack you suggested here (or rather Tony's version), and also
tried fw_devlink=0 and fw_devlink=1 && fw_devlink.strict=0.
No change in behaviour.

The DTS files are arch/arm64/boot/dts/arm/juno-*-scmi.dts and the
coresight devices are mostly in juno-cs-r1r2.dtsi.

Let me know if there is anything obvious, or if you want me to bisect,
which means I need more time. I can do that next week.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01 15:08                               ` Sudeep Holla
@ 2022-07-01 19:13                                 ` Saravana Kannan
  2022-07-05  8:44                                   ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-01 19:13 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Tony Lindgren, Rob Herring, Geert Uytterhoeven,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Fri, Jul 1, 2022 at 8:08 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> Hi, Saravana,
>
> On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:
>
> [...]
>
> > Can you check if this hack helps? If so, then I can think about
> > whether we can pick it up without breaking everything else. Copy-paste
> > tab mess up warning.
>
> Sorry for jumping in late and not even sure if this is right thread.
> I have not bisected anything yet, but I am seeing issues on my Juno R2
> with SCMI enabled power domains and Coresight AMBA devices.
>
> OF: amba_device_add() failed (-19) for /etf@20010000
> OF: amba_device_add() failed (-19) for /tpiu@20030000
> OF: amba_device_add() failed (-19) for /funnel@20040000
> OF: amba_device_add() failed (-19) for /etr@20070000
> OF: amba_device_add() failed (-19) for /stm@20100000
> OF: amba_device_add() failed (-19) for /replicator@20120000
> OF: amba_device_add() failed (-19) for /cpu-debug@22010000
> OF: amba_device_add() failed (-19) for /etm@22040000
> OF: amba_device_add() failed (-19) for /cti@22020000
> OF: amba_device_add() failed (-19) for /funnel@220c0000
> OF: amba_device_add() failed (-19) for /cpu-debug@22110000
> OF: amba_device_add() failed (-19) for /etm@22140000
> OF: amba_device_add() failed (-19) for /cti@22120000
> OF: amba_device_add() failed (-19) for /cpu-debug@23010000
> OF: amba_device_add() failed (-19) for /etm@23040000
> OF: amba_device_add() failed (-19) for /cti@23020000
> OF: amba_device_add() failed (-19) for /funnel@230c0000
> OF: amba_device_add() failed (-19) for /cpu-debug@23110000
> OF: amba_device_add() failed (-19) for /etm@23140000
> OF: amba_device_add() failed (-19) for /cti@23120000
> OF: amba_device_add() failed (-19) for /cpu-debug@23210000
> OF: amba_device_add() failed (-19) for /etm@23240000
> OF: amba_device_add() failed (-19) for /cti@23220000
> OF: amba_device_add() failed (-19) for /cpu-debug@23310000
> OF: amba_device_add() failed (-19) for /etm@23340000
> OF: amba_device_add() failed (-19) for /cti@23320000
> OF: amba_device_add() failed (-19) for /cti@20020000
> OF: amba_device_add() failed (-19) for /cti@20110000
> OF: amba_device_add() failed (-19) for /funnel@20130000
> OF: amba_device_add() failed (-19) for /etf@20140000
> OF: amba_device_add() failed (-19) for /funnel@20150000
> OF: amba_device_add() failed (-19) for /cti@20160000
>
> These are working fine with deferred probe in the mainline.
> I tried the hack you have suggested here(rather Tony's version),

Thanks for trying that.

> also
> tried with fw_devlink=0 and fw_devlink=1

0 and 1 aren't valid input to fw_devlink. But yeah, I don't expect
disabling it to make anything better.

> && fw_devlink.strict=0
> No change in the behaviour.
>
> The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and there
> coresight devices are mostly in juno-cs-r1r2.dtsi

Thanks

> Let me know if there is anything obvious or you want me to bisect which
> means I need more time. I can do that next week.

I'll let you know once I poke at the DTS. We need to figure out why
fw_devlink wasn't blocking these from getting to the error (same as in
Tony's case). But since these are amba devices, I think I have some
guesses.

This is an old series that had some issues in some cases and I haven't
gotten around to looking at it. You can give that a shot if you can
apply it to a recent tree.
https://lore.kernel.org/lkml/20210304195101.3843496-1-saravanak@google.com/

After looking at that old patch again, I think I know what's going on.
For normal devices, the pm domain attach happens AFTER the device is
added and fw_devlink has had a chance to set up device links. And if
the suppliers aren't ready, really_probe() won't get as far as
dev_pm_domain_attach(). But for amba, the clock and pm domain
suppliers are "grabbed" before adding the device.

So that old patch, plus always returning -EPROBE_DEFER from
amba_device_add() if amba_read_periphid() fails, should fix your issue.
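
Roughly, the ordering difference looks like the toy model below. This is only
a sketch of my reading of the flows, not the actual amba code: the toy_* names
are made up for illustration, and 517 is the kernel-internal value of
EPROBE_DEFER, which userspace headers don't define.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal errno, absent from userspace headers */
#endif

struct toy_dev {
	bool supplier_ready;	/* has the PM domain provider probed? */
};

/* Normal device: links exist before probe, so a missing supplier defers. */
static int toy_platform_flow(const struct toy_dev *d)
{
	if (!d->supplier_ready)
		return -EPROBE_DEFER;	/* stopped before the PM domain attach */
	return 0;
}

/* AMBA as described above: suppliers are grabbed inside the add path. */
static int toy_amba_flow(const struct toy_dev *d)
{
	if (!d->supplier_ready)
		return -ENODEV;		/* the -19 in the log above */
	return 0;
}

/* Suggested behaviour: treat a failed periphid read as "try again later". */
static int toy_amba_flow_proposed(const struct toy_dev *d)
{
	if (!d->supplier_ready)
		return -EPROBE_DEFER;
	return 0;
}
```

The point is just that the amba path hits the supplier lookup before
fw_devlink has had any chance to hold the device back.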

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: (EXT) Re: (EXT) Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01  7:02           ` Saravana Kannan
@ 2022-07-04  7:07             ` Alexander Stein
  2022-07-05  1:24               ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Alexander Stein @ 2022-07-04  7:07 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

On Friday, 1 July 2022, 09:02:22 CEST, Saravana Kannan wrote:
> On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
> 
> <alexander.stein@ew.tq-group.com> wrote:
> > Hi Saravana,
> > 
> > On Friday, 1 July 2022, 02:37:14 CEST, Saravana Kannan wrote:
> > > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> > > 
> > > <alexander.stein@ew.tq-group.com> wrote:
> > > > Hi,
> > > > 
> > > > On Tuesday, 21 June 2022, 09:28:43 CEST, Tony Lindgren wrote:
> > > > > Hi,
> > > > > 
> > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > "power-domains" property, the execution will never get to the point
> > > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > > has probed successfully or before deferred probe timeout has expired.
> > > > > > 
> > > > > > So, delete the call and replace it with -ENODEV.
> > > > > 
> > > > > Looks like this causes omaps to not boot in Linux next. With this
> > > > > simple-pm-bus fails to probe initially as the power-domain is not
> > > > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > > > -ENOENT.
> > > > > 
> > > > > Seems like other stuff is potentially broken too, any ideas on
> > > > > how to fix this?
> > > > 
> > > > I think I'm hit by this as well, although I do not get a lockup.
> > > > In my case I'm using
> > > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > > > 38320000.blk-ctrl fails as the power-domain is not (yet) registered.
> > > 
> > > Ok, took a look.
> > > 
> > > The problem is that there are two drivers for the same device and they
> > > both initialize this device.
> > > 
> > >     gpc: gpc@303a0000 {
> > >     
> > >         compatible = "fsl,imx8mq-gpc";
> > >     
> > >     }
> > > 
> > > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > > drivers/irqchip/irq-imx-gpcv2.c
> > > drivers/soc/imx/gpcv2.c
> > > 
> > > IMHO, this is a bad/broken design.
> > > 
> > > So what's happening is that fw_devlink will block the probe of
> > > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
> > > blocking the probe of 38320000.blk-ctrl as soon as the first driver
> > > initializes the device. In this case, it's the irqchip driver.
> > > 
> > > I'd recommend combining these drivers into one. Something like the
> > > patch I'm attaching (sorry for the attachment, copy-paste is mangling
> > > the tabs). Can you give it a shot please?
> > 
> > I tried this patch and it delayed the driver initialization (those of UART
> > as well BTW). Unfortunately the driver fails the same way:
> Thanks for testing the patch!
> 
> > > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
> > 
> > More than that it even introduced some more errors:
> > > [    0.008160] irq: no irq domain found for gpc@303a0000 !
> 
> So the idea behind my change was that as long as the irqchip isn't the
> root of the irqdomain (might be using the terms incorrectly) like the
> gic, you can make it a platform driver. And I was trying to hack up a
> patch that's the equivalent of platform_irqchip_probe() (which just
> ends up eventually calling the callback you use in IRQCHIP_DECLARE()).
> I probably made some mistake in the quick hack that I'm sure is
> fixable.
> 
> > > [    0.013251] Failed to map interrupt for
> > > /soc@0/bus@30400000/timer@306a0000
> 
> However, this timer driver also uses TIMER_OF_DECLARE() which can't
> handle failure to get the IRQ (because it can't -EPROBE_DEFER). So,
> this means the timer driver in turn needs to be converted to a
> platform driver if it's supposed to work with the IRQCHIP_DECLARE()
> being converted to a platform driver.
> 
> But that's a can of worms not worth opening. But then I remembered
> this simpler workaround will work and it is pretty much a variant of
> the workaround that's already in the gpc's irqchip driver to allow two
> drivers to probe the same device (people really should stop doing
> that).
> 
> Can you drop my previous hack patch and try this instead please? I'm
> 99% sure this will work.
> 
> diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
> index b9c22f764b4d..8a0e82067924 100644
> --- a/drivers/irqchip/irq-imx-gpcv2.c
> +++ b/drivers/irqchip/irq-imx-gpcv2.c
> @@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
>          * later the GPC power domain driver will not be skipped.
>          */
>         of_node_clear_flag(node, OF_POPULATED);
> +       fwnode_dev_initialized(domain->fwnode, false);
>         return 0;
>  }

Just to be sure here, I tried this patch on top of next-20220701 but 
unfortunately this doesn't fix the original problem either. The timer errors 
are gone though.
The probe of imx8m-blk-ctrl got slightly delayed (from 0.74 to 0.90s printk 
time) but results in the identical error message.

Best regards,
Alexander




^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: (EXT) Re: (EXT) Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-04  7:07             ` (EXT) " Alexander Stein
@ 2022-07-05  1:24               ` Saravana Kannan
  2022-07-06 13:02                 ` Re: " Alexander Stein
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-05  1:24 UTC (permalink / raw)
  To: Alexander Stein
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

[-- Attachment #1: Type: text/plain, Size: 6179 bytes --]

On Mon, Jul 4, 2022 at 12:07 AM Alexander Stein
<alexander.stein@ew.tq-group.com> wrote:
>
> On Friday, 1 July 2022, 09:02:22 CEST, Saravana Kannan wrote:
> > On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
> >
> > <alexander.stein@ew.tq-group.com> wrote:
> > > Hi Saravana,
> > >
> > > On Friday, 1 July 2022, 02:37:14 CEST, Saravana Kannan wrote:
> > > > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> > > >
> > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > Hi,
> > > > >
> > > > > On Tuesday, 21 June 2022, 09:28:43 CEST, Tony Lindgren wrote:
> > > > > > Hi,
> > > > > >
> > > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > > "power-domains" property, the execution will never get to the point
> > > > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > > > has probed successfully or before deferred probe timeout has expired.
> > > > > > >
> > > > > > > So, delete the call and replace it with -ENODEV.
> > > > > >
> > > > > > Looks like this causes omaps to not boot in Linux next. With this
> > > > > > simple-pm-bus fails to probe initially as the power-domain is not
> > > > > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > > > > -ENOENT.
> > > > > >
> > > > > > Seems like other stuff is potentially broken too, any ideas on
> > > > > > how to fix this?
> > > > >
> > > > > I think I'm hit by this as well, although I do not get a lockup.
> > > > > In my case I'm using
> > > > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > > > > 38320000.blk-ctrl fails as the power-domain is not (yet) registered.
> > > >
> > > > Ok, took a look.
> > > >
> > > > The problem is that there are two drivers for the same device and they
> > > > both initialize this device.
> > > >
> > > >     gpc: gpc@303a0000 {
> > > >
> > > >         compatible = "fsl,imx8mq-gpc";
> > > >
> > > >     }
> > > >
> > > > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > > > drivers/irqchip/irq-imx-gpcv2.c
> > > > drivers/soc/imx/gpcv2.c
> > > >
> > > > IMHO, this is a bad/broken design.
> > > >
> > > > So what's happening is that fw_devlink will block the probe of
> > > > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
> > > > blocking the probe of 38320000.blk-ctrl as soon as the first driver
> > > > initializes the device. In this case, it's the irqchip driver.
> > > >
> > > > I'd recommend combining these drivers into one. Something like the
> > > > patch I'm attaching (sorry for the attachment, copy-paste is mangling
> > > > the tabs). Can you give it a shot please?
> > >
> > > I tried this patch and it delayed the driver initialization (those of UART
> > > as well BTW). Unfortunately the driver fails the same way:
> > Thanks for testing the patch!
> >
> > > > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
> > >
> > > More than that it even introduced some more errors:
> > > > [    0.008160] irq: no irq domain found for gpc@303a0000 !
> >
> > So the idea behind my change was that as long as the irqchip isn't the
> > root of the irqdomain (might be using the terms incorrectly) like the
> > gic, you can make it a platform driver. And I was trying to hack up a
> > patch that's the equivalent of platform_irqchip_probe() (which just
> > ends up eventually calling the callback you use in IRQCHIP_DECLARE()).
> > I probably made some mistake in the quick hack that I'm sure is
> > fixable.
> >
> > > > [    0.013251] Failed to map interrupt for
> > > > /soc@0/bus@30400000/timer@306a0000
> >
> > However, this timer driver also uses TIMER_OF_DECLARE() which can't
> > handle failure to get the IRQ (because it can't -EPROBE_DEFER). So,
> > this means, the timer driver in turn needs to be converted to a
> > platform driver if it's supposed to work with the IRQCHIP_DECLARE()
> > being converted to a platform driver.
> >
> > But that's a can of worms not worth opening. But then I remembered
> > this simpler workaround will work and it is pretty much a variant of
> > the workaround that's already in the gpc's irqchip driver to allow two
> > drivers to probe the same device (people really should stop doing
> > that).
> >
> > Can you drop my previous hack patch and try this instead please? I'm
> > 99% sure this will work.
> >
> > diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
> > index b9c22f764b4d..8a0e82067924 100644
> > --- a/drivers/irqchip/irq-imx-gpcv2.c
> > +++ b/drivers/irqchip/irq-imx-gpcv2.c
> > @@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
> >          * later the GPC power domain driver will not be skipped.
> >          */
> >         of_node_clear_flag(node, OF_POPULATED);
> > +       fwnode_dev_initialized(domain->fwnode, false);
> >         return 0;
> >  }
>
> Just to be sure here, I tried this patch on top of next-20220701 but
> unfortunately this doesn't fix the original problem either. The timer errors
> are gone though.

To clarify, you had the timer issue only with my "combine drivers" patch, right?

> The probe of imx8m-blk-ctrl got slightly delayed (from 0.74 to 0.90s printk
> time) but results in the identical error message.

My guess is that the probe attempt of blk-ctrl is delayed now till gpc
probes (because of the device links getting created with the
fwnode_dev_initialized() fix), but by the time gpc probe finishes, the
power domains aren't registered yet because of the additional level of
device addition and probing.

Can you try the attached patch please?

And if that doesn't fix the issues, then enable the debug logs in the
following functions please and share the logs from boot till the
failure? If you can enable CONFIG_PRINTK_CALLER, that'd help too.
device_link_add()
fwnode_link_add()
fw_devlink_relax_cycle()

Btw, part of the reason I'm trying to make sure we fix it the right
way is so that we don't run into issues when we try to enable async
boot by default.

Thanks,
Saravana

[-- Attachment #2: 0001-imx-fix-and-logs.patch --]
[-- Type: text/x-patch, Size: 1755 bytes --]

From 34ae4fa9c7efca26e5946422ab9a0925ce4a5293 Mon Sep 17 00:00:00 2001
From: Saravana Kannan <saravanak@google.com>
Date: Fri, 1 Jul 2022 01:25:56 -0700
Subject: [PATCH] imx-fix and logs

---
 drivers/irqchip/irq-imx-gpcv2.c |  1 +
 drivers/soc/imx/gpcv2.c         | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
index b9c22f764b4d..8a0e82067924 100644
--- a/drivers/irqchip/irq-imx-gpcv2.c
+++ b/drivers/irqchip/irq-imx-gpcv2.c
@@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
 	 * later the GPC power domain driver will not be skipped.
 	 */
 	of_node_clear_flag(node, OF_POPULATED);
+	fwnode_dev_initialized(domain->fwnode, false);
 	return 0;
 }
 
diff --git a/drivers/soc/imx/gpcv2.c b/drivers/soc/imx/gpcv2.c
index 85aa86e1338a..07fceaf74f19 100644
--- a/drivers/soc/imx/gpcv2.c
+++ b/drivers/soc/imx/gpcv2.c
@@ -1350,6 +1350,8 @@ static int imx_pgc_domain_probe(struct platform_device *pdev)
 		goto out_genpd_remove;
 	}
 
+	dev_info(&pdev->dev, "%s: Probe succeeded\n", __func__);
+
 	return 0;
 
 out_genpd_remove:
@@ -1423,7 +1425,12 @@ static struct platform_driver imx_pgc_domain_driver = {
 	.remove   = imx_pgc_domain_remove,
 	.id_table = imx_pgc_domain_id,
 };
-builtin_platform_driver(imx_pgc_domain_driver)
+
+static int __init imx_pgc_domain_init(void)
+{
+	return platform_driver_register(&imx_pgc_domain_driver);
+}
+subsys_initcall(imx_pgc_domain_init);
 
 static int imx_gpcv2_probe(struct platform_device *pdev)
 {
@@ -1518,6 +1525,7 @@ static int imx_gpcv2_probe(struct platform_device *pdev)
 		}
 	}
 
+	dev_info(&pdev->dev, "%s: Probe succeeded\n", __func__);
 	return 0;
 }
 
-- 
2.37.0.rc0.161.g10f37bed90-goog
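
A note on the initcall change in the patch above: builtin_platform_driver()
registers a driver at device_initcall time, whereas the replacement registers
it at subsys_initcall time, which runs earlier in boot. The user-space C model
below only illustrates that relative ordering. The enum and helper names are
hypothetical and exist only for this note; the main initcall levels are listed
in the order the kernel runs them (rootfs_initcall is left out for brevity).

```c
#include <stdbool.h>

/* Main kernel initcall levels, in execution order (lower value runs earlier).
 * Names are illustrative stand-ins, not kernel identifiers. */
enum model_initcall_level {
	MODEL_PURE_INITCALL,
	MODEL_CORE_INITCALL,
	MODEL_POSTCORE_INITCALL,
	MODEL_ARCH_INITCALL,
	MODEL_SUBSYS_INITCALL,	/* level used by the patched imx_pgc_domain_init() */
	MODEL_FS_INITCALL,
	MODEL_DEVICE_INITCALL,	/* level used by builtin_platform_driver() */
	MODEL_LATE_INITCALL,
};

/* Returns true if a driver registered at level a is registered before one at level b. */
static bool model_registers_earlier(enum model_initcall_level a,
				    enum model_initcall_level b)
{
	return a < b;
}
```

Under this model, moving the PGC domain driver to subsys_initcall means it is
already registered by the time device_initcall-level drivers start probing.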



* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01 19:13                                 ` Saravana Kannan
@ 2022-07-05  8:44                                   ` Saravana Kannan
  0 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-07-05  8:44 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Tony Lindgren, Rob Herring, Geert Uytterhoeven,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Fri, Jul 1, 2022 at 12:13 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Fri, Jul 1, 2022 at 8:08 AM Sudeep Holla <sudeep.holla@arm.com> wrote:
> >
> > Hi, Saravana,
> >
> > On Fri, Jul 01, 2022 at 01:26:12AM -0700, Saravana Kannan wrote:
> >
> > [...]
> >
> > > Can you check if this hack helps? If so, then I can think about
> > > whether we can pick it up without breaking everything else. Copy-paste
> > > tab mess up warning.
> >
> > Sorry for jumping in late and not even sure if this is the right thread.
> > I have not bisected anything yet, but I am seeing issues on my Juno R2
> > with SCMI enabled power domains and Coresight AMBA devices.
> >
> > OF: amba_device_add() failed (-19) for /etf@20010000
> > OF: amba_device_add() failed (-19) for /tpiu@20030000
> > OF: amba_device_add() failed (-19) for /funnel@20040000
> > OF: amba_device_add() failed (-19) for /etr@20070000
> > OF: amba_device_add() failed (-19) for /stm@20100000
> > OF: amba_device_add() failed (-19) for /replicator@20120000
> > OF: amba_device_add() failed (-19) for /cpu-debug@22010000
> > OF: amba_device_add() failed (-19) for /etm@22040000
> > OF: amba_device_add() failed (-19) for /cti@22020000
> > OF: amba_device_add() failed (-19) for /funnel@220c0000
> > OF: amba_device_add() failed (-19) for /cpu-debug@22110000
> > OF: amba_device_add() failed (-19) for /etm@22140000
> > OF: amba_device_add() failed (-19) for /cti@22120000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23010000
> > OF: amba_device_add() failed (-19) for /etm@23040000
> > OF: amba_device_add() failed (-19) for /cti@23020000
> > OF: amba_device_add() failed (-19) for /funnel@230c0000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23110000
> > OF: amba_device_add() failed (-19) for /etm@23140000
> > OF: amba_device_add() failed (-19) for /cti@23120000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23210000
> > OF: amba_device_add() failed (-19) for /etm@23240000
> > OF: amba_device_add() failed (-19) for /cti@23220000
> > OF: amba_device_add() failed (-19) for /cpu-debug@23310000
> > OF: amba_device_add() failed (-19) for /etm@23340000
> > OF: amba_device_add() failed (-19) for /cti@23320000
> > OF: amba_device_add() failed (-19) for /cti@20020000
> > OF: amba_device_add() failed (-19) for /cti@20110000
> > OF: amba_device_add() failed (-19) for /funnel@20130000
> > OF: amba_device_add() failed (-19) for /etf@20140000
> > OF: amba_device_add() failed (-19) for /funnel@20150000
> > OF: amba_device_add() failed (-19) for /cti@20160000
> >
> > These are working fine with deferred probe in the mainline.
> > I tried the hack you have suggested here (or rather Tony's version),
>
> Thanks for trying that.
>
> > also
> > tried with fw_devlink=0 and fw_devlink=1
>
> 0 and 1 aren't valid input to fw_devlink. But yeah, I don't expect
> disabling it to make anything better.
>
> > && fw_devlink.strict=0
> > No change in the behaviour.
> >
> > The DTS are in arch/arm64/boot/dts/arm/juno-*-scmi.dts and the
> > coresight devices are mostly in juno-cs-r1r2.dtsi
>
> Thanks
>
> > Let me know if there is anything obvious or you want me to bisect which
> > means I need more time. I can do that next week.
>
> I'll let you know once I poke at the DTS. We need to figure out why
> fw_devlink wasn't blocking these from getting to the error (same as in
> Tony's case). But since these are amba devices, I think I have some
> guesses.
>
> This is an old series that had some issues in some cases and I haven't
> gotten around to looking at it. You can give that a shot if you can
> apply it to a recent tree.
> https://lore.kernel.org/lkml/20210304195101.3843496-1-saravanak@google.com/

I rebased it to driver-core-next and tested the patch (for
correctness, not with your issue though). I'm fairly sure it should
help with your issue. Can you give it a shot please?

https://lore.kernel.org/lkml/20220705083934.3974140-1-saravanak@google.com/T/#u

-Saravana

>
> After looking at that old patch again, I think I know what's going on.
> For normal devices, the pm domain attach happens AFTER the device is
> added and fw_devlink has had a chance to set up device links. And if
> the suppliers aren't ready, really_probe() won't get as far as
> dev_pm_domain_attach(). But for amba, the clock and pm domain
> suppliers are "grabbed" before adding the device.
>
> So that old patch, plus always returning -EPROBE_DEFER from
> amba_device_add() if amba_read_periphid() fails, should fix your issue.
>
> -Saravana
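
The difference between the two paths described above can be sketched as a tiny
user-space C model. This is not kernel code: struct model_dev and both helpers
are hypothetical, and EPROBE_DEFER is defined locally because it is a
kernel-internal errno value. It only shows why a normal device never reaches
the pm domain attach too early, while an AMBA device can fail with -ENODEV
(the -19 in the log) unless the add path always defers.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, absent from userspace errno.h */
#endif

/* Hypothetical, simplified view of a device's supplier state. */
struct model_dev {
	bool suppliers_ready;	/* clock/pm domain providers have probed */
};

/* Normal device: device links exist before probing, so the driver core
 * defers the probe before dev_pm_domain_attach() is ever reached. */
static int model_platform_probe(const struct model_dev *d)
{
	if (!d->suppliers_ready)
		return -EPROBE_DEFER;
	return 0;
}

/* AMBA device: suppliers are grabbed before the device is added, so no
 * device links protect this step. always_defer models the suggested fix. */
static int model_amba_device_add(const struct model_dev *d, bool always_defer)
{
	if (d->suppliers_ready)
		return 0;
	return always_defer ? -EPROBE_DEFER : -ENODEV;
}
```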


* Re: [PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
  2022-06-01  7:06 ` [PATCH v2 3/9] net: mdio: " Saravana Kannan
@ 2022-07-05  9:11   ` Geert Uytterhoeven
  2022-07-13  1:40     ` Saravana Kannan
  2022-08-15  8:38     ` Geert Uytterhoeven
  0 siblings, 2 replies; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-07-05  9:11 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas

Hi Saravana,

On Wed, Jun 1, 2022 at 2:44 PM Saravana Kannan <saravanak@google.com> wrote:
> Now that fw_devlink=on by default and fw_devlink supports interrupt
> properties, the execution will never get to the point where
> driver_deferred_probe_check_state() is called before the supplier has
> probed successfully or before deferred probe timeout has expired.
>
> So, delete the call and replace it with -ENODEV.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Thanks for your patch, which is now commit f8217275b57aa48d ("net:
mdio: Delete usage of driver_deferred_probe_check_state()") in
driver-core/driver-core-next.

Seems like I missed something when providing my T-b for this series,
sorry for that.

arch/arm/boot/dts/r8a7791-koelsch.dts has:

    &ether {
            pinctrl-0 = <&ether_pins>, <&phy1_pins>;
            pinctrl-names = "default";

            phy-handle = <&phy1>;
            renesas,ether-link-active-low;
            status = "okay";

            phy1: ethernet-phy@1 {
                    compatible = "ethernet-phy-id0022.1537",
                                 "ethernet-phy-ieee802.3-c22";
                    reg = <1>;
                    interrupt-parent = <&irqc0>;
                    interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
                    micrel,led-mode = <1>;
                    reset-gpios = <&gpio5 22 GPIO_ACTIVE_LOW>;
            };
    };

Despite the interrupts property, &ether is now probed before irqc0
(interrupt-controller@e61c0000 in arch/arm/boot/dts/r8a7791.dtsi),
so the PHY fails to find its interrupt and falls back to polling:

    -Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=185)
    +Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)

Reverting this commit, and commit 9cbffc7a59561be9 ("driver core:
Delete driver_deferred_probe_check_state()") fixes that.

> --- a/drivers/net/mdio/fwnode_mdio.c
> +++ b/drivers/net/mdio/fwnode_mdio.c
> @@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
>          * just fall back to poll mode
>          */
>         if (rc == -EPROBE_DEFER)
> -               rc = driver_deferred_probe_check_state(&phy->mdio.dev);
> -       if (rc == -EPROBE_DEFER)
> -               return rc;
> +               rc = -ENODEV;
>
>         if (rc > 0) {
>                 phy->irq = rc;
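
The behaviour change in the hunk above can be modelled in user space as
follows. This is a simplified sketch, not the kernel function:
model_check_state() is a hypothetical stand-in for
driver_deferred_probe_check_state(), the fallback to the bus's default IRQ
table is reduced to PHY_POLL, and EPROBE_DEFER is defined locally because it
is a kernel-internal errno value.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517	/* kernel-internal value, absent from userspace errno.h */
#endif
#define PHY_POLL (-1)		/* same value as the kernel's PHY_POLL marker */

/* Hypothetical stand-in: keep deferring until the deferred probe timeout expires. */
static int model_check_state(bool timeout_expired)
{
	return timeout_expired ? -ETIMEDOUT : -EPROBE_DEFER;
}

/* Before the patch: a missing IRQ supplier defers PHY registration. */
static int model_register_old(int rc, bool timeout_expired, int *irq)
{
	if (rc == -EPROBE_DEFER)
		rc = model_check_state(timeout_expired);
	if (rc == -EPROBE_DEFER)
		return rc;
	*irq = rc > 0 ? rc : PHY_POLL;
	return 0;
}

/* After the patch: a missing IRQ supplier immediately means polling. */
static int model_register_new(int rc, int *irq)
{
	if (rc == -EPROBE_DEFER)
		rc = -ENODEV;
	*irq = rc > 0 ? rc : PHY_POLL;
	return 0;
}
```

With rc == -EPROBE_DEFER and the timeout not yet expired, the old variant
returns -EPROBE_DEFER so the PHY can be retried once irqc0 is up, while the new
variant succeeds at once with irq == PHY_POLL, which matches the irq=POLL line
in the log above.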

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: Re: Re: Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-05  1:24               ` Saravana Kannan
@ 2022-07-06 13:02                 ` Alexander Stein
  2022-07-13  0:45                   ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Alexander Stein @ 2022-07-06 13:02 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

[-- Attachment #1: Type: text/plain, Size: 7970 bytes --]

On Tuesday, 5 July 2022, 03:24:33 CEST, Saravana Kannan wrote:
> On Mon, Jul 4, 2022 at 12:07 AM Alexander Stein
> <alexander.stein@ew.tq-group.com> wrote:
> > On Friday, 1 July 2022, 09:02:22 CEST, Saravana Kannan wrote:
> > > On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
> > > <alexander.stein@ew.tq-group.com> wrote:
> > > > Hi Saravana,
> > > > 
> > > > On Friday, 1 July 2022, 02:37:14 CEST, Saravana Kannan wrote:
> > > > > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> > > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > > Hi,
> > > > > > 
> > > > > > On Tuesday, 21 June 2022, 09:28:43 CEST, Tony Lindgren wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > > > "power-domains" property, the execution will never get to the
> > > > > > > > point
> > > > > > > > where driver_deferred_probe_check_state() is called before the
> > > > > > > > supplier
> > > > > > > > has probed successfully or before deferred probe timeout has
> > > > > > > > expired.
> > > > > > > > 
> > > > > > > > So, delete the call and replace it with -ENODEV.
> > > > > > > 
> > > > > > > Looks like this causes omaps to not boot in Linux next. With
> > > > > > > this
> > > > > > > simple-pm-bus fails to probe initially as the power-domain is
> > > > > > > not
> > > > > > > yet available. On platform_probe() genpd_get_from_provider()
> > > > > > > returns
> > > > > > > -ENOENT.
> > > > > > > 
> > > > > > > Seems like other stuff is potentially broken too, any ideas on
> > > > > > > how to fix this?
> > > > > > 
> > > > > > I think I'm hit by this as well, although I do not get a lockup.
> > > > > > In my case I'm using
> > > > > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and
> > > > > > probing of
> > > > > > 38320000.blk-ctrl fails as the power-domain is not (yet) registered.
> > > > > 
> > > > > Ok, took a look.
> > > > > 
> > > > > The problem is that there are two drivers for the same device and
> > > > > they
> > > > > both initialize this device.
> > > > > 
> > > > >     gpc: gpc@303a0000 {
> > > > >     
> > > > >         compatible = "fsl,imx8mq-gpc";
> > > > >     
> > > > >     }
> > > > > 
> > > > > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > > > > drivers/irqchip/irq-imx-gpcv2.c
> > > > > drivers/soc/imx/gpcv2.c
> > > > > 
> > > > > IMHO, this is a bad/broken design.
> > > > > 
> > > > > So what's happening is that fw_devlink will block the probe of
> > > > > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
> > > > > blocking the probe of 38320000.blk-ctrl as soon as the first driver
> > > > > initializes the device. In this case, it's the irqchip driver.
> > > > > 
> > > > > I'd recommend combining these drivers into one. Something like the
> > > > > patch I'm attaching (sorry for the attachment, copy-paste is
> > > > > mangling
> > > > > the tabs). Can you give it a shot please?
> > > > 
> > > > I tried this patch and it delayed the driver initialization (those of
> > > > UART as well BTW). Unfortunately the driver fails the same way:
> > > Thanks for testing the patch!
> > > 
> > > > > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
> > > > 
> > > > More than that it even introduced some more errors:
> > > > > [    0.008160] irq: no irq domain found for gpc@303a0000 !
> > > 
> > > So the idea behind my change was that as long as the irqchip isn't the
> > > root of the irqdomain (might be using the terms incorrectly) like the
> > > gic, you can make it a platform driver. And I was trying to hack up a
> > > patch that's the equivalent of platform_irqchip_probe() (which just
> > > ends up eventually calling the callback you use in IRQCHIP_DECLARE()).
> > > I probably made some mistake in the quick hack that I'm sure is
> > > fixable.
> > > 
> > > > > [    0.013251] Failed to map interrupt for
> > > > > /soc@0/bus@30400000/timer@306a0000
> > > 
> > > However, this timer driver also uses TIMER_OF_DECLARE() which can't
> > > handle failure to get the IRQ (because it can't -EPROBE_DEFER). So,
> > > this means, the timer driver in turn needs to be converted to a
> > > platform driver if it's supposed to work with the IRQCHIP_DECLARE()
> > > being converted to a platform driver.
> > > 
> > > But that's a can of worms not worth opening. But then I remembered
> > > this simpler workaround will work and it is pretty much a variant of
> > > the workaround that's already in the gpc's irqchip driver to allow two
> > > drivers to probe the same device (people really should stop doing
> > > that).
> > > 
> > > Can you drop my previous hack patch and try this instead please? I'm
> > > 99% sure this will work.
> > > 
> > > diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
> > > index b9c22f764b4d..8a0e82067924 100644
> > > --- a/drivers/irqchip/irq-imx-gpcv2.c
> > > +++ b/drivers/irqchip/irq-imx-gpcv2.c
> > > @@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
> > >          * later the GPC power domain driver will not be skipped.
> > >          */
> > >         of_node_clear_flag(node, OF_POPULATED);
> > > +       fwnode_dev_initialized(domain->fwnode, false);
> > >         return 0;
> > >  }
> > 
> > Just to be sure here, I tried this patch on top of next-20220701 but
> > unfortunately this doesn't fix the original problem either. The timer
> > errors are gone though.
> 
> To clarify, you had the timer issue only with my "combine drivers" patch,
> right?

That's correct.

> > The probe of imx8m-blk-ctrl got slightly delayed (from 0.74 to 0.90s
> > printk time) but results in the identical error message.
> 
> My guess is that the probe attempt of blk-ctrl is delayed now till gpc
> probes (because of the device links getting created with the
> fwnode_dev_initialized() fix), but by the time gpc probe finishes, the
> power domains aren't registered yet because of the additional level of
> device addition and probing.
> 
> Can you try the attached patch please?

Sure, it needed some small fixes though. But the error is still present.

> And if that doesn't fix the issues, then enable the debug logs in the
> following functions please and share the logs from boot till the
> failure? If you can enable CONFIG_PRINTK_CALLER, that'd help too.
> device_link_add()
> fwnode_link_add()
> fw_devlink_relax_cycle()

I switched fw_devlink_relax_cycle() for fw_devlink_relax_link() as the former 
has no debug output here.

For the record, I added the following to my kernel command line:
> dyndbg="func device_link_add +p; func fwnode_link_add +p; func fw_devlink_relax_link +p"

I attached the dmesg until the probe error to this mail. But I noticed the 
following lines which seem interesting:
> [    1.466620][    T8] imx-pgc imx-pgc-domain.5: Linked as a consumer to regulator.8
> [    1.466743][    T8] imx-pgc imx-pgc-domain.5: imx_pgc_domain_probe: Probe succeeded
> [    1.474733][    T8] imx-pgc imx-pgc-domain.6: Linked as a consumer to regulator.9
> [    1.474774][    T8] imx-pgc imx-pgc-domain.6: imx_pgc_domain_probe: Probe succeeded

regulator.8 and regulator.9 belong to the power sequencer, which is attached
via I2C. This also makes perfect sense if you look at [1] and the lines that
follow. These power domains are supplied by specific power supply rails.
Several, if not all, imx8mq boards have this kind of setup.

> Btw, part of the reason I'm trying to make sure we fix it the right
> way is that when we try to enable async boot by default, we don't run
> into issues.

Sounds reasonable.

Best regards,
Alexander

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/freescale/imx8mq-tqma8mq.dtsi#n84

[-- Attachment #2: dmesg.short --]
[-- Type: text/plain, Size: 42236 bytes --]

[    0.000000][    T0] Booting Linux on physical CPU 0x0000000000 [0x410fd034]
[    0.000000][    T0] Linux version 5.19.0-rc5-next-20220706+ (steina@steina-w) (aarch64-v8a-linux-gnu-gcc (OSELAS.Toolchain-2020.08.0 10-20200822) 10.2.1 20200822, GNU ld (GNU Binutils) 2.35) #422 SMP PREEMPT Wed Jul 6 14:43:55 CEST 2022
[    0.000000][    T0] Machine model: TQ-Systems GmbH i.MX8MQ TQMa8MQ on MBa8Mx
[    0.000000][    T0] earlycon: ec_imx6q0 at MMIO 0x0000000030880000 (options '115200')
[    0.000000][    T0] printk: bootconsole [ec_imx6q0] enabled
[    0.000000][    T0] efi: UEFI not found.
[    0.000000][    T0] Reserved memory: created CMA memory pool at 0x0000000090000000, size 640 MiB
[    0.000000][    T0] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[    0.000000][    T0] NUMA: No NUMA configuration found
[    0.000000][    T0] NUMA: Faking a node at [mem 0x0000000040000000-0x000000013fffffff]
[    0.000000][    T0] NUMA: NODE_DATA [mem 0x13f7beb40-0x13f7c0fff]
[    0.000000][    T0] Zone ranges:
[    0.000000][    T0]   DMA      [mem 0x0000000040000000-0x00000000ffffffff]
[    0.000000][    T0]   DMA32    empty
[    0.000000][    T0]   Normal   [mem 0x0000000100000000-0x000000013fffffff]
[    0.000000][    T0] Movable zone start for each node
[    0.000000][    T0] Early memory node ranges
[    0.000000][    T0]   node   0: [mem 0x0000000040000000-0x000000013fffffff]
[    0.000000][    T0] Initmem setup node 0 [mem 0x0000000040000000-0x000000013fffffff]
[    0.000000][    T0] psci: probing for conduit method from DT.
[    0.000000][    T0] psci: PSCIv1.1 detected in firmware.
[    0.000000][    T0] psci: Using standard PSCI v0.2 function IDs
[    0.000000][    T0] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000][    T0] psci: SMC Calling Convention v1.1
[    0.000000][    T0] percpu: Embedded 19 pages/cpu s38376 r8192 d31256 u77824
[    0.000000][    T0] pcpu-alloc: s38376 r8192 d31256 u77824 alloc=19*4096
[    0.000000][    T0] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000][    T0] Detected VIPT I-cache on CPU0
[    0.000000][    T0] CPU features: detected: GIC system register CPU interface
[    0.000000][    T0] CPU features: detected: ARM erratum 845719
[    0.000000][    T0] Fallback order for Node 0: 0 
[    0.000000][    T0] Built 1 zonelists, mobility grouping on.  Total pages: 1032192
[    0.000000][    T0] Policy zone: Normal
[    0.000000][    T0] Kernel command line: root=/dev/nfs rw nfsroot=192.168.0.101:/srv/tftp/imx8_mainline,v3,tcp ip=192.168.0.100:192.168.0.101::::eth0:off console=ttymxc2,115200 earlycon=ec_imx6q,0x30880000,115200 dyndbg="func device_link_add +p; func fwnode_link_add +p; func fw_devlink_relax_link +p"
[    0.000000][    T0] Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes, linear)
[    0.000000][    T0] Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes, linear)
[    0.000000][    T0] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000][    T0] software IO TLB: mapped [mem 0x00000000fbfff000-0x00000000fffff000] (64MB)
[    0.000000][    T0] Memory: 3363428K/4194304K available (14016K kernel code, 2190K rwdata, 6540K rodata, 5312K init, 494K bss, 175516K reserved, 655360K cma-reserved)
[    0.000000][    T0] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000][    T0] rcu: Preemptible hierarchical RCU implementation.
[    0.000000][    T0] rcu: 	RCU event tracing is enabled.
[    0.000000][    T0] rcu: 	RCU restricting CPUs from NR_CPUS=256 to nr_cpu_ids=4.
[    0.000000][    T0] 	Trampoline variant of Tasks RCU enabled.
[    0.000000][    T0] 	Tracing variant of Tasks RCU enabled.
[    0.000000][    T0] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000][    T0] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=4
[    0.000000][    T0] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000][    T0] GICv3: GIC: Using split EOI/Deactivate mode
[    0.000000][    T0] GICv3: 128 SPIs implemented
[    0.000000][    T0] GICv3: 0 Extended SPIs implemented
[    0.000000][    T0] Root IRQ handler: gic_handle_irq
[    0.000000][    T0] GICv3: GICv3 features: 16 PPIs
[    0.000000][    T0] GICv3: CPU0: found redistributor 0 region 0:0x0000000038880000
[    0.000000][    T0] ITS: No ITS available, not enabling LPIs
[    0.000000][    T0] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[    0.000000][    T0] arch_timer: cp15 timer(s) running at 8.33MHz (phys).
[    0.000000][    T0] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1ec0311ec, max_idle_ns: 440795202152 ns
[    0.000001][    T0] sched_clock: 56 bits at 8MHz, resolution 120ns, wraps every 2199023255541ns
[    0.009211][    T0] Console: colour dummy device 80x25
[    0.013967][    T0] Calibrating delay loop (skipped), value calculated using timer frequency.. 16.66 BogoMIPS (lpj=33333)
[    0.024859][    T0] pid_max: default: 32768 minimum: 301
[    0.030261][    T0] LSM: Security Framework initializing
[    0.035625][    T0] Mount-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.043622][    T0] Mountpoint-cache hash table entries: 8192 (order: 4, 65536 bytes, linear)
[    0.054209][    T1] cblist_init_generic: Setting adjustable number of callback queues.
[    0.060116][    T1] cblist_init_generic: Setting shift to 2 and lim to 1.
[    0.066983][    T1] cblist_init_generic: Setting shift to 2 and lim to 1.
[    0.073902][    T1] rcu: Hierarchical SRCU implementation.
[    0.079195][    T1] rcu: 	Max phase no-delay instances is 1000.
[    0.087737][    T1] EFI services will not be available.
[    0.090823][    T1] smp: Bringing up secondary CPUs ...
[    0.096224][    T0] Detected VIPT I-cache on CPU1
[    0.096283][    T0] GICv3: CPU1: found redistributor 1 region 0:0x00000000388a0000
[    0.096337][    T0] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]
[    0.097001][    T0] Detected VIPT I-cache on CPU2
[    0.097041][    T0] GICv3: CPU2: found redistributor 2 region 0:0x00000000388c0000
[    0.097069][    T0] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]
[    0.097672][    T0] Detected VIPT I-cache on CPU3
[    0.097711][    T0] GICv3: CPU3: found redistributor 3 region 0:0x00000000388e0000
[    0.097739][    T0] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
[    0.097832][    T1] smp: Brought up 1 node, 4 CPUs
[    0.158985][    T1] SMP: Total of 4 processors activated.
[    0.164380][    T1] CPU features: detected: 32-bit EL0 Support
[    0.170245][    T1] CPU features: detected: CRC32 instructions
[    0.176396][    T1] CPU: All CPU(s) started at EL2
[    0.180899][   T14] alternatives: patching kernel code
[    0.188121][    T1] devtmpfs: initialized
[    0.199468][    T1] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.207118][    T1] futex hash table entries: 1024 (order: 4, 65536 bytes, linear)
[    0.239390][    T1] pinctrl core: initialized pinctrl subsystem
[    0.245118][    T1] DMI not present or invalid.
[    0.247854][    T1] NET: Registered PF_NETLINK/PF_ROUTE protocol family
[    0.254831][    T1] DMA: preallocated 512 KiB GFP_KERNEL pool for atomic allocations
[    0.261589][    T1] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA pool for atomic allocations
[    0.270133][    T1] DMA: preallocated 512 KiB GFP_KERNEL|GFP_DMA32 pool for atomic allocations
[    0.278564][    T1] audit: initializing netlink subsys (disabled)
[    0.284869][   T34] audit: type=2000 audit(0.184:1): state=initialized audit_enabled=0 res=1
[    0.285423][    T1] thermal_sys: Registered thermal governor 'bang_bang'
[    0.293086][    T1] thermal_sys: Registered thermal governor 'step_wise'
[    0.299794][    T1] thermal_sys: Registered thermal governor 'power_allocator'
[    0.306986][    T1] cpuidle: using governor menu
[    0.318580][    T1] hw-breakpoint: found 6 breakpoint and 4 watchpoint registers.
[    0.325965][    T1] ASID allocator initialised with 65536 entries
[    0.333008][    T1] Serial: AMBA PL011 UART driver
[    0.337902][    T1] gpio@30200000 Linked as a fwnode consumer to gpc@303a0000
[    0.337943][    T1] gpio@30200000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338004][    T1] gpio@30210000 Linked as a fwnode consumer to gpc@303a0000
[    0.338038][    T1] gpio@30210000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338098][    T1] gpio@30220000 Linked as a fwnode consumer to gpc@303a0000
[    0.338132][    T1] gpio@30220000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338192][    T1] gpio@30230000 Linked as a fwnode consumer to gpc@303a0000
[    0.338227][    T1] gpio@30230000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338284][    T1] gpio@30240000 Linked as a fwnode consumer to gpc@303a0000
[    0.338317][    T1] gpio@30240000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338378][    T1] tmu@30260000 Linked as a fwnode consumer to gpc@303a0000
[    0.338398][    T1] tmu@30260000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338455][    T1] watchdog@30280000 Linked as a fwnode consumer to gpc@303a0000
[    0.338475][    T1] watchdog@30280000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338502][    T1] watchdog@30280000 Linked as a fwnode consumer to pinctrl@30330000
[    0.338549][    T1] sdma@302c0000 Linked as a fwnode consumer to gpc@303a0000
[    0.338571][    T1] sdma@302c0000 Linked as a fwnode consumer to clock-controller@30380000
[    0.338984][    T1] efuse@30350000 Linked as a fwnode consumer to clock-controller@30380000
[    0.339058][    T1] syscon@30360000 Linked as a fwnode consumer to gpc@303a0000
[    0.339120][    T1] snvs-rtc-lp Linked as a fwnode consumer to gpc@303a0000
[    0.339154][    T1] snvs-rtc-lp Linked as a fwnode consumer to clock-controller@30380000
[    0.339199][    T1] snvs-powerkey Linked as a fwnode consumer to gpc@303a0000
[    0.339226][    T1] snvs-powerkey Linked as a fwnode consumer to clock-controller@30380000
[    0.339278][    T1] clock-controller@30380000 Linked as a fwnode consumer to gpc@303a0000
[    0.339392][    T1] reset-controller@30390000 Linked as a fwnode consumer to gpc@303a0000
[    0.339497][    T1] power-domain@1 Linked as a fwnode consumer to gpc@303a0000
[    0.339564][    T1] power-domain@5 Linked as a fwnode consumer to clock-controller@30380000
[    0.339601][    T1] power-domain@5 Linked as a fwnode consumer to pmic@8
[    0.339631][    T1] power-domain@6 Linked as a fwnode consumer to clock-controller@30380000
[    0.339672][    T1] power-domain@6 Linked as a fwnode consumer to pmic@8
[    0.339786][    T1] pwm@30680000 Linked as a fwnode consumer to gpc@303a0000
[    0.339809][    T1] pwm@30680000 Linked as a fwnode consumer to clock-controller@30380000
[    0.339845][    T1] pwm@30680000 Linked as a fwnode consumer to pinctrl@30330000
[    0.339886][    T1] pwm@30690000 Linked as a fwnode consumer to gpc@303a0000
[    0.339906][    T1] pwm@30690000 Linked as a fwnode consumer to clock-controller@30380000
[    0.339942][    T1] pwm@30690000 Linked as a fwnode consumer to pinctrl@30330000
[    0.339985][    T1] timer@306a0000 Linked as a fwnode consumer to gpc@303a0000
[    0.340064][    T1] spi@30820000 Linked as a fwnode consumer to gpc@303a0000
[    0.340085][    T1] spi@30820000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340114][    T1] spi@30820000 Linked as a fwnode consumer to sdma@30bd0000
[    0.340146][    T1] spi@30820000 Linked as a fwnode consumer to pinctrl@30330000
[    0.340167][    T1] spi@30820000 Linked as a fwnode consumer to gpio@30240000
[    0.340218][    T1] spi@30830000 Linked as a fwnode consumer to gpc@303a0000
[    0.340240][    T1] spi@30830000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340266][    T1] spi@30830000 Linked as a fwnode consumer to sdma@30bd0000
[    0.340300][    T1] spi@30830000 Linked as a fwnode consumer to pinctrl@30330000
[    0.340319][    T1] spi@30830000 Linked as a fwnode consumer to gpio@30240000
[    0.340364][    T1] serial@30860000 Linked as a fwnode consumer to gpc@303a0000
[    0.340386][    T1] serial@30860000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340419][    T1] serial@30860000 Linked as a fwnode consumer to pinctrl@30330000
[    0.340468][    T1] serial@30880000 Linked as a fwnode consumer to gpc@303a0000
[    0.340489][    T1] serial@30880000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340524][    T1] serial@30880000 Linked as a fwnode consumer to pinctrl@30330000
[    0.340572][    T1] serial@30890000 Linked as a fwnode consumer to gpc@303a0000
[    0.340599][    T1] serial@30890000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340635][    T1] serial@30890000 Linked as a fwnode consumer to pinctrl@30330000
[    0.340690][    T1] sai@308c0000 Linked as a fwnode consumer to gpc@303a0000
[    0.340712][    T1] sai@308c0000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340782][    T1] sai@308c0000 Linked as a fwnode consumer to sdma@30bd0000
[    0.340817][    T1] sai@308c0000 Linked as a fwnode consumer to pinctrl@30330000
[    0.340880][    T1] crypto@30900000 Linked as a fwnode consumer to gpc@303a0000
[    0.340903][    T1] crypto@30900000 Linked as a fwnode consumer to clock-controller@30380000
[    0.340953][    T1] jr@2000 Linked as a fwnode consumer to gpc@303a0000
[    0.341002][    T1] jr@3000 Linked as a fwnode consumer to gpc@303a0000
[    0.341053][    T1] i2c@30a20000 Linked as a fwnode consumer to gpc@303a0000
[    0.341074][    T1] i2c@30a20000 Linked as a fwnode consumer to clock-controller@30380000
[    0.341109][    T1] i2c@30a20000 Linked as a fwnode consumer to pinctrl@30330000
[    0.341136][    T1] i2c@30a20000 Linked as a fwnode consumer to gpio@30240000
[    0.341384][    T1] rtc@51 Linked as a fwnode consumer to pinctrl@30330000
[    0.341404][    T1] rtc@51 Linked as a fwnode consumer to gpio@30200000
[    0.341513][    T1] gpio@23 Linked as a fwnode consumer to regulator-3v3
[    0.341529][    T1] gpio@23 Linked as a fwnode consumer to gpio@30200000
[    0.341641][    T1] gpio@24 Linked as a fwnode consumer to regulator-3v3
[    0.341678][    T1] gpio@25 Linked as a fwnode consumer to regulator-3v3
[    0.341698][    T1] gpio@25 Linked as a fwnode consumer to pinctrl@30330000
[    0.341714][    T1] gpio@25 Linked as a fwnode consumer to gpio@30200000
[    0.341800][    T1] i2c@30a30000 Linked as a fwnode consumer to gpc@303a0000
[    0.341821][    T1] i2c@30a30000 Linked as a fwnode consumer to clock-controller@30380000
[    0.341857][    T1] i2c@30a30000 Linked as a fwnode consumer to pinctrl@30330000
[    0.341885][    T1] i2c@30a30000 Linked as a fwnode consumer to gpio@30240000
[    0.341927][    T1] audio-codec@18 Linked as a fwnode consumer to gpio@25
[    0.341945][    T1] audio-codec@18 Linked as a fwnode consumer to regulator-3v3
[    0.341972][    T1] audio-codec@18 Linked as a fwnode consumer to clock-controller@30380000
[    0.342044][    T1] i2c@30a40000 Linked as a fwnode consumer to gpc@303a0000
[    0.342066][    T1] i2c@30a40000 Linked as a fwnode consumer to clock-controller@30380000
[    0.342100][    T1] i2c@30a40000 Linked as a fwnode consumer to pinctrl@30330000
[    0.342133][    T1] i2c@30a40000 Linked as a fwnode consumer to gpio@30240000
[    0.342192][    T1] mailbox@30aa0000 Linked as a fwnode consumer to gpc@303a0000
[    0.342213][    T1] mailbox@30aa0000 Linked as a fwnode consumer to clock-controller@30380000
[    0.342261][    T1] mmc@30b40000 Linked as a fwnode consumer to gpc@303a0000
[    0.342284][    T1] mmc@30b40000 Linked as a fwnode consumer to clock-controller@30380000
[    0.342333][    T1] mmc@30b40000 Linked as a fwnode consumer to pinctrl@30330000
[    0.342377][    T1] mmc@30b40000 Linked as a fwnode consumer to regulator-vcc3v3
[    0.342397][    T1] mmc@30b40000 Linked as a fwnode consumer to regulator-vcc1v8
[    0.342443][    T1] mmc@30b50000 Linked as a fwnode consumer to gpc@303a0000
[    0.342467][    T1] mmc@30b50000 Linked as a fwnode consumer to clock-controller@30380000
[    0.342515][    T1] mmc@30b50000 Linked as a fwnode consumer to pinctrl@30330000
[    0.342563][    T1] mmc@30b50000 Linked as a fwnode consumer to gpio@30210000
[    0.342592][    T1] mmc@30b50000 Linked as a fwnode consumer to regulator-vmmc
[    0.342634][    T1] sdma@30bd0000 Linked as a fwnode consumer to gpc@303a0000
[    0.342655][    T1] sdma@30bd0000 Linked as a fwnode consumer to clock-controller@30380000
[    0.342720][    T1] ethernet@30be0000 Linked as a fwnode consumer to gpc@303a0000
[    0.342783][    T1] ethernet@30be0000 Linked as a fwnode consumer to clock-controller@30380000
[    0.342843][    T1] ethernet@30be0000 Linked as a fwnode consumer to efuse@30350000
[    0.342874][    T1] ethernet@30be0000 Linked as a fwnode consumer to pinctrl@30330000
[    0.342901][    T1] ethernet@30be0000 Linked as a fwnode consumer to regulator-3v3
[    0.342969][    T1] ethernet-phy@e Linked as a fwnode consumer to gpio@25
[    0.343008][    T1] interconnect@32700000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343113][    T1] interrupt-controller@32e2d000 Linked as a fwnode consumer to gpc@303a0000
[    0.343135][    T1] interrupt-controller@32e2d000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343193][    T1] gpu@38000000 Linked as a fwnode consumer to gpc@303a0000
[    0.343214][    T1] gpu@38000000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343288][    T1] usb@38100000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343342][    T1] usb@38100000 Linked as a fwnode consumer to gpc@303a0000
[    0.343367][    T1] usb@38100000 Linked as a fwnode consumer to usb-phy@381f0040
[    0.343407][    T1] usb@38100000 Linked as a fwnode consumer to extcon-usbotg0
[    0.343450][    T1] usb-phy@381f0040 Linked as a fwnode consumer to clock-controller@30380000
[    0.343489][    T1] usb-phy@381f0040 Linked as a fwnode consumer to regulator-otg-vbus
[    0.343529][    T1] usb@38200000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343582][    T1] usb@38200000 Linked as a fwnode consumer to gpc@303a0000
[    0.343606][    T1] usb@38200000 Linked as a fwnode consumer to usb-phy@382f0040
[    0.343659][    T1] usb-phy@382f0040 Linked as a fwnode consumer to clock-controller@30380000
[    0.343723][    T1] video-codec@38300000 Linked as a fwnode consumer to gpc@303a0000
[    0.343745][    T1] video-codec@38300000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343779][    T1] video-codec@38300000 Linked as a fwnode consumer to blk-ctrl@38320000
[    0.343823][    T1] video-codec@38310000 Linked as a fwnode consumer to gpc@303a0000
[    0.343845][    T1] video-codec@38310000 Linked as a fwnode consumer to clock-controller@30380000
[    0.343870][    T1] video-codec@38310000 Linked as a fwnode consumer to blk-ctrl@38320000
[    0.343905][    T1] blk-ctrl@38320000 Linked as a fwnode consumer to gpc@303a0000
[    0.343943][    T1] blk-ctrl@38320000 Linked as a fwnode consumer to clock-controller@30380000
[    0.344029][    T1] pcie@33800000 Linked as a fwnode consumer to gpc@303a0000
[    0.344086][    T1] pcie@33800000 Linked as a fwnode consumer to reset-controller@30390000
[    0.344133][    T1] pcie@33800000 Linked as a fwnode consumer to pmic@8
[    0.344153][    T1] pcie@33800000 Linked as a fwnode consumer to gpio@23
[    0.344173][    T1] pcie@33800000 Linked as a fwnode consumer to clock-controller@30380000
[    0.344216][    T1] pcie@33800000 Linked as a fwnode consumer to regulator-3v3
[    0.344283][    T1] pcie@33c00000 Linked as a fwnode consumer to gpc@303a0000
[    0.344335][    T1] pcie@33c00000 Linked as a fwnode consumer to reset-controller@30390000
[    0.344382][    T1] pcie@33c00000 Linked as a fwnode consumer to pmic@8
[    0.344399][    T1] pcie@33c00000 Linked as a fwnode consumer to clock-controller@30380000
[    0.344442][    T1] pcie@33c00000 Linked as a fwnode consumer to regulator-3v3
[    0.346535][    T1] platform 30280000.watchdog: Linked as a consumer to 30330000.pinctrl
[    0.347193][    T1] imx8mq-pinctrl 30330000.pinctrl: initialized IMX pinctrl driver
[    0.353405][    T1] platform 30370000.snvs:snvs-powerkey: Linked as a consumer to 30380000.clock-controller
[    0.353484][    T1] platform 30370000.snvs:snvs-rtc-lp: Linked as a consumer to 30380000.clock-controller
[    0.353556][    T1] platform 30350000.efuse: Linked as a consumer to 30380000.clock-controller
[    0.353634][    T1] platform 302c0000.sdma: Linked as a consumer to 30380000.clock-controller
[    0.353706][    T1] platform 30280000.watchdog: Linked as a consumer to 30380000.clock-controller
[    0.353778][    T1] platform 30260000.tmu: Linked as a consumer to 30380000.clock-controller
[    0.353855][    T1] platform 30240000.gpio: Linked as a consumer to 30380000.clock-controller
[    0.353926][    T1] platform 30230000.gpio: Linked as a consumer to 30380000.clock-controller
[    0.354007][    T1] platform 30220000.gpio: Linked as a consumer to 30380000.clock-controller
[    0.354091][    T1] platform 30210000.gpio: Linked as a consumer to 30380000.clock-controller
[    0.354162][    T1] platform 30200000.gpio: Linked as a consumer to 30380000.clock-controller
[    0.354603][    T1] platform 30390000.reset-controller: Linked as a consumer to 303a0000.gpc
[    0.354688][    T1] platform 30380000.clock-controller: Linked as a consumer to 303a0000.gpc
[    0.354761][    T1] platform 30370000.snvs:snvs-powerkey: Linked as a consumer to 303a0000.gpc
[    0.354842][    T1] platform 30370000.snvs:snvs-rtc-lp: Linked as a consumer to 303a0000.gpc
[    0.354913][    T1] platform 30360000.syscon: Linked as a consumer to 303a0000.gpc
[    0.354982][    T1] platform 302c0000.sdma: Linked as a consumer to 303a0000.gpc
[    0.355058][    T1] platform 30280000.watchdog: Linked as a consumer to 303a0000.gpc
[    0.355128][    T1] platform 30260000.tmu: Linked as a consumer to 303a0000.gpc
[    0.355197][    T1] platform 30240000.gpio: Linked as a consumer to 303a0000.gpc
[    0.355276][    T1] platform 30230000.gpio: Linked as a consumer to 303a0000.gpc
[    0.355346][    T1] platform 30220000.gpio: Linked as a consumer to 303a0000.gpc
[    0.355431][    T1] platform 30210000.gpio: Linked as a consumer to 303a0000.gpc
[    0.355504][    T1] platform 30200000.gpio: Linked as a consumer to 303a0000.gpc
[    0.355584][    T1] platform 303a0000.gpc: Linked as a sync state only consumer to 30380000.clock-controller
[    0.355809][    T1] platform 30400000.bus: Linked as a sync state only consumer to 30330000.pinctrl
[    0.355879][    T1] platform 30400000.bus: Linked as a sync state only consumer to 30380000.clock-controller
[    0.355977][    T1] platform 30400000.bus: Linked as a sync state only consumer to 303a0000.gpc
[    0.356232][    T1] platform 30680000.pwm: Linked as a consumer to 30330000.pinctrl
[    0.356338][    T1] platform 30680000.pwm: Linked as a consumer to 30380000.clock-controller
[    0.356408][    T1] platform 30680000.pwm: Linked as a consumer to 303a0000.gpc
[    0.356650][    T1] platform 30690000.pwm: Linked as a consumer to 30330000.pinctrl
[    0.356762][    T1] platform 30690000.pwm: Linked as a consumer to 30380000.clock-controller
[    0.356834][    T1] platform 30690000.pwm: Linked as a consumer to 303a0000.gpc
[    0.357081][    T1] platform 306a0000.timer: Linked as a consumer to 303a0000.gpc
[    0.357380][    T1] platform 30800000.bus: Linked as a sync state only consumer to 30240000.gpio
[    0.357449][    T1] platform 30800000.bus: Linked as a sync state only consumer to 30330000.pinctrl
[    0.357520][    T1] platform 30800000.bus: Linked as a sync state only consumer to 30380000.clock-controller
[    0.357597][    T1] platform 30800000.bus: Linked as a sync state only consumer to 303a0000.gpc
[    0.357726][    T1] platform 30800000.bus: Linked as a sync state only consumer to 30200000.gpio
[    0.357837][    T1] platform 30800000.bus: Linked as a sync state only consumer to 30210000.gpio
[    0.357922][    T1] platform 30800000.bus: Linked as a sync state only consumer to 30350000.efuse
[    0.358158][    T1] platform 30820000.spi: Linked as a consumer to 30240000.gpio
[    0.358231][    T1] platform 30820000.spi: Linked as a consumer to 30330000.pinctrl
[    0.358335][    T1] platform 30820000.spi: Linked as a consumer to 30380000.clock-controller
[    0.358418][    T1] platform 30820000.spi: Linked as a consumer to 303a0000.gpc
[    0.358666][    T1] platform 30830000.spi: Linked as a consumer to 30240000.gpio
[    0.358741][    T1] platform 30830000.spi: Linked as a consumer to 30330000.pinctrl
[    0.358851][    T1] platform 30830000.spi: Linked as a consumer to 30380000.clock-controller
[    0.358927][    T1] platform 30830000.spi: Linked as a consumer to 303a0000.gpc
[    0.359177][    T1] platform 30860000.serial: Linked as a consumer to 30330000.pinctrl
[    0.359287][    T1] platform 30860000.serial: Linked as a consumer to 30380000.clock-controller
[    0.359356][    T1] platform 30860000.serial: Linked as a consumer to 303a0000.gpc
[    0.359601][    T1] platform 30880000.serial: Linked as a consumer to 30330000.pinctrl
[    0.359706][    T1] platform 30880000.serial: Linked as a consumer to 30380000.clock-controller
[    0.359782][    T1] platform 30880000.serial: Linked as a consumer to 303a0000.gpc
[    0.360024][    T1] platform 30890000.serial: Linked as a consumer to 30330000.pinctrl
[    0.360152][    T1] platform 30890000.serial: Linked as a consumer to 30380000.clock-controller
[    0.360223][    T1] platform 30890000.serial: Linked as a consumer to 303a0000.gpc
[    0.360474][    T1] platform 308c0000.sai: Linked as a consumer to 30330000.pinctrl
[    0.360591][    T1] platform 308c0000.sai: Linked as a consumer to 30380000.clock-controller
[    0.360662][    T1] platform 308c0000.sai: Linked as a consumer to 303a0000.gpc
[    0.360904][    T1] platform 30900000.crypto: Linked as a consumer to 30380000.clock-controller
[    0.360980][    T1] platform 30900000.crypto: Linked as a consumer to 303a0000.gpc
[    0.361263][    T1] platform 30a20000.i2c: Linked as a consumer to 30240000.gpio
[    0.361337][    T1] platform 30a20000.i2c: Linked as a consumer to 30330000.pinctrl
[    0.361448][    T1] platform 30a20000.i2c: Linked as a consumer to 30380000.clock-controller
[    0.361523][    T1] platform 30a20000.i2c: Linked as a consumer to 303a0000.gpc
[    0.361625][    T1] platform 30a20000.i2c: Linked as a sync state only consumer to 30200000.gpio
[    0.361880][    T1] platform 30a30000.i2c: Linked as a consumer to 30240000.gpio
[    0.361958][    T1] platform 30a30000.i2c: Linked as a consumer to 30330000.pinctrl
[    0.362062][    T1] platform 30a30000.i2c: Linked as a consumer to 30380000.clock-controller
[    0.362139][    T1] platform 30a30000.i2c: Linked as a consumer to 303a0000.gpc
[    0.362395][    T1] platform 30a40000.i2c: Linked as a consumer to 30240000.gpio
[    0.362474][    T1] platform 30a40000.i2c: Linked as a consumer to 30330000.pinctrl
[    0.362578][    T1] platform 30a40000.i2c: Linked as a consumer to 30380000.clock-controller
[    0.362654][    T1] platform 30a40000.i2c: Linked as a consumer to 303a0000.gpc
[    0.362911][    T1] platform 30aa0000.mailbox: Linked as a consumer to 30380000.clock-controller
[    0.362991][    T1] platform 30aa0000.mailbox: Linked as a consumer to 303a0000.gpc
[    0.363233][    T1] platform 30b40000.mmc: Linked as a consumer to 30330000.pinctrl
[    0.363345][    T1] platform 30b40000.mmc: Linked as a consumer to 30380000.clock-controller
[    0.363418][    T1] platform 30b40000.mmc: Linked as a consumer to 303a0000.gpc
[    0.363669][    T1] platform 30b50000.mmc: Linked as a consumer to 30210000.gpio
[    0.363740][    T1] platform 30b50000.mmc: Linked as a consumer to 30330000.pinctrl
[    0.363856][    T1] platform 30b50000.mmc: Linked as a consumer to 30380000.clock-controller
[    0.363933][    T1] platform 30b50000.mmc: Linked as a consumer to 303a0000.gpc
[    0.364193][    T1] platform 308c0000.sai: Linked as a consumer to 30bd0000.sdma
[    0.364264][    T1] platform 30830000.spi: Linked as a consumer to 30bd0000.sdma
[    0.364347][    T1] platform 30820000.spi: Linked as a consumer to 30bd0000.sdma
[    0.364424][    T1] platform 30bd0000.sdma: Linked as a consumer to 30380000.clock-controller
[    0.364497][    T1] platform 30bd0000.sdma: Linked as a consumer to 303a0000.gpc
[    0.364752][    T1] platform 30be0000.ethernet: Linked as a consumer to 30330000.pinctrl
[    0.364862][    T1] platform 30be0000.ethernet: Linked as a consumer to 30350000.efuse
[    0.364940][    T1] platform 30be0000.ethernet: Linked as a consumer to 30380000.clock-controller
[    0.365013][    T1] platform 30be0000.ethernet: Linked as a consumer to 303a0000.gpc
[    0.365261][    T1] platform 32700000.interconnect: Linked as a consumer to 30380000.clock-controller
[    0.365487][    T1] platform 32c00000.bus: Linked as a sync state only consumer to 30380000.clock-controller
[    0.365564][    T1] platform 32c00000.bus: Linked as a sync state only consumer to 303a0000.gpc
[    0.365795][    T1] platform 32e2d000.interrupt-controller: Linked as a consumer to 30380000.clock-controller
[    0.365877][    T1] platform 32e2d000.interrupt-controller: Linked as a consumer to 303a0000.gpc
[    0.366108][    T1] platform 38000000.gpu: Linked as a consumer to 30380000.clock-controller
[    0.366189][    T1] platform 38000000.gpu: Linked as a consumer to 303a0000.gpc
[    0.366417][    T1] platform 38100000.usb: Linked as a consumer to 303a0000.gpc
[    0.366511][    T1] platform 38100000.usb: Linked as a consumer to 30380000.clock-controller
[    0.366733][    T1] platform 38100000.usb: Linked as a consumer to 381f0040.usb-phy
[    0.366808][    T1] platform 381f0040.usb-phy: Linked as a consumer to 30380000.clock-controller
[    0.367025][    T1] platform 38200000.usb: Linked as a consumer to 303a0000.gpc
[    0.367124][    T1] platform 38200000.usb: Linked as a consumer to 30380000.clock-controller
[    0.367371][    T1] platform 38200000.usb: Linked as a consumer to 382f0040.usb-phy
[    0.367445][    T1] platform 382f0040.usb-phy: Linked as a consumer to 30380000.clock-controller
[    0.367664][    T1] platform 38300000.video-codec: Linked as a consumer to 30380000.clock-controller
[    0.367745][    T1] platform 38300000.video-codec: Linked as a consumer to 303a0000.gpc
[    0.367967][    T1] platform 38310000.video-codec: Linked as a consumer to 30380000.clock-controller
[    0.368048][    T1] platform 38310000.video-codec: Linked as a consumer to 303a0000.gpc
[    0.368308][    T1] platform 38310000.video-codec: Linked as a consumer to 38320000.blk-ctrl
[    0.368387][    T1] platform 38300000.video-codec: Linked as a consumer to 38320000.blk-ctrl
[    0.368462][    T1] platform 38320000.blk-ctrl: Linked as a consumer to 30380000.clock-controller
[    0.368542][    T1] platform 38320000.blk-ctrl: Linked as a consumer to 303a0000.gpc
[    0.368800][    T1] platform 33800000.pcie: Linked as a consumer to 30380000.clock-controller
[    0.368878][    T1] platform 33800000.pcie: Linked as a consumer to 30390000.reset-controller
[    0.368948][    T1] platform 33800000.pcie: Linked as a consumer to 303a0000.gpc
[    0.369210][    T1] platform 33c00000.pcie: Linked as a consumer to 30380000.clock-controller
[    0.369284][    T1] platform 33c00000.pcie: Linked as a consumer to 30390000.reset-controller
[    0.369361][    T1] platform 33c00000.pcie: Linked as a consumer to 303a0000.gpc
[    0.369721][    T1] platform 30b40000.mmc: Linked as a consumer to regulator-vcc1v8
[    0.369916][    T1] platform 30b40000.mmc: Linked as a consumer to regulator-vcc3v3
[    0.370041][    T1] regulator-vdd-arm Linked as a fwnode consumer to pinctrl@30330000
[    0.370081][    T1] regulator-vdd-arm Linked as a fwnode consumer to gpio@30200000
[    0.370179][    T1] platform regulator-vdd-arm: Linked as a consumer to 30200000.gpio
[    0.370254][    T1] platform regulator-vdd-arm: Linked as a consumer to 30330000.pinctrl
[    0.370410][    T1] beeper Linked as a fwnode consumer to pwm@30690000
[    0.370437][    T1] beeper Linked as a fwnode consumer to regulator-3v3
[    0.370519][    T1] platform beeper: Linked as a consumer to 30690000.pwm
[    0.370652][    T1] gpio-keys Linked as a fwnode consumer to pinctrl@30330000
[    0.370691][    T1] switch-1 Linked as a fwnode consumer to gpio@30200000
[    0.370729][    T1] switch-2 Linked as a fwnode consumer to gpio@30220000
[    0.370764][    T1] switch-3 Linked as a fwnode consumer to gpio@30200000
[    0.370853][    T1] platform gpio-keys: Linked as a consumer to 30330000.pinctrl
[    0.370957][    T1] platform gpio-keys: Linked as a sync state only consumer to 30200000.gpio
[    0.371032][    T1] platform gpio-keys: Linked as a sync state only consumer to 30220000.gpio
[    0.371162][    T1] gpio-leds Linked as a fwnode consumer to pinctrl@30330000
[    0.371192][    T1] led1 Linked as a fwnode consumer to gpio@30200000
[    0.371225][    T1] led2 Linked as a fwnode consumer to gpio@30220000
[    0.371255][    T1] led3 Linked as a fwnode consumer to gpio@30200000
[    0.371334][    T1] platform gpio-leds: Linked as a consumer to 30330000.pinctrl
[    0.371440][    T1] platform gpio-leds: Linked as a sync state only consumer to 30200000.gpio
[    0.371508][    T1] platform gpio-leds: Linked as a sync state only consumer to 30220000.gpio
[    0.371760][    T1] regulator-sn65dsi83-1v8 Linked as a fwnode consumer to gpio@23
[    0.371965][    T1] platform beeper: Linked as a consumer to regulator-3v3
[    0.372038][    T1] platform 33c00000.pcie: Linked as a consumer to regulator-3v3
[    0.372132][    T1] platform 33800000.pcie: Linked as a consumer to regulator-3v3
[    0.372210][    T1] platform 30be0000.ethernet: Linked as a consumer to regulator-3v3
[    0.372288][    T1] platform 30a30000.i2c: Linked as a sync state only consumer to regulator-3v3
[    0.372370][    T1] platform 30a20000.i2c: Linked as a sync state only consumer to regulator-3v3
[    0.372612][    T1] extcon-usbotg0 Linked as a fwnode consumer to pinctrl@30330000
[    0.372634][    T1] extcon-usbotg0 Linked as a fwnode consumer to gpio@30200000
[    0.372721][    T1] platform 38100000.usb: Linked as a consumer to extcon-usbotg0
[    0.372794][    T1] platform extcon-usbotg0: Linked as a consumer to 30200000.gpio
[    0.372870][    T1] platform extcon-usbotg0: Linked as a consumer to 30330000.pinctrl
[    0.373026][    T1] regulator-otg-vbus Linked as a fwnode consumer to pinctrl@30330000
[    0.373058][    T1] regulator-otg-vbus Linked as a fwnode consumer to gpio@30200000
[    0.373144][    T1] platform 381f0040.usb-phy: Linked as a consumer to regulator-otg-vbus
[    0.373223][    T1] platform regulator-otg-vbus: Linked as a consumer to 30200000.gpio
[    0.373303][    T1] platform regulator-otg-vbus: Linked as a consumer to 30330000.pinctrl
[    0.373468][    T1] regulator-vmmc Linked as a fwnode consumer to gpio@30210000
[    0.373557][    T1] platform 30b50000.mmc: Linked as a consumer to regulator-vmmc
[    0.373629][    T1] platform regulator-vmmc: Linked as a consumer to 30210000.gpio
[    0.374218][    T1] KASLR disabled due to lack of seed
[    0.388702][    T1] HugeTLB: registered 1.00 GiB page size, pre-allocated 0 pages
[    0.393390][    T1] HugeTLB: 16380 KiB vmemmap can be freed for a 1.00 GiB page
[    0.400687][    T1] HugeTLB: registered 32.0 MiB page size, pre-allocated 0 pages
[    0.408180][    T1] HugeTLB: 508 KiB vmemmap can be freed for a 32.0 MiB page
[    0.415333][    T1] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
[    0.422830][    T1] HugeTLB: 28 KiB vmemmap can be freed for a 2.00 MiB page
[    0.429896][    T1] HugeTLB: registered 64.0 KiB page size, pre-allocated 0 pages
[    0.437392][    T1] HugeTLB: 0 KiB vmemmap can be freed for a 64.0 KiB page
[    0.445115][    T1] cryptd: max_cpu_qlen set to 1000
[    0.452708][    T1] iommu: Default domain type: Translated 
[    0.455480][    T1] iommu: DMA domain TLB invalidation policy: strict mode 
[    0.462785][    T1] SCSI subsystem initialized
[    0.467027][    T1] libata version 3.00 loaded.
[    0.467296][    T1] usbcore: registered new interface driver usbfs
[    0.473114][    T1] usbcore: registered new interface driver hub
[    0.479109][    T1] usbcore: registered new device driver usb
[    0.485493][    T1] mc: Linux media interface: v0.10
[    0.489841][    T1] videodev: Linux video capture interface: v2.00
[    0.496022][    T1] pps_core: LinuxPPS API ver. 1 registered
[    0.501661][    T1] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
[    0.511529][    T1] PTP clock support registered
[    0.516295][    T1] EDAC MC: Ver: 3.0.0
[    0.520929][    T1] FPGA manager framework
[    0.524168][    T1] Advanced Linux Sound Architecture Driver Initialized.
[    0.531982][    T1] vgaarb: loaded
[    0.534640][    T1] clocksource: Switched to clocksource arch_sys_counter
[    0.541330][    T1] VFS: Disk quotas dquot_6.6.0
[    0.545754][    T1] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.562786][    T1] NET: Registered PF_INET protocol family
[    0.565777][    T1] IP idents hash table entries: 65536 (order: 7, 524288 bytes, linear)
[    0.577693][    T1] tcp_listen_portaddr_hash hash table entries: 2048 (order: 3, 32768 bytes, linear)
[    0.584158][    T1] Table-perturb hash table entries: 65536 (order: 6, 262144 bytes, linear)
[    0.592571][    T1] TCP established hash table entries: 32768 (order: 6, 262144 bytes, linear)
[    0.601494][    T1] TCP bind hash table entries: 32768 (order: 7, 524288 bytes, linear)
[    0.609827][    T1] TCP: Hash tables configured (established 32768 bind 32768)
[    0.616560][    T1] UDP hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.623972][    T1] UDP-Lite hash table entries: 2048 (order: 4, 65536 bytes, linear)
[    0.631945][    T1] NET: Registered PF_UNIX/PF_LOCAL protocol family
[    0.638470][    T1] RPC: Registered named UNIX socket transport module.
[    0.644711][    T1] RPC: Registered udp transport module.
[    0.650116][    T1] RPC: Registered tcp transport module.
[    0.655516][    T1] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    0.662672][    T1] PCI: CLS 0 bytes, default 64
[    0.668140][    T1] hw perfevents: enabled with armv8_cortex_a53 PMU driver, 7 counters available
[    0.678299][    T1] Initialise system trusted keyrings
[    0.681494][    T1] workingset: timestamp_bits=42 max_order=20 bucket_order=0
[    0.697669][    T1] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    0.702176][    T1] NFS: Registering the id_resolver key type
[    0.707163][    T1] Key type id_resolver registered
[    0.712008][    T1] Key type id_legacy registered
[    0.716812][    T1] nfs4filelayout_init: NFSv4 File Layout Driver Registering...
[    0.724130][    T1] nfs4flexfilelayout_init: NFSv4 Flexfile Layout Driver Registering...
[    0.787889][    T1] Key type asymmetric registered
[    0.789832][    T1] Asymmetric key parser 'x509' registered
[    0.795474][    T1] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 243)
[    0.803537][    T1] io scheduler mq-deadline registered
[    0.808765][    T1] io scheduler kyber registered
[    0.820609][    T1] imx-pgc imx-pgc-domain.0: imx_pgc_domain_probe: Probe succeeded
[    0.825861][    T1] imx-pgc imx-pgc-domain.2: imx_pgc_domain_probe: Probe succeeded
[    0.833300][    T1] imx-pgc imx-pgc-domain.3: imx_pgc_domain_probe: Probe succeeded
[    0.840957][    T1] imx-pgc imx-pgc-domain.4: imx_pgc_domain_probe: Probe succeeded
[    0.849079][    T1] imx-pgc imx-pgc-domain.7: imx_pgc_domain_probe: Probe succeeded
[    0.856313][    T1] imx-pgc imx-pgc-domain.8: imx_pgc_domain_probe: Probe succeeded
[    0.863985][    T1] imx-pgc imx-pgc-domain.9: imx_pgc_domain_probe: Probe succeeded
[    0.871658][    T1] imx-pgc imx-pgc-domain.10: imx_pgc_domain_probe: Probe succeeded
[    0.879282][    T1] imx-gpcv2 303a0000.gpc: imx_gpcv2_probe: Probe succeeded
[    0.886886][    T1] SoC: i.MX8MQ revision 2.1
[    0.896040][    T1] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    0.916005][    T1] igbvf: Intel(R) Gigabit Virtual Function Network Driver
[    0.920152][    T1] igbvf: Copyright (c) 2009 - 2012 Intel Corporation.
[    0.927211][    T1] VFIO - User Level meta-driver version: 0.3
[    0.934574][    T1] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    0.939877][    T1] ehci-pci: EHCI PCI platform driver
[    0.945038][    T1] ehci-platform: EHCI generic platform driver
[    0.952771][    T1] i2c_dev: i2c /dev entries driver
[    0.958624][    T1] sdhci: Secure Digital Host Controller Interface driver
[    0.962791][    T1] sdhci: Copyright(c) Pierre Ossman
[    0.968202][    T1] Synopsys Designware Multimedia Card Interface Driver
[    0.975227][    T1] sdhci-pltfm: SDHCI platform and OF driver helper
[    0.983039][    T1] ledtrig-cpu: registered to indicate activity on CPUs
[    0.988797][    T1] hid: raw HID events driver (C) Jiri Kosina
[    0.994056][    T1] usbcore: registered new interface driver usbhid
[    0.999766][    T1] usbhid: USB HID core driver
[    1.007142][    T1]  cs_system_cfg: CoreSight Configuration manager initialised
[    1.017973][    T1] NET: Registered PF_PACKET protocol family
[    1.021006][    T1] Key type dns_resolver registered
[    1.026254][    T1] registered taskstats version 1
[    1.030692][    T1] Loading compiled-in X.509 certificates
[    1.072731][    T8] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-01 13:00                               ` Tony Lindgren
@ 2022-07-12  7:12                                 ` Tony Lindgren
  2022-07-13  0:49                                   ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Tony Lindgren @ 2022-07-12  7:12 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

* Tony Lindgren <tony@atomide.com> [220701 16:00]:
> Also, looks like both with the initcall change for prm, and the patch
> below, there seems to be also another problem where my test devices no
> longer properly idle somehow compared to reverting your two patches
> in next.

Sorry, looks like that was a wrong conclusion. While trying to track down this
issue, I cannot reproduce it. So I don't see issues idling with either
the initcall change or your test patch.

Not sure what caused my earlier tests to fail though. Maybe a config
change to enable more debugging, or possibly some kind of warm reset vs
cold reset type issue.

Regards,

Tony

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: Re: Re: Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-06 13:02                 ` Re: " Alexander Stein
@ 2022-07-13  0:45                   ` Saravana Kannan
  2022-07-14  6:41                     ` Alexander Stein
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-13  0:45 UTC (permalink / raw)
  To: Alexander Stein
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

On Wed, Jul 6, 2022 at 6:02 AM Alexander Stein
<alexander.stein@ew.tq-group.com> wrote:
>

Thanks for testing all my patches and helping me debug this.

Btw, can you try to keep the subject the same please? Looks like
somewhere in your path [EXT] is added sometimes. lore.kernel.org keeps
the thread together, but my email client (gmail) gets confused.

> Am Dienstag, 5. Juli 2022, 03:24:33 CEST schrieb Saravana Kannan:
> > On Mon, Jul 4, 2022 at 12:07 AM Alexander Stein
> >
> > <alexander.stein@ew.tq-group.com> wrote:
> > > Am Freitag, 1. Juli 2022, 09:02:22 CEST schrieb Saravana Kannan:
> > > > On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
> > > >
> > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > Hi Saravana,
> > > > >
> > > > > Am Freitag, 1. Juli 2022, 02:37:14 CEST schrieb Saravana Kannan:
> > > > > > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> > > > > >
> > > > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > Am Dienstag, 21. Juni 2022, 09:28:43 CEST schrieb Tony Lindgren:
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > > > > > "power-domains" property, the execution will never get to the point
> > > > > > > > > > where driver_deferred_probe_check_state() is called before the supplier
> > > > > > > > > > has probed successfully or before deferred probe timeout has expired.
> > > > > > > > >
> > > > > > > > > So, delete the call and replace it with -ENODEV.
> > > > > > > >
> > > > > > > > > Looks like this causes omaps to not boot in Linux next. With this
> > > > > > > > > simple-pm-bus fails to probe initially as the power-domain is not
> > > > > > > > > yet available. On platform_probe() genpd_get_from_provider() returns
> > > > > > > > > -ENOENT.
> > > > > > > >
> > > > > > > > Seems like other stuff is potentially broken too, any ideas on
> > > > > > > > how to fix this?
> > > > > > >
> > > > > > > > I think I'm hit by this as well, although I do not get a lockup.
> > > > > > > > In my case I'm using
> > > > > > > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and probing of
> > > > > > > > 38320000.blk-ctrl fails as the power-domain is not (yet) registered.
> > > > > >
> > > > > > Ok, took a look.
> > > > > >
> > > > > > > The problem is that there are two drivers for the same device and
> > > > > > > they both initialize this device.
> > > > > >
> > > > > > >     gpc: gpc@303a0000 {
> > > > > > >         compatible = "fsl,imx8mq-gpc";
> > > > > > >     };
> > > > > >
> > > > > > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > > > > > drivers/irqchip/irq-imx-gpcv2.c
> > > > > > drivers/soc/imx/gpcv2.c
> > > > > >
> > > > > > IMHO, this is a bad/broken design.
> > > > > >
> > > > > > So what's happening is that fw_devlink will block the probe of
> > > > > > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it stops
> > > > > > blocking the probe of 38320000.blk-ctrl as soon as the first driver
> > > > > > initializes the device. In this case, it's the irqchip driver.
> > > > > >
> > > > > > > I'd recommend combining these drivers into one. Something like the
> > > > > > > patch I'm attaching (sorry for the attachment, copy-paste is mangling
> > > > > > > the tabs). Can you give it a shot please?
> > > > >
> > > > > I tried this patch and it delayed the driver initialization (those of
> > > > > UART as well BTW). Unfortunately the driver fails the same way:
> > > >
> > > > Thanks for testing the patch!
> > > >
> > > > > > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
> > > > >
> > > > > More than that it even introduced some more errors:
> > > > > > [    0.008160] irq: no irq domain found for gpc@303a0000 !
> > > >
> > > > So the idea behind my change was that as long as the irqchip isn't the
> > > > root of the irqdomain (might be using the terms incorrectly) like the
> > > > gic, you can make it a platform driver. And I was trying to hack up a
> > > > patch that's the equivalent of platform_irqchip_probe() (which just
> > > > ends up eventually calling the callback you use in IRQCHIP_DECLARE()).
> > > > I probably made some mistake in the quick hack that I'm sure is
> > > > fixable.
> > > >
> > > > > > [    0.013251] Failed to map interrupt for
> > > > > > /soc@0/bus@30400000/timer@306a0000
> > > >
> > > > However, this timer driver also uses TIMER_OF_DECLARE() which can't
> > > > handle failure to get the IRQ (because it can't -EPROBE_DEFER). So,
> > > > this means, the timer driver in turn needs to be converted to a
> > > > platform driver if it's supposed to work with the IRQCHIP_DECLARE()
> > > > being converted to a platform driver.
> > > >
> > > > But that's a can of worms not worth opening. But then I remembered
> > > > this simpler workaround will work and it is pretty much a variant of
> > > > the workaround that's already in the gpc's irqchip driver to allow two
> > > > drivers to probe the same device (people really should stop doing
> > > > that).
> > > >
> > > > Can you drop my previous hack patch and try this instead please? I'm
> > > > 99% sure this will work.
> > > >
> > > > diff --git a/drivers/irqchip/irq-imx-gpcv2.c b/drivers/irqchip/irq-imx-gpcv2.c
> > > > index b9c22f764b4d..8a0e82067924 100644
> > > > --- a/drivers/irqchip/irq-imx-gpcv2.c
> > > > +++ b/drivers/irqchip/irq-imx-gpcv2.c
> > > > @@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct device_node *node,
> > > >          * later the GPC power domain driver will not be skipped.
> > > >          */
> > > >         of_node_clear_flag(node, OF_POPULATED);
> > > > +       fwnode_dev_initialized(domain->fwnode, false);
> > > >         return 0;
> > > >  }
> > >
> > > Just to be sure here, I tried this patch on top of next-20220701 but
> > > unfortunately this doesn't fix the original problem either. The timer
> > > errors are gone though.
> >
> > To clarify, you had the timer issue only with my "combine drivers" patch,
> > right?
>
> That's correct.
>
> > > The probe of imx8m-blk-ctrl got slightly delayed (from 0.74 to 0.90s
> > > printk time) but results in the identical error message.
> >
> > My guess is that the probe attempt of blk-ctrl is delayed now till gpc
> > probes (because of the device links getting created with the
> > fwnode_dev_initialized() fix), but by the time gpc probe finishes, the
> > power domains aren't registered yet because of the additional level of
> > device addition and probing.
> >
> > Can you try the attached patch please?
>
> Sure, it needed some small fixes though. But the error still is present.
>
> > And if that doesn't fix the issues, then enable the debug logs in the
> > following functions please and share the logs from boot till the
> > failure? If you can enable CONFIG_PRINTK_CALLER, that'd help too.
> > device_link_add()
> > fwnode_link_add()
> > fw_devlink_relax_cycle()
>
> I switched fw_devlink_relax_cycle() for fw_devlink_relax_link() as the former
> has no debug output here.
>
> For the record I added the following line to my kernel command line:
> > dyndbg="func device_link_add +p; func fwnode_link_add +p; func fw_devlink_relax_link +p"
>
> I attached the dmesg until the probe error to this mail. But I noticed the
> following lines which seem interesting:
> > [    1.466620][    T8] imx-pgc imx-pgc-domain.5: Linked as a consumer to regulator.8
> > [    1.466743][    T8] imx-pgc imx-pgc-domain.5: imx_pgc_domain_probe: Probe succeeded
> > [    1.474733][    T8] imx-pgc imx-pgc-domain.6: Linked as a consumer to regulator.9
> > [    1.474774][    T8] imx-pgc imx-pgc-domain.6: imx_pgc_domain_probe: Probe succeeded

I'm guessing this happens after the probe error.

Ok, I looked at the dmesg logs and this pretty much confirms my
thought on why the probe ordering wasn't maintained.

The power domains lack a compatible property, so the blk-ctrl is
linked as a consumer of the gpc instead:
[    0.343905][    T1] blk-ctrl@38320000 Linked as a fwnode consumer to gpc@303a0000
[    0.343943][    T1] blk-ctrl@38320000 Linked as a fwnode consumer to clock-controller@30380000
This ^^ is the device tree parsing figuring out the dependencies
between the DT nodes.

[    0.368462][    T1] platform 38320000.blk-ctrl: Linked as a consumer to 30380000.clock-controller
[    0.368542][    T1] platform 38320000.blk-ctrl: Linked as a consumer to 303a0000.gpc
This ^^ is converting the DT node dependencies into device links.
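
To illustrate the effect described above, here is a minimal userspace
sketch in C (not kernel code). It assumes a simplified model in which a
consumer is linked to the nearest node, starting at the one its phandle
points to and walking up, that has a compatible property. struct dt_node,
effective_supplier() and the child node name are made up for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a DT node (not struct device_node). */
struct dt_node {
	const char *name;
	const char *compatible;		/* NULL: node has no compatible property */
	const struct dt_node *parent;	/* NULL: no parent in this model */
};

/*
 * Walk up from the node a phandle points at until a node with a
 * compatible property is found. In this model, that node is the one
 * the consumer ends up linked to.
 */
static const struct dt_node *effective_supplier(const struct dt_node *n)
{
	assert(n != NULL);
	while (n && !n->compatible)
		n = n->parent;
	return n;
}

/* Nodes shaped like the i.MX8MQ case; the child node name is illustrative. */
static const struct dt_node gpc = { "gpc@303a0000", "fsl,imx8mq-gpc", NULL };
static const struct dt_node pgc_child = { "power-domain@N", NULL, &gpc };
```

In this model effective_supplier(&pgc_child) returns &gpc, which matches
the log above: blk-ctrl ends up as a consumer of gpc@303a0000 rather than
of the individual power domain.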

So, the only real options are:
1. Fix DT and add a compatible string to the DT nodes.
2. Move the initcall level of the regulator driver so the powerdomain
probe doesn't get deferred. Not ideal that we are playing initcall
chicken to handle the feature meant to remove the need for initcall
chicken. But I see these "device, but won't have a compatible
property" as exceptions and feel it's okay to have to play with
initcall levels to handle those.
3. Provide a helper function that drivers that do this (creating
devices for child DT nodes without compatible property) can use to
move/copy their consumer device links to the child devices they add.
And then fix up the gpc driver so that it copies the gpc -- blk-ctrl
device link to the proper power domain.
4. I have another idea for how I could fix that at a driver core
level, but I'm not sure it'll work yet and it's definitely not
something I want to try and get in for 5.19 -- too late for that IMHO.

Want to give (2) a shot so that I can still try to keep the cleanup
series that caused this problem (that's the long term goal) while I
give (3) and (4) a shot for 5.20?

> regulator.8 and regulator.9 is the power sequencer, attached on I2C. This also
> makes perfect sense if you look at [1]ff. These power domains are supplied
> by specific power supply rails. Several, if not all, imx8mq boards have this
> kind of setting.

Yeah, makes sense in terms of what's going on.

-Saravana

>
> > Btw, part of the reason I'm trying to make sure we fix it the right
> > way is that when we try to enable async boot by default, we don't run
> > into issues.
>
> Sounds reasonable.
>
> Best regards,
> Alexander
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/arm64/boot/dts/freescale/imx8mq-tqma8mq.dtsi#n84

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-12  7:12                                 ` Tony Lindgren
@ 2022-07-13  0:49                                   ` Saravana Kannan
  2022-07-13  8:06                                     ` Tony Lindgren
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-13  0:49 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

On Tue, Jul 12, 2022 at 12:12 AM Tony Lindgren <tony@atomide.com> wrote:
>
> * Tony Lindgren <tony@atomide.com> [220701 16:00]:
> > Also, looks like both with the initcall change for prm, and the patch
> > below, there seems to be also another problem where my test devices no
> > longer properly idle somehow compared to reverting your two patches
> > in next.
>
> Sorry, looks like that was a wrong conclusion. While trying to track down this
> issue, I cannot reproduce it. So I don't see issues idling with either
> the initcall change or your test patch.
>
> Not sure what caused my earlier tests to fail though. Maybe a config
> change to enable more debugging, or possibly some kind of warm reset vs
> cold reset type issue.

Thanks for getting back to me about the false alarm.

OK, so it looks like my patch to drivers/of/property.c fixed the issue
for you. In that case, let me test that a bit more thoroughly on my
end to make sure it's not breaking any existing functionality. And if
it's not breaking, I'll land that in the kernel eventually. Might be a
bit too late for 5.19. I'm considering temporarily reverting my series
depending on how the rest of the issues from my series go.

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
  2022-07-05  9:11   ` Geert Uytterhoeven
@ 2022-07-13  1:40     ` Saravana Kannan
  2022-07-13 11:39       ` Geert Uytterhoeven
  2022-08-15  8:38     ` Geert Uytterhoeven
  1 sibling, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-13  1:40 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas

On Tue, Jul 5, 2022 at 2:11 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 1, 2022 at 2:44 PM Saravana Kannan <saravanak@google.com> wrote:
> > Now that fw_devlink=on by default and fw_devlink supports interrupt
> > properties, the execution will never get to the point where
> > driver_deferred_probe_check_state() is called before the supplier has
> > probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch, which is now commit f8217275b57aa48d ("net:
> mdio: Delete usage of driver_deferred_probe_check_state()") in
> driver-core/driver-core-next.
>
> Seems like I missed something when providing my T-b for this series,
> sorry for that.

No worries. Appreciate any testing help.

>
> arch/arm/boot/dts/r8a7791-koelsch.dts has:
>
>     &ether {
>             pinctrl-0 = <&ether_pins>, <&phy1_pins>;
>             pinctrl-names = "default";
>
>             phy-handle = <&phy1>;
>             renesas,ether-link-active-low;
>             status = "okay";
>
>             phy1: ethernet-phy@1 {
>                     compatible = "ethernet-phy-id0022.1537",
>                                  "ethernet-phy-ieee802.3-c22";
>                     reg = <1>;
>                     interrupt-parent = <&irqc0>;
>                     interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
>                     micrel,led-mode = <1>;
>                     reset-gpios = <&gpio5 22 GPIO_ACTIVE_LOW>;
>             };
>     };
>
> Despite the interrupts property, &ether is now probed before irqc0
> (interrupt-controller@e61c0000 in arch/arm/boot/dts/r8a7791.dtsi),
> causing the PHY not finding its interrupt, and resorting to polling:

I'd still expect the device link to have been created properly for
this phy device. Could you enable the logging in device_link_add() to
check the link is created between the phy and the IRQ?

My guess is that this probably has something to do with phys being
attached to drivers differently.

>
>     -Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=185)
>     +Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)

Can you drop a WARN() where this is printed to get the stack trace to
check my hypothesis?

-Saravana

>
> Reverting this commit, and commit 9cbffc7a59561be9 ("driver core:
> Delete driver_deferred_probe_check_state()") fixes that.
>
> > --- a/drivers/net/mdio/fwnode_mdio.c
> > +++ b/drivers/net/mdio/fwnode_mdio.c
> > @@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
> >          * just fall back to poll mode
> >          */
> >         if (rc == -EPROBE_DEFER)
> > -               rc = driver_deferred_probe_check_state(&phy->mdio.dev);
> > -       if (rc == -EPROBE_DEFER)
> > -               return rc;
> > +               rc = -ENODEV;
> >
> >         if (rc > 0) {
> >                 phy->irq = rc;
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-13  0:49                                   ` Saravana Kannan
@ 2022-07-13  8:06                                     ` Tony Lindgren
  0 siblings, 0 replies; 69+ messages in thread
From: Tony Lindgren @ 2022-07-13  8:06 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Geert Uytterhoeven, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kevin Hilman, Ulf Hansson, Len Brown,
	Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, linux-kernel,
	open list:THERMAL, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Alexander Stein

* Saravana Kannan <saravanak@google.com> [220713 00:44]:
> On Tue, Jul 12, 2022 at 12:12 AM Tony Lindgren <tony@atomide.com> wrote:
> >
> > * Tony Lindgren <tony@atomide.com> [220701 16:00]:
> > > Also, looks like both with the initcall change for prm, and the patch
> > > below, there seems to be also another problem where my test devices no
> > > longer properly idle somehow compared to reverting your two patches
> > > in next.
> >
> > Sorry, looks like that was a wrong conclusion. While trying to track down this
> > issue, I cannot reproduce it. So I don't see issues idling with either
> > the initcall change or your test patch.
> >
> > Not sure what caused my earlier tests to fail though. Maybe a config
> > change to enable more debugging, or possibly some kind of warm reset vs
> > cold reset type issue.
> 
> Thanks for getting back to me about the false alarm.

FYI I'm pretty sure I had also some pending sdhci related patches applied
while testing causing extra issues.

> OK, so it looks like my patch to drivers/of/property.c fixed the issue
> for you. In that case, let me test that a bit more thoroughly on my
> end to make sure it's not breaking any existing functionality. And if
> it's not breaking, I'll land that in the kernel eventually. Might be a
> bit too late for 5.19. I'm considering temporarily reverting my series
> depending on how the rest of the issues from my series go.

OK. Seems the series is otherwise working and in case of issues, partial
revert should be enough in the worst case. But yeah, probably some more
testing is needed.

Regards,

Tony

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
  2022-07-13  1:40     ` Saravana Kannan
@ 2022-07-13 11:39       ` Geert Uytterhoeven
  0 siblings, 0 replies; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-07-13 11:39 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas

Hi Saravana,

On Wed, Jul 13, 2022 at 3:40 AM Saravana Kannan <saravanak@google.com> wrote:
> On Tue, Jul 5, 2022 at 2:11 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Wed, Jun 1, 2022 at 2:44 PM Saravana Kannan <saravanak@google.com> wrote:
> > > Now that fw_devlink=on by default and fw_devlink supports interrupt
> > > properties, the execution will never get to the point where
> > > driver_deferred_probe_check_state() is called before the supplier has
> > > probed successfully or before deferred probe timeout has expired.
> > >
> > > So, delete the call and replace it with -ENODEV.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch, which is now commit f8217275b57aa48d ("net:
> > mdio: Delete usage of driver_deferred_probe_check_state()") in
> > driver-core/driver-core-next.
> >
> > Seems like I missed something when providing my T-b for this series,
> > sorry for that.
>
> > arch/arm/boot/dts/r8a7791-koelsch.dts has:
> >
> >     &ether {
> >             pinctrl-0 = <&ether_pins>, <&phy1_pins>;
> >             pinctrl-names = "default";
> >
> >             phy-handle = <&phy1>;
> >             renesas,ether-link-active-low;
> >             status = "okay";
> >
> >             phy1: ethernet-phy@1 {
> >                     compatible = "ethernet-phy-id0022.1537",
> >                                  "ethernet-phy-ieee802.3-c22";
> >                     reg = <1>;
> >                     interrupt-parent = <&irqc0>;
> >                     interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
> >                     micrel,led-mode = <1>;
> >                     reset-gpios = <&gpio5 22 GPIO_ACTIVE_LOW>;
> >             };
> >     };
> >
> > Despite the interrupts property, &ether is now probed before irqc0
> > (interrupt-controller@e61c0000 in arch/arm/boot/dts/r8a7791.dtsi),
> > causing the PHY not finding its interrupt, and resorting to polling:
>
> I'd still expect the device link to have been created properly for
> this phy device. Could you enable the logging in device_link_add() to
> check the link is created between the phy and the IRQ?
>
> My guess is that this probably has something to do with phys being
> attached to drivers differently.

Comparison of dmesg before/after enabling debugging, for
related nodes:

    +interrupt-controller@e61c0000 Linked as a fwnode consumer to clock-controller@e6150000

    +pmic@58 Linked as a fwnode consumer to interrupt-controller@e61c0000
    +regulator@68 Linked as a fwnode consumer to interrupt-controller@e61c0000

Other users of irqc

    +ethernet@ee700000 Linked as a fwnode consumer to clock-controller@e6150000
    +ethernet@ee700000 Linked as a fwnode consumer to pinctrl@e6060000
    +ethernet-phy@1 Linked as a fwnode consumer to interrupt-controller@e61c0000
    +ethernet-phy@1 Linked as a fwnode consumer to gpio@e6055000

PHY linked correctly to its suppliers

    +device: 'e61c0000.interrupt-controller': device_add
    +device: 'platform:e6150000.clock-controller--platform:e61c0000.interrupt-controller': device_add
    +devices_kset: Moving e61c0000.interrupt-controller to end of list
    +platform e61c0000.interrupt-controller: Linked as a consumer to e6150000.clock-controller
    +interrupt-controller@e61c0000 Dropping the fwnode link to clock-controller@e6150000
    +platform e61c0000.interrupt-controller: error -EPROBE_DEFER: supplier e6150000.clock-controller not ready

Tried to probe irqc (why? supplier not ready), deferred.

    +device: 'platform:e61c0000.interrupt-controller--platform:e60b0000.i2c': device_add
    +platform e60b0000.i2c: Linked as a sync state only consumer to e61c0000.interrupt-controller

I guess sync state means through other (child) consumers (pmic,
regulator) above?

    +device: 'ee700000.ethernet': device_add
    +device: 'platform:e6060000.pinctrl--platform:ee700000.ethernet': device_add
    +devices_kset: Moving ee700000.ethernet to end of list
    +platform ee700000.ethernet: Linked as a consumer to e6060000.pinctrl
    +ethernet@ee700000 Dropping the fwnode link to pinctrl@e6060000
    +device: 'platform:e6150000.clock-controller--platform:ee700000.ethernet': device_add
    +devices_kset: Moving ee700000.ethernet to end of list
    +platform ee700000.ethernet: Linked as a consumer to e6150000.clock-controller
    +ethernet@ee700000 Dropping the fwnode link to clock-controller@e6150000
    +device: 'platform:e6055000.gpio--platform:ee700000.ethernet': device_add
    +platform ee700000.ethernet: Linked as a sync state only consumer to e6055000.gpio
    +device: 'platform:e61c0000.interrupt-controller--platform:ee700000.ethernet': device_add
    +platform ee700000.ethernet: Linked as a sync state only consumer to e61c0000.interrupt-controller

Hence linking ethernet to child (phy) consumers.

    +device: 'ee700000.ethernet-ffffffff': device_add

Probing ethernet...

     libphy: fwnode_get_phy_id: fwnode /soc/ethernet@ee700000/ethernet-phy@1 phy_id = 0x00221537
     libphy: fwnode_get_phy_id: fwnode /soc/ethernet@ee700000/ethernet-phy@1 phy_id = 0x00221537
    +fwnode_mdiobus_phy_device_register: fwnode_irq_get() returned -517
    +fwnode_mdiobus_phy_device_register: ignoring -EPROBE_DEFER

This is the part that got changed by this patch.
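
For reference, a minimal userspace model of that change (a sketch, not
the kernel function itself). It mirrors the hunk in the patch: before,
-EPROBE_DEFER could still be returned to the caller; after, it is turned
into -ENODEV and the PHY falls back to polling. The function name, the
old_behaviour and timed_out parameters and the PHY_POLL value are
assumptions made for this sketch; 517 matches the -517 in the log above.

```c
#include <stdbool.h>

#define EPROBE_DEFER	517	/* matches the -517 printed in the log */
#define ENODEV		19
#define PHY_POLL	(-1)	/* assumed "no IRQ, poll instead" sentinel */

/*
 * rc:            result of the IRQ lookup (> 0 is a usable IRQ number)
 * old_behaviour: model the code before the patch
 * timed_out:     hypothetical stand-in for "deferred probe timeout expired"
 *
 * Returns the IRQ to use, PHY_POLL, or -EPROBE_DEFER (try again later).
 */
static int phy_irq_after_lookup(int rc, bool old_behaviour, bool timed_out)
{
	if (rc == -EPROBE_DEFER) {
		if (old_behaviour && !timed_out)
			return -EPROBE_DEFER;	/* registration is retried later */
		rc = -ENODEV;			/* give up on the interrupt */
	}

	return rc > 0 ? rc : PHY_POLL;
}
```

With the patched behaviour, phy_irq_after_lookup(-EPROBE_DEFER, false,
false) yields PHY_POLL straight away, which corresponds to the irq=POLL
line further down; the old behaviour would have returned -EPROBE_DEFER
until the timeout.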

    +device: 'ee700000.ethernet-ffffffff:01': device_add
    +device: 'platform:e6055000.gpio--mdio_bus:ee700000.ethernet-ffffffff:01': device_add
    +devices_kset: Moving ee700000.ethernet-ffffffff:01 to end of list
    +mdio_bus ee700000.ethernet-ffffffff:01: Linked as a consumer to e6055000.gpio
    +ethernet-phy@1 Dropping the fwnode link to gpio@e6055000
    +device: 'platform:e61c0000.interrupt-controller--mdio_bus:ee700000.ethernet-ffffffff:01': device_add
    +devices_kset: Moving ee700000.ethernet-ffffffff:01 to end of list
    +mdio_bus ee700000.ethernet-ffffffff:01: Linked as a consumer to e61c0000.interrupt-controller
    +ethernet-phy@1 Dropping the fwnode link to interrupt-controller@e61c0000
    +mdio_bus ee700000.ethernet-ffffffff:01: error -EPROBE_DEFER: supplier e61c0000.interrupt-controller not ready

Why was ethernet probed this early?
We knew the supplier of the phy was still missing?

    +device: 'eth1': device_add
     sh-eth ee700000.ethernet eth1: Base address at 0xee700000, 2e:09:0a:00:6d:85, IRQ 104.
    +sh-eth ee700000.ethernet: Dropping the link to e6055000.gpio
    +device: 'platform:e6055000.gpio--platform:ee700000.ethernet': device_unregister
    +sh-eth ee700000.ethernet: Dropping the link to e61c0000.interrupt-controller
    +device: 'platform:e61c0000.interrupt-controller--platform:ee700000.ethernet': device_unregister

    +devices_kset: Moving e61c0000.interrupt-controller to end of list
    +devices_kset: Moving ee700000.ethernet-ffffffff:01 to end of list
     renesas_irqc e61c0000.interrupt-controller: driving 10 irqs

Finally, irqc is probed.

    +device: '6-0058': device_add
    +device: 'platform:e61c0000.interrupt-controller--i2c:6-0058': device_add
    +devices_kset: Moving 6-0058 to end of list
    +i2c 6-0058: Linked as a consumer to e61c0000.interrupt-controller
    +pmic@58 Dropping the fwnode link to interrupt-controller@e61c0000

    +device: '6-0068': device_add
    +device: 'platform:e61c0000.interrupt-controller--i2c:6-0068': device_add
    +devices_kset: Moving 6-0068 to end of list
    +i2c 6-0068: Linked as a consumer to e61c0000.interrupt-controller
    +regulator@68 Dropping the fwnode link to interrupt-controller@e61c0000

Propagating other irqc suppliers to the parent of their consumers

    +i2c-sh_mobile e60b0000.i2c: Dropping the link to e61c0000.interrupt-controller
    +device: 'platform:e61c0000.interrupt-controller--platform:e60b0000.i2c': device_unregister

    +devices_kset: Moving ee700000.ethernet-ffffffff:01 to end of list

     Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)
     sh-eth ee700000.ethernet eth1: Link is Up - 100Mbps/Full - flow control off
     Sending DHCP requests ., OK

> >     -Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=185)
> >     +Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)
>
> Can you drop a WARN() where this is printed to get the stack trace to
> check my hypothesis?

That didn't help much, as this is the messenger, not the cause.

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: Re: Re: Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-13  0:45                   ` Saravana Kannan
@ 2022-07-14  6:41                     ` Alexander Stein
  2022-07-15 22:08                       ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Alexander Stein @ 2022-07-14  6:41 UTC (permalink / raw)
  To: Saravana Kannan, l.stach
  Cc: Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

On Wednesday, 13 July 2022, 02:45:06 CEST, Saravana Kannan wrote:
> On Wed, Jul 6, 2022 at 6:02 AM Alexander Stein
> <alexander.stein@ew.tq-group.com> wrote:
> 
> 
> Thanks for testing all my patches and helping me debug this.
> 
> Btw, can you try to keep the subject the same please? Looks like
> somewhere in your path [EXT] is added sometimes. lore.kernel.org keeps
> the thread together, but my email client (gmail) gets confused.

Sorry about that. Unfortunately [EXT] is inserted automatically and it is 
tedious and error-prone to remove it manually...

> > On Tuesday, 5 July 2022, 03:24:33 CEST, Saravana Kannan wrote:
> > > On Mon, Jul 4, 2022 at 12:07 AM Alexander Stein
> > > 
> > > <alexander.stein@ew.tq-group.com> wrote:
> > > > On Friday, 1 July 2022, 09:02:22 CEST, Saravana Kannan wrote:
> > > > > On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
> > > > > 
> > > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > > Hi Saravana,
> > > > > > 
> > > > > > On Friday, 1 July 2022, 02:37:14 CEST, Saravana Kannan wrote:
> > > > > > > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> > > > > > > 
> > > > > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > > > > Hi,
> > > > > > > > 
> > > > > > > > On Tuesday, 21 June 2022, 09:28:43 CEST, Tony Lindgren wrote:
> > > > > > > > > Hi,
> > > > > > > > > 
> > > > > > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > > > > > "power-domains" property, the execution will never get to
> > > > > > > > > > the point where driver_deferred_probe_check_state() is
> > > > > > > > > > called before the supplier has probed successfully or
> > > > > > > > > > before deferred probe timeout has expired.
> > > > > > > > > >
> > > > > > > > > > So, delete the call and replace it with -ENODEV.
> > > > > > > > >
> > > > > > > > > Looks like this causes omaps to not boot in Linux next. With
> > > > > > > > > this simple-pm-bus fails to probe initially as the
> > > > > > > > > power-domain is not yet available. On platform_probe()
> > > > > > > > > genpd_get_from_provider() returns -ENOENT.
> > > > > > > > >
> > > > > > > > > Seems like other stuff is potentially broken too, any ideas
> > > > > > > > > on how to fix this?
> > > > > > > > 
> > > > > > > > I think I'm hit by this as well, although I do not get a
> > > > > > > > lockup.
> > > > > > > > In my case I'm using
> > > > > > > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and
> > > > > > > > probing of
> > > > > > > > 38320000.blk-ctrl fails as the power-domain is not (yet)
> > > > > > > > registered.
> > > > > > > 
> > > > > > > Ok, took a look.
> > > > > > > 
> > > > > > > The problem is that there are two drivers for the same device
> > > > > > > and
> > > > > > > they
> > > > > > > both initialize this device.
> > > > > > > 
> > > > > > >     gpc: gpc@303a0000 {
> > > > > > >     
> > > > > > >         compatible = "fsl,imx8mq-gpc";
> > > > > > >     
> > > > > > >     }
> > > > > > > 
> > > > > > > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > > > > > > drivers/irqchip/irq-imx-gpcv2.c
> > > > > > > drivers/soc/imx/gpcv2.c
> > > > > > > 
> > > > > > > IMHO, this is a bad/broken design.
> > > > > > > 
> > > > > > > So what's happening is that fw_devlink will block the probe of
> > > > > > > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it
> > > > > > > stops
> > > > > > > blocking the probe of 38320000.blk-ctrl as soon as the first
> > > > > > > driver
> > > > > > > initializes the device. In this case, it's the irqchip driver.
> > > > > > > 
> > > > > > > I'd recommend combining these drivers into one. Something like
> > > > > > > the
> > > > > > > patch I'm attaching (sorry for the attachment, copy-paste is
> > > > > > > mangling
> > > > > > > the tabs). Can you give it a shot please?
> > > > > > 
> > > > > > I tried this patch and it delayed the driver initialization (those
> > > > > > of UART as well BTW). Unfortunately the driver fails the same way:
> > > > >
> > > > > Thanks for testing the patch!
> > > > >
> > > > > > > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
> > > > > > 
> > > > > > More than that it even introduced some more errors:
> > > > > > > [    0.008160] irq: no irq domain found for gpc@303a0000 !
> > > > > 
> > > > > So the idea behind my change was that as long as the irqchip isn't
> > > > > the
> > > > > root of the irqdomain (might be using the terms incorrectly) like
> > > > > the
> > > > > gic, you can make it a platform driver. And I was trying to hack up
> > > > > a
> > > > > patch that's the equivalent of platform_irqchip_probe() (which just
> > > > > ends up eventually calling the callback you use in
> > > > > IRQCHIP_DECLARE()).
> > > > > I probably made some mistake in the quick hack that I'm sure is
> > > > > fixable.
> > > > > 
> > > > > > > [    0.013251] Failed to map interrupt for
> > > > > > > /soc@0/bus@30400000/timer@306a0000
> > > > > 
> > > > > However, this timer driver also uses TIMER_OF_DECLARE() which can't
> > > > > handle failure to get the IRQ (because it can't return
> > > > > -EPROBE_DEFER). So, this means the timer driver in turn needs to
> > > > > be converted to a
> > > > > platform driver if it's supposed to work with the IRQCHIP_DECLARE()
> > > > > being converted to a platform driver.
> > > > > 
> > > > > But that's a can of worms not worth opening. But then I remembered
> > > > > this simpler workaround will work and it is pretty much a variant of
> > > > > the workaround that's already in the gpc's irqchip driver to allow
> > > > > two
> > > > > drivers to probe the same device (people really should stop doing
> > > > > that).
> > > > > 
> > > > > Can you drop my previous hack patch and try this instead please? I'm
> > > > > 99% sure this will work.
> > > > > 
> > > > > diff --git a/drivers/irqchip/irq-imx-gpcv2.c
> > > > > b/drivers/irqchip/irq-imx-gpcv2.c index b9c22f764b4d..8a0e82067924
> > > > > 100644
> > > > > --- a/drivers/irqchip/irq-imx-gpcv2.c
> > > > > +++ b/drivers/irqchip/irq-imx-gpcv2.c
> > > > > @@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct
> > > > > device_node *node,
> > > > > 
> > > > >          * later the GPC power domain driver will not be skipped.
> > > > >          */
> > > > >         
> > > > >         of_node_clear_flag(node, OF_POPULATED);
> > > > > 
> > > > > +       fwnode_dev_initialized(domain->fwnode, false);
> > > > > 
> > > > >         return 0;
> > > > >  
> > > > >  }
> > > > 
> > > > Just to be sure here, I tried this patch on top of next-20220701 but
> > > > unfortunately this doesn't fix the original problem either. The timer
> > > > errors are gone though.
> > > 
> > > To clarify, you had the timer issue only with my "combine drivers"
> > > patch,
> > > right?
> > 
> > That's correct.
> > 
> > > > The probe of imx8m-blk-ctrl got slightly delayed (from 0.74 to 0.90s
> > > > printk
> > > > time) but results in the identical error message.
> > > 
> > > My guess is that the probe attempt of blk-ctrl is delayed now till gpc
> > > probes (because of the device links getting created with the
> > > fwnode_dev_initialized() fix), but by the time gpc probe finishes, the
> > > power domains aren't registered yet because of the additional level of
> > > device addition and probing.
> > > 
> > > Can you try the attached patch please?
> > 
> > Sure, it needed some small fixes though. But the error still is present.
> > 
> > > And if that doesn't fix the issues, then enable the debug logs in the
> > > following functions please and share the logs from boot till the
> > > failure? If you can enable CONFIG_PRINTK_CALLER, that'd help too.
> > > device_link_add()
> > > fwnode_link_add()
> > > fw_devlink_relax_cycle()
> > 
> > I switched fw_devlink_relax_cycle() for fw_devlink_relax_link() as the
> > former has no debug output here.
> > 
> > For the record I added the following line to my kernel command line:
> > > dyndbg="func device_link_add +p; func fwnode_link_add +p; func fw_devlink_relax_link +p"
> > 
> > I attached the dmesg until the probe error to this mail. But I noticed the
> > 
> > following lines which seem interesting:
> > > [    1.466620][    T8] imx-pgc imx-pgc-domain.5: Linked as a consumer to regulator.8
> > > [    1.466743][    T8] imx-pgc imx-pgc-domain.5: imx_pgc_domain_probe: Probe succeeded
> > > [    1.474733][    T8] imx-pgc imx-pgc-domain.6: Linked as a consumer to regulator.9
> > > [    1.474774][    T8] imx-pgc imx-pgc-domain.6: imx_pgc_domain_probe: Probe succeeded
> 
> I'm guessing this happens after the probe error.
> 
> Ok, I looked at the dmesg logs and this pretty much confirms my
> thought on why the probe ordering wasn't maintained.
> 
> The power domains lack a compatible property, so the blk-ctrl is
> linked as a consumer of the gpc instead:
> [    0.343905][    T1] blk-ctrl@38320000 Linked as a fwnode consumer
> to gpc@303a0000
> [    0.343943][    T1] blk-ctrl@38320000 Linked as a fwnode consumer
> to clock-controller@30380000
> This ^^ is the device tree parsing figuring out the dependencies
> between the DT nodes.
> 
> [    0.368462][    T1] platform 38320000.blk-ctrl: Linked as a
> consumer to 30380000.clock-controller
> [    0.368542][    T1] platform 38320000.blk-ctrl: Linked as a
> consumer to 303a0000.gpc
> This ^^ is converting the DT node dependencies into device links.
> 
> So, the only real options are:
> 1. Fix DT and add a compatible string to the DT nodes.
> 2. Move the initcall level of the regulator driver so the powerdomain
> probe doesn't get deferred. Not ideal that we are playing initcall
> chicken to handle the feature meant to remove the need for initcall
> chicken. But I see these "device, but won't have a compatible
> property" as exceptions and feel it's okay to have to play with
> initcall levels to handle those.
> 3. Provide a helper function that drivers that do this (creating
> devices for child DT nodes without compatible property) can use to
> move/copy their consumer device links to the child devices they add.
> And then fix up the gpc driver so that it copies the gpc -- blk-ctrl
> device link to the proper power domain.
> 4. I have another idea for how I could fix that at a driver core
> level, but I'm not sure it'll work yet and it's definitely not
> something I want to try and get in for 5.19 -- too late for that IMHO.
> 
> Want to give (2) a shot so that I can still try to keep the cleanup
> series that caused this problem (that's the long term goal) while I
> give (3) and (4) a shot for 5.20?

Sure, I can give (2) a shot. Which initcall needs to be modified? Do you have a
diff snippet?
BTW: this potentially affects all imx8m and imx7d as they have the same gpc 
binding.

Can't say much about (1). I added Lucas Stach to recipients, he did a lot on 
this gpc driver.
@Lucas: Do you have some input why the gpc power domains do not have a 
compatible? Is it reasonable to add them?

Best regards,
Alexander

> > regulator.8 and regulator.9 are the power sequencer, attached via I2C. This
> > also makes perfect sense if you look at [1]ff. These power domains are
> > supplied by specific power supply rails. Several, if not all, imx8mq
> > boards have this kind of setting.
> 
> Yeah, makes sense in terms of what's going on.
> 
> -Saravana
> 
> > > Btw, part of the reason I'm trying to make sure we fix it the right
> > > way is that when we try to enable async boot by default, we don't run
> > > into issues.
> > 
> > Sounds reasonable.
> > 
> > Best regards,
> > Alexander
> > 
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/
> > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq.dtsi#n84





^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: Re: Re: Re: [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()
  2022-07-14  6:41                     ` Alexander Stein
@ 2022-07-15 22:08                       ` Saravana Kannan
  0 siblings, 0 replies; 69+ messages in thread
From: Saravana Kannan @ 2022-07-15 22:08 UTC (permalink / raw)
  To: Alexander Stein
  Cc: l.stach, Tony Lindgren, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kevin Hilman, Ulf Hansson, Len Brown, Pavel Machek, Joerg Roedel,
	Will Deacon, Andrew Lunn, Heiner Kallweit, Russell King,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Linus Walleij, Hideaki YOSHIFUJI, David Ahern, kernel-team,
	linux-kernel, linux-pm, iommu, netdev, linux-gpio,
	Geert Uytterhoeven

On Wed, Jul 13, 2022 at 11:41 PM Alexander Stein
<alexander.stein@ew.tq-group.com> wrote:
>
> On Wednesday, 13 July 2022, 02:45:06 CEST, Saravana Kannan wrote:
> > On Wed, Jul 6, 2022 at 6:02 AM Alexander Stein
> > <alexander.stein@ew.tq-group.com> wrote:
> >
> >
> > Thanks for testing all my patches and helping me debug this.
> >
> > Btw, can you try to keep the subject the same please? Looks like
> > somewhere in your path [EXT] is added sometimes. lore.kernel.org keeps
> > the thread together, but my email client (gmail) gets confused.
>
> Sorry about that. Unfortunately [EXT] is inserted automatically and it is
> tedious and error-prone to remove it manually...
>
> > > On Tuesday, 5 July 2022, 03:24:33 CEST, Saravana Kannan wrote:
> > > > On Mon, Jul 4, 2022 at 12:07 AM Alexander Stein
> > > >
> > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > On Friday, 1 July 2022, 09:02:22 CEST, Saravana Kannan wrote:
> > > > > > On Thu, Jun 30, 2022 at 11:02 PM Alexander Stein
> > > > > >
> > > > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > > > Hi Saravana,
> > > > > > >
> > > > > > > On Friday, 1 July 2022, 02:37:14 CEST, Saravana Kannan wrote:
> > > > > > > > On Thu, Jun 23, 2022 at 5:08 AM Alexander Stein
> > > > > > > >
> > > > > > > > <alexander.stein@ew.tq-group.com> wrote:
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > On Tuesday, 21 June 2022, 09:28:43 CEST, Tony Lindgren wrote:
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > * Saravana Kannan <saravanak@google.com> [700101 02:00]:
> > > > > > > > > > > Now that fw_devlink=on by default and fw_devlink supports
> > > > > > > > > > > "power-domains" property, the execution will never get to
> > > > > > > > > > > the point where driver_deferred_probe_check_state() is
> > > > > > > > > > > called before the supplier has probed successfully or
> > > > > > > > > > > before deferred probe timeout has expired.
> > > > > > > > > > >
> > > > > > > > > > > So, delete the call and replace it with -ENODEV.
> > > > > > > > > >
> > > > > > > > > > Looks like this causes omaps to not boot in Linux next. With
> > > > > > > > > > this simple-pm-bus fails to probe initially as the
> > > > > > > > > > power-domain is not yet available. On platform_probe()
> > > > > > > > > > genpd_get_from_provider() returns -ENOENT.
> > > > > > > > > >
> > > > > > > > > > Seems like other stuff is potentially broken too, any ideas
> > > > > > > > > > on how to fix this?
> > > > > > > > >
> > > > > > > > > I think I'm hit by this as well, although I do not get a
> > > > > > > > > lockup.
> > > > > > > > > In my case I'm using
> > > > > > > > > arch/arm64/boot/dts/freescale/imx8mq-tqma8mq-mba8mx.dts and
> > > > > > > > > probing of
> > > > > > > > > 38320000.blk-ctrl fails as the power-domain is not (yet)
> > > > > > > > > registered.
> > > > > > > >
> > > > > > > > Ok, took a look.
> > > > > > > >
> > > > > > > > The problem is that there are two drivers for the same device
> > > > > > > > and
> > > > > > > > they
> > > > > > > > both initialize this device.
> > > > > > > >
> > > > > > > >     gpc: gpc@303a0000 {
> > > > > > > >
> > > > > > > >         compatible = "fsl,imx8mq-gpc";
> > > > > > > >
> > > > > > > >     }
> > > > > > > >
> > > > > > > > $ git grep -l "fsl,imx7d-gpc" -- drivers/
> > > > > > > > drivers/irqchip/irq-imx-gpcv2.c
> > > > > > > > drivers/soc/imx/gpcv2.c
> > > > > > > >
> > > > > > > > IMHO, this is a bad/broken design.
> > > > > > > >
> > > > > > > > So what's happening is that fw_devlink will block the probe of
> > > > > > > > 38320000.blk-ctrl until 303a0000.gpc is initialized. And it
> > > > > > > > stops
> > > > > > > > blocking the probe of 38320000.blk-ctrl as soon as the first
> > > > > > > > driver
> > > > > > > > initializes the device. In this case, it's the irqchip driver.
> > > > > > > >
> > > > > > > > I'd recommend combining these drivers into one. Something like
> > > > > > > > the
> > > > > > > > patch I'm attaching (sorry for the attachment, copy-paste is
> > > > > > > > mangling
> > > > > > > > the tabs). Can you give it a shot please?
> > > > > > >
> > > > > > > I tried this patch and it delayed the driver initialization (those
> > > > > > > of UART as well BTW). Unfortunately the driver fails the same way:
> > > > > >
> > > > > > Thanks for testing the patch!
> > > > > >
> > > > > > > > [    1.125253] imx8m-blk-ctrl 38320000.blk-ctrl: error -ENODEV: failed to attach power domain "bus"
> > > > > > >
> > > > > > > More than that it even introduced some more errors:
> > > > > > > > [    0.008160] irq: no irq domain found for gpc@303a0000 !
> > > > > >
> > > > > > So the idea behind my change was that as long as the irqchip isn't
> > > > > > the
> > > > > > root of the irqdomain (might be using the terms incorrectly) like
> > > > > > the
> > > > > > gic, you can make it a platform driver. And I was trying to hack up
> > > > > > a
> > > > > > patch that's the equivalent of platform_irqchip_probe() (which just
> > > > > > ends up eventually calling the callback you use in
> > > > > > IRQCHIP_DECLARE()).
> > > > > > I probably made some mistake in the quick hack that I'm sure is
> > > > > > fixable.
> > > > > >
> > > > > > > > [    0.013251] Failed to map interrupt for
> > > > > > > > /soc@0/bus@30400000/timer@306a0000
> > > > > >
> > > > > > However, this timer driver also uses TIMER_OF_DECLARE() which can't
> > > > > > handle failure to get the IRQ (because it can't return
> > > > > > -EPROBE_DEFER). So, this means the timer driver in turn needs to
> > > > > > be converted to a
> > > > > > platform driver if it's supposed to work with the IRQCHIP_DECLARE()
> > > > > > being converted to a platform driver.
> > > > > >
> > > > > > But that's a can of worms not worth opening. But then I remembered
> > > > > > this simpler workaround will work and it is pretty much a variant of
> > > > > > the workaround that's already in the gpc's irqchip driver to allow
> > > > > > two
> > > > > > drivers to probe the same device (people really should stop doing
> > > > > > that).
> > > > > >
> > > > > > Can you drop my previous hack patch and try this instead please? I'm
> > > > > > 99% sure this will work.
> > > > > >
> > > > > > diff --git a/drivers/irqchip/irq-imx-gpcv2.c
> > > > > > b/drivers/irqchip/irq-imx-gpcv2.c index b9c22f764b4d..8a0e82067924
> > > > > > 100644
> > > > > > --- a/drivers/irqchip/irq-imx-gpcv2.c
> > > > > > +++ b/drivers/irqchip/irq-imx-gpcv2.c
> > > > > > @@ -283,6 +283,7 @@ static int __init imx_gpcv2_irqchip_init(struct
> > > > > > device_node *node,
> > > > > >
> > > > > >          * later the GPC power domain driver will not be skipped.
> > > > > >          */
> > > > > >
> > > > > >         of_node_clear_flag(node, OF_POPULATED);
> > > > > >
> > > > > > +       fwnode_dev_initialized(domain->fwnode, false);
> > > > > >
> > > > > >         return 0;
> > > > > >
> > > > > >  }
> > > > >
> > > > > Just to be sure here, I tried this patch on top of next-20220701 but
> > > > > unfortunately this doesn't fix the original problem either. The timer
> > > > > errors are gone though.
> > > >
> > > > To clarify, you had the timer issue only with my "combine drivers"
> > > > patch,
> > > > right?
> > >
> > > That's correct.
> > >
> > > > > The probe of imx8m-blk-ctrl got slightly delayed (from 0.74 to 0.90s
> > > > > printk
> > > > > time) but results in the identical error message.
> > > >
> > > > My guess is that the probe attempt of blk-ctrl is delayed now till gpc
> > > > probes (because of the device links getting created with the
> > > > fwnode_dev_initialized() fix), but by the time gpc probe finishes, the
> > > > power domains aren't registered yet because of the additional level of
> > > > device addition and probing.
> > > >
> > > > Can you try the attached patch please?
> > >
> > > Sure, it needed some small fixes though. But the error still is present.
> > >
> > > > And if that doesn't fix the issues, then enable the debug logs in the
> > > > following functions please and share the logs from boot till the
> > > > failure? If you can enable CONFIG_PRINTK_CALLER, that'd help too.
> > > > device_link_add()
> > > > fwnode_link_add()
> > > > fw_devlink_relax_cycle()
> > >
> > > I switched fw_devlink_relax_cycle() for fw_devlink_relax_link() as the
> > > former has no debug output here.
> > >
> > > For the record I added the following line to my kernel command line:
> > > > dyndbg="func device_link_add +p; func fwnode_link_add +p; func fw_devlink_relax_link +p"
> > >
> > > I attached the dmesg until the probe error to this mail. But I noticed the
> > >
> > > following lines which seem interesting:
> > > > [    1.466620][    T8] imx-pgc imx-pgc-domain.5: Linked as a consumer to regulator.8
> > > > [    1.466743][    T8] imx-pgc imx-pgc-domain.5: imx_pgc_domain_probe: Probe succeeded
> > > > [    1.474733][    T8] imx-pgc imx-pgc-domain.6: Linked as a consumer to regulator.9
> > > > [    1.474774][    T8] imx-pgc imx-pgc-domain.6: imx_pgc_domain_probe: Probe succeeded
> >
> > I'm guessing this happens after the probe error.
> >
> > Ok, I looked at the dmesg logs and this pretty much confirms my
> > thought on why the probe ordering wasn't maintained.
> >
> > The power domains lack a compatible property, so the blk-ctrl is
> > linked as a consumer of the gpc instead:
> > [    0.343905][    T1] blk-ctrl@38320000 Linked as a fwnode consumer
> > to gpc@303a0000
> > [    0.343943][    T1] blk-ctrl@38320000 Linked as a fwnode consumer
> > to clock-controller@30380000
> > This ^^ is the device tree parsing figuring out the dependencies
> > between the DT nodes.
> >
> > [    0.368462][    T1] platform 38320000.blk-ctrl: Linked as a
> > consumer to 30380000.clock-controller
> > [    0.368542][    T1] platform 38320000.blk-ctrl: Linked as a
> > consumer to 303a0000.gpc
> > This ^^ is converting the DT node dependencies into device links.
> >
> > So, the only real options are:
> > 1. Fix DT and add a compatible string to the DT nodes.
> > 2. Move the initcall level of the regulator driver so the powerdomain
> > probe doesn't get deferred. Not ideal that we are playing initcall
> > chicken to handle the feature meant to remove the need for initcall
> > chicken. But I see these "device, but won't have a compatible
> > property" as exceptions and feel it's okay to have to play with
> > initcall levels to handle those.
> > 3. Provide a helper function that drivers that do this (creating
> > devices for child DT nodes without compatible property) can use to
> > move/copy their consumer device links to the child devices they add.
> > And then fix up the gpc driver so that it copies the gpc -- blk-ctrl
> > device link to the proper power domain.
> > 4. I have another idea for how I could fix that at a driver core
> > level, but I'm not sure it'll work yet and it's definitely not
> > something I want to try and get in for 5.19 -- too late for that IMHO.
> >
> > Want to give (2) a shot so that I can still try to keep the cleanup
> > series that caused this problem (that's the long term goal) while I
> > give (3) and (4) a shot for 5.20?
>
> Sure, I can give (2) a shot. Which initcall needs to be modified? Do you have a
> diff snippet?

The initcalls of all the regulator drivers that feed this gpc power domain.
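
To illustrate why the initcall level matters here, below is a toy
userspace model in C. It is not kernel code and not a patch. The struct
and function names are hypothetical, and it assumes that the gpc and
blk-ctrl drivers both register at device_initcall level with gpc first
in link order. The level numbers 4 and 6 are the kernel's numbering for
subsys_initcall and device_initcall. The model only captures the ordering
problem discussed above: blk-ctrl has a device link to gpc but not to the
child power domain device, and the child domain defers until its
regulator has probed.

```c
#include <errno.h>
#include <stdbool.h>

/* Kernel initcall level numbers for subsys_initcall and device_initcall. */
enum { SUBSYS_INITCALL = 4, DEVICE_INITCALL = 6 };

/* Hypothetical, simplified boot state; not a kernel data structure. */
struct boot_state {
	bool regulator_probed;   /* regulator feeding the power domain */
	bool pgc_deferred;       /* child power domain device is waiting */
	bool domain_registered;  /* power domain is available to consumers */
};

/* The child power domain device can only probe once its regulator exists. */
static void probe_pgc_domain(struct boot_state *s)
{
	if (!s->regulator_probed) {
		s->pgc_deferred = true;   /* models a deferred probe */
		return;
	}
	s->pgc_deferred = false;
	s->domain_registered = true;
}

/* A successful probe retries any device that deferred earlier. */
static void probe_regulator(struct boot_state *s)
{
	s->regulator_probed = true;
	if (s->pgc_deferred)
		probe_pgc_domain(s);
}

/* gpc probes and adds its child power domain device, which probes at once. */
static void probe_gpc(struct boot_state *s)
{
	probe_pgc_domain(s);
}

/*
 * blk-ctrl is only linked to gpc, so nothing makes it wait for the child
 * domain. A missing domain is a hard -ENODEV with the cleanup series.
 */
static int probe_blk_ctrl(const struct boot_state *s)
{
	return s->domain_registered ? 0 : -ENODEV;
}

/* Returns the blk-ctrl probe result for a given regulator initcall level. */
int blk_ctrl_probe_result(int regulator_level)
{
	struct boot_state s = { false, false, false };
	int ret = -ENODEV;
	int level;

	for (level = SUBSYS_INITCALL; level <= DEVICE_INITCALL; level++) {
		if (level == DEVICE_INITCALL) {
			/* Assumed link order: gpc first, then blk-ctrl. */
			probe_gpc(&s);
			ret = probe_blk_ctrl(&s);
		}
		if (level == regulator_level)
			probe_regulator(&s);
	}
	return ret;
}
```

Under these assumptions, blk_ctrl_probe_result(DEVICE_INITCALL) gives
-ENODEV because the power domain is only registered after blk-ctrl has
already failed, while blk_ctrl_probe_result(SUBSYS_INITCALL) gives 0.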

> BTW: this potentially affects all imx8m and imx7d as they have the same gpc
> binding.

Good point. That's why I was asking for your help :) -- you have more
context on this hardware.

> Can't say much about (1). I added Lucas Stach to recipients, he did a lot on
> this gpc driver.
> @Lucas: Do you have some input why the gpc power domains do not have a
> compatible? Is it reasonable to add them?

It's generally frowned upon to update the kernel in a way that it
breaks backwards compatibility with an older DT binary. That's why I
didn't ask about (1).

It's fairly trivial to get it to work if we (who is "we" here?) agree
it's okay to add the compatible property and break DT backwards
compatibility in this case.

-Saravana

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."
  2022-06-01  7:07 ` [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0." Saravana Kannan
@ 2022-07-20 17:31   ` Geert Uytterhoeven
  2022-07-20 19:01     ` Saravana Kannan
  0 siblings, 1 reply; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-07-20 17:31 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Wolfram Sang, Linux-Renesas

Hi Saravana,

On Wed, Jun 1, 2022 at 9:45 AM Saravana Kannan <saravanak@google.com> wrote:
> This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.
>
> Let's take another shot at getting deferred_probe_timeout=10 to work.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

Thanks for your patch, which is now commit f516d01b9df2782b
("Revert "driver core: Set default deferred_probe_timeout
back to 0."") in driver-core/driver-core-next.

Wolfram found an issue on a Renesas board where disabling the IOMMU
driver (CONFIG_IPMMU_VMSA=n) causes the system to fail to boot,
and bisected this to a merge of driver-core/driver-core-next.
After some trials, I managed to reproduce the issue, and bisected it
further to commit f516d01b9df2782b.

The affected config has:
    CONFIG_MODULES=y
    CONFIG_RCAR_DMAC=y
    CONFIG_IPMMU_VMSA=n

In arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb,
e6e88000.serial links to a dmac, and the dmac links to an iommu,
for which no driver is available.

Playing with deferred_probe_timeout values doesn't help.

However, the above options do not seem to be sufficient to trigger
the issue, as I had other configs with those three options that do
boot fine.

After bisecting configs, I found the culprit: CONFIG_IP_PNP.
As Wolfram was using an initramfs, CONFIG_IP_PNP was not needed.
If CONFIG_IP_PNP=n, booting fails.
If CONFIG_IP_PNP=y, booting succeeds.
In fact, just disabling late_initcall(ip_auto_config) makes it fail,
too.
Reducing ip_auto_config(), it turns out the call to
wait_for_init_devices_probe() is what is needed to unblock booting.

So I guess wait_for_init_devices_probe() needs to be called (where?)
if CONFIG_IP_PNP=n, too?

> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
>  }
>  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
>
> +#ifdef CONFIG_MODULES
> +int driver_deferred_probe_timeout = 10;
> +#else
>  int driver_deferred_probe_timeout;
> +#endif
> +
>  EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
>
>  static int __init deferred_probe_timeout_setup(char *str)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."
  2022-07-20 17:31   ` Geert Uytterhoeven
@ 2022-07-20 19:01     ` Saravana Kannan
  2022-07-21  8:40       ` Geert Uytterhoeven
  0 siblings, 1 reply; 69+ messages in thread
From: Saravana Kannan @ 2022-07-20 19:01 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Wolfram Sang, Linux-Renesas

On Wed, Jul 20, 2022 at 10:31 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 1, 2022 at 9:45 AM Saravana Kannan <saravanak@google.com> wrote:
> > This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.
> >
> > Let's take another shot at getting deferred_probe_timeout=10 to work.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch, which is now commit f516d01b9df2782b
> ("Revert "driver core: Set default deferred_probe_timeout
> back to 0."") in driver-core/driver-core-next.
>
> Wolfram found an issue on a Renesas board where disabling the IOMMU
> driver (CONFIG_IPMMU_VMSA=n) causes the system to fail to boot,
> and bisected this to a merge of driver-core/driver-core-next.
> After some trials, I managed to reproduce the issue, and bisected it
> further to commit f516d01b9df2782b.
>
> The affected config has:
>     CONFIG_MODULES=y
>     CONFIG_RCAR_DMAC=y
>     CONFIG_IPMMU_VMSA=n
>
> In arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb,
> e6e88000.serial links to a dmac, and the dmac links to an iommu,
> for which no driver is available.

Thanks for digging into this and giving more details.

Is e6e88000.serial being blocked the reason for the boot failure?

If so, can you give this a shot?
https://lore.kernel.org/lkml/20220701012647.2007122-1-saravanak@google.com/

> Playing with deferred_probe_timeout values doesn't help.

This part is strange though. If you set deferred_probe_timeout=1,
fw_devlink will stop blocking all probes 1 second after
late_initcall()s finish. So, similar to the ip autoconfig issue, is
the issue caused by something that needs to be finished before we hit
late_initcall() but is getting blocked?

> However, the above options do not seem to be sufficient to trigger
> the issue, as I had other configs with those three options that do
> boot fine.
>
> After bisecting configs, I found the culprit: CONFIG_IP_PNP.
> As Wolfram was using an initramfs, CONFIG_IP_PNP was not needed.
> If CONFIG_IP_PNP=n, booting fails.
> If CONFIG_IP_PNP=y, booting succeeds.
> In fact, just disabling late_initcall(ip_auto_config) makes it fail,
> too.
> Reducing ip_auto_config(), it turns out the call to
> wait_for_init_devices_probe() is what is needed to unblock booting.
>
> So I guess wait_for_init_devices_probe() needs to be called (where?)
> if CONFIG_IP_PNP=n, too?

That function just unblocks all devices and allows them to try and
probe and then waits for all possible probes to finish before
returning. The problem with calling it randomly/every time is that it
breaks functionality where an optional supplier will probe after a few
modules are loaded in the future.

I guess one possible issue with the timeout not helping is that once
the timeout expires, things are still being probed and nothing is
being blocked till they finish probing.

I'm trying to have the default config (in terms of fw_devlink,
deferred probe behavior, timeouts, etc) be the same between a fully
static kernel (but CONFIG_MODULES still enabled) and a fully modular
kernel (like GKI). But it might end up being an untenable problem.

I'll wait to see what specifically is the issue in this case and then
I'll go from there.

-Saravana

> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
> >  }
> >  DEFINE_SHOW_ATTRIBUTE(deferred_devs);
> >
> > +#ifdef CONFIG_MODULES
> > +int driver_deferred_probe_timeout = 10;
> > +#else
> >  int driver_deferred_probe_timeout;
> > +#endif
> > +
> >  EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
> >
> >  static int __init deferred_probe_timeout_setup(char *str)
>
> Gr{oetje,eeting}s,
>
>                         Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                 -- Linus Torvalds
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0."
  2022-07-20 19:01     ` Saravana Kannan
@ 2022-07-21  8:40       ` Geert Uytterhoeven
  0 siblings, 0 replies; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-07-21  8:40 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Wolfram Sang, Linux-Renesas

Hi Saravana,

On Wed, Jul 20, 2022 at 9:02 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jul 20, 2022 at 10:31 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Wed, Jun 1, 2022 at 9:45 AM Saravana Kannan <saravanak@google.com> wrote:
> > > This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.
> > >
> > > Let's take another shot at getting deferred_probe_timeout=10 to work.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > Thanks for your patch, which is now commit f516d01b9df2782b
> > ("Revert "driver core: Set default deferred_probe_timeout
> > back to 0."") in driver-core/driver-core-next.
> >
> > Wolfram found an issue on a Renesas board where disabling the IOMMU
> > driver (CONFIG_IPMMU_VMSA=n) causes the system to fail to boot,
> > and bisected this to a merge of driver-core/driver-core-next.
> > After some trials, I managed to reproduce the issue, and bisected it
> > further to commit f516d01b9df2782b.
> >
> > The affected config has:
> >     CONFIG_MODULES=y
> >     CONFIG_RCAR_DMAC=y
> >     CONFIG_IPMMU_VMSA=n
> >
> > In arch/arm64/boot/dts/renesas/r8a77951-salvator-xs.dtb,
> > e6e88000.serial links to a dmac, and the dmac links to an iommu,
> > for which no driver is available.
>
> Thanks for digging into this and giving more details.
>
> Is e6e88000.serial being blocked the reason for the boot failure?

It doesn't seem to be.

> If so, can you give this a shot?
> https://lore.kernel.org/lkml/20220701012647.2007122-1-saravanak@google.com/

Thanks, but it doesn't make a difference.

> > After bisecting configs, I found the culprit: CONFIG_IP_PNP.
> > As Wolfram was using an initramfs, CONFIG_IP_PNP was not needed.
> > If CONFIG_IP_PNP=n, booting fails.
> > If CONFIG_IP_PNP=y, booting succeeds.
> > In fact, just disabling late_initcall(ip_auto_config) makes it fail,
> > too.
> > Reducing ip_auto_config(), it turns out the call to
> > wait_for_init_devices_probe() is what is needed to unblock booting.
> >
> > So I guess wait_for_init_devices_probe() needs to be called (where?)
> > if CONFIG_IP_PNP=n, too?
>
> That function just unblocks all devices and allows them to try and
> probe and then waits for all possible probes to finish before
> returning. The problem with calling it randomly/every time is that it
> breaks functionality where an optional supplier will probe after a few
> modules are loaded in the future.
>
> I guess one possible issue with the timeout not helping is that once
> the timeout expires, things are still being probed and nothing is
> being blocked till they finish probing.

I'm not sure that it's a device that's missing.

Calling wait_for_init_devices_probe() or not changes lots of little
things in the probing order. But when comparing the sorted boot logs,
there does not seem to be any difference in the list of devices that
was probed successfully.
It looks like the system is just blocked on something else?...

I tried getting a list of all locks held using Magic SysRq + d,
but Magic SysRq on the serial console does not work at this point
(it does work in the booted kernel with CONFIG_IP_PNP=y).


Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()
  2022-07-05  9:11   ` Geert Uytterhoeven
  2022-07-13  1:40     ` Saravana Kannan
@ 2022-08-15  8:38     ` Geert Uytterhoeven
  1 sibling, 0 replies; 69+ messages in thread
From: Geert Uytterhoeven @ 2022-08-15  8:38 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Kevin Hilman, Ulf Hansson,
	Len Brown, Pavel Machek, Joerg Roedel, Will Deacon, Andrew Lunn,
	Heiner Kallweit, Russell King, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Linus Walleij, Hideaki YOSHIFUJI,
	David Ahern, Android Kernel Team, Linux Kernel Mailing List,
	Linux PM list, Linux IOMMU, netdev, open list:GPIO SUBSYSTEM,
	Linux-Renesas

Hi Saravana,

On Tue, Jul 5, 2022 at 11:11 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> On Wed, Jun 1, 2022 at 2:44 PM Saravana Kannan <saravanak@google.com> wrote:
> > Now that fw_devlink=on by default and fw_devlink supports interrupt
> > properties, the execution will never get to the point where
> > driver_deferred_probe_check_state() is called before the supplier has
> > probed successfully or before deferred probe timeout has expired.
> >
> > So, delete the call and replace it with -ENODEV.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> Thanks for your patch, which is now commit f8217275b57aa48d ("net:
> mdio: Delete usage of driver_deferred_probe_check_state()") in
> driver-core/driver-core-next.
>
> Seems like I missed something when providing my T-b for this series,
> sorry for that.
>
> arch/arm/boot/dts/r8a7791-koelsch.dts has:
>
>     &ether {
>             pinctrl-0 = <&ether_pins>, <&phy1_pins>;
>             pinctrl-names = "default";
>
>             phy-handle = <&phy1>;
>             renesas,ether-link-active-low;
>             status = "okay";
>
>             phy1: ethernet-phy@1 {
>                     compatible = "ethernet-phy-id0022.1537",
>                                  "ethernet-phy-ieee802.3-c22";
>                     reg = <1>;
>                     interrupt-parent = <&irqc0>;
>                     interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
>                     micrel,led-mode = <1>;
>                     reset-gpios = <&gpio5 22 GPIO_ACTIVE_LOW>;
>             };
>     };
>
> Despite the interrupts property, &ether is now probed before irqc0
> (interrupt-controller@e61c0000 in arch/arm/boot/dts/r8a7791.dtsi),
> so the PHY does not find its interrupt and falls back to polling:
>
>     -Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=185)
>     +Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=POLL)
>
> Reverting this commit, and commit 9cbffc7a59561be9 ("driver core:
> Delete driver_deferred_probe_check_state()") fixes that.

FTR, this issue is now present in v6.0-rc1.
I haven't tried your newest series yet.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2022-08-15  8:39 UTC | newest]

Thread overview: 69+ messages
2022-06-01  7:06 [PATCH v2 0/9] deferred_probe_timeout logic clean up Saravana Kannan
2022-06-01  7:06 ` [PATCH v2 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
2022-06-09 11:44   ` Ulf Hansson
2022-06-09 19:29     ` Saravana Kannan
2022-06-21  7:28   ` Tony Lindgren
2022-06-21 19:34     ` Saravana Kannan
2022-06-22  4:58       ` Tony Lindgren
2022-06-22 19:09         ` Saravana Kannan
2022-06-23  7:01           ` Tony Lindgren
2022-06-23  8:21             ` Saravana Kannan
2022-06-27  9:10               ` Tony Lindgren
2022-06-30 23:10                 ` Saravana Kannan
2022-06-30 23:26                   ` Rob Herring
2022-06-30 23:30                     ` Saravana Kannan
2022-07-01  5:33                       ` Tony Lindgren
2022-07-01  6:12                         ` Tony Lindgren
2022-07-01  8:10                           ` Saravana Kannan
2022-07-01  8:26                             ` Saravana Kannan
2022-07-01 13:00                               ` Tony Lindgren
2022-07-12  7:12                                 ` Tony Lindgren
2022-07-13  0:49                                   ` Saravana Kannan
2022-07-13  8:06                                     ` Tony Lindgren
2022-07-01 15:08                               ` Sudeep Holla
2022-07-01 19:13                                 ` Saravana Kannan
2022-07-05  8:44                                   ` Saravana Kannan
2022-07-01  7:38                   ` Geert Uytterhoeven
2022-06-23 12:08     ` Alexander Stein
2022-07-01  0:37       ` Saravana Kannan
2022-07-01  6:01         ` (EXT) " Alexander Stein
2022-07-01  7:02           ` Saravana Kannan
2022-07-04  7:07             ` (EXT) " Alexander Stein
2022-07-05  1:24               ` Saravana Kannan
2022-07-06 13:02                 ` Re: " Alexander Stein
2022-07-13  0:45                   ` Saravana Kannan
2022-07-14  6:41                     ` Alexander Stein
2022-07-15 22:08                       ` Saravana Kannan
2022-07-01  7:30         ` Geert Uytterhoeven
2022-06-01  7:06 ` [PATCH v2 2/9] pinctrl: devicetree: " Saravana Kannan
2022-06-01  7:06 ` [PATCH v2 3/9] net: mdio: " Saravana Kannan
2022-07-05  9:11   ` Geert Uytterhoeven
2022-07-13  1:40     ` Saravana Kannan
2022-07-13 11:39       ` Geert Uytterhoeven
2022-08-15  8:38     ` Geert Uytterhoeven
2022-06-01  7:07 ` [PATCH v2 4/9] driver core: Add wait_for_init_devices_probe helper function Saravana Kannan
2022-06-01  7:07 ` [PATCH v2 5/9] net: ipconfig: Relax fw_devlink if we need to mount a network rootfs Saravana Kannan
2022-06-01  7:07 ` [PATCH v2 6/9] Revert "driver core: Set default deferred_probe_timeout back to 0." Saravana Kannan
2022-07-20 17:31   ` Geert Uytterhoeven
2022-07-20 19:01     ` Saravana Kannan
2022-07-21  8:40       ` Geert Uytterhoeven
2022-06-01  7:07 ` [PATCH v2 7/9] driver core: Set fw_devlink.strict=1 by default Saravana Kannan
2022-06-22  7:47   ` Sascha Hauer
2022-06-22  8:44     ` Linus Walleij
2022-06-22 10:52       ` Andy Shevchenko
2022-06-22 11:18         ` Sascha Hauer
2022-06-22 19:40       ` Saravana Kannan
2022-06-22 20:35         ` Saravana Kannan
2022-06-22 22:30           ` Saravana Kannan
2022-06-28 13:09         ` Linus Walleij
2022-06-01  7:07 ` [PATCH v2 8/9] iommu/of: Delete usage of driver_deferred_probe_check_state() Saravana Kannan
2022-06-01  7:07 ` [PATCH v2 9/9] driver core: Delete driver_deferred_probe_check_state() Saravana Kannan
2022-06-07 18:07 ` [PATCH v2 0/9] deferred_probe_timeout logic clean up Geert Uytterhoeven
2022-06-08  0:55   ` Saravana Kannan
2022-06-08  4:17     ` Saravana Kannan
2022-06-08 10:25       ` Geert Uytterhoeven
2022-06-08 18:12         ` Saravana Kannan
2022-06-08 18:47           ` Geert Uytterhoeven
2022-06-08 21:07             ` Saravana Kannan
2022-06-08 22:49               ` Jakub Kicinski
2022-06-08 23:15                 ` Saravana Kannan
