linux-acpi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/4] Optimize fw_devlink parsing
@ 2020-05-15  5:34 Saravana Kannan
  2020-05-15  5:34 ` [PATCH v1 1/4] driver core: Move code to the right part of the file Saravana Kannan
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-05-15  5:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown
  Cc: Saravana Kannan, kernel-team, linux-kernel, devicetree,
	linux-acpi, Ji Luo

When fw_devlink is enabled on hardware with a large number of device
tree nodes, the initial device addition done in
of_platform_default_populate_init() can be very inefficient. This is
because most devices will fail to find all their suppliers when they are
added and will keep trying to parse their device tree nodes and link to
any newly added devices

This was an item on my TODO list that I'm finally getting around to. On
hardware I'm testing on, this saved 1.216 _seconds_!  Another SoC vendor
was also able to test a similar but hacky patch series and confirmed
that it saved them around 1 second.

Thanks,
Saravana
P.S: It took me longer to write the comments than the code!

Saravana Kannan (4):
  driver core: Move code to the right part of the file
  driver core: Look for waiting consumers only for a fwnode's primary
    device
  driver core: fw_devlink: Add support for batching fwnode parsing
  of: platform: Batch fwnode parsing when adding all top level devices

 drivers/base/base.h    |   1 +
 drivers/base/core.c    | 193 ++++++++++++++++++++++++++++++++---------
 drivers/base/dd.c      |   8 ++
 drivers/of/platform.c  |   2 +
 include/linux/fwnode.h |   2 +
 5 files changed, 164 insertions(+), 42 deletions(-)

-- 
2.26.2.761.g0e0b3e54be-goog


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v1 1/4] driver core: Move code to the right part of the file
  2020-05-15  5:34 [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
@ 2020-05-15  5:34 ` Saravana Kannan
  2020-05-15  5:34 ` [PATCH v1 2/4] driver core: Look for waiting consumers only for a fwnode's primary device Saravana Kannan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-05-15  5:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown
  Cc: Saravana Kannan, kernel-team, linux-kernel, devicetree,
	linux-acpi, Ji Luo

This commit just moves around code to match the general organization of
the file.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c | 60 ++++++++++++++++++++++-----------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index c9045521596f..2b454aae64b5 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1143,6 +1143,36 @@ static void device_links_purge(struct device *dev)
 	device_links_write_unlock();
 }
 
+static u32 fw_devlink_flags = DL_FLAG_SYNC_STATE_ONLY;
+static int __init fw_devlink_setup(char *arg)
+{
+	if (!arg)
+		return -EINVAL;
+
+	if (strcmp(arg, "off") == 0) {
+		fw_devlink_flags = 0;
+	} else if (strcmp(arg, "permissive") == 0) {
+		fw_devlink_flags = DL_FLAG_SYNC_STATE_ONLY;
+	} else if (strcmp(arg, "on") == 0) {
+		fw_devlink_flags = DL_FLAG_AUTOPROBE_CONSUMER;
+	} else if (strcmp(arg, "rpm") == 0) {
+		fw_devlink_flags = DL_FLAG_AUTOPROBE_CONSUMER |
+				   DL_FLAG_PM_RUNTIME;
+	}
+	return 0;
+}
+early_param("fw_devlink", fw_devlink_setup);
+
+u32 fw_devlink_get_flags(void)
+{
+	return fw_devlink_flags;
+}
+
+static bool fw_devlink_is_permissive(void)
+{
+	return fw_devlink_flags == DL_FLAG_SYNC_STATE_ONLY;
+}
+
 /* Device links support end. */
 
 int (*platform_notify)(struct device *dev) = NULL;
@@ -2345,36 +2375,6 @@ static int device_private_init(struct device *dev)
 	return 0;
 }
 
-static u32 fw_devlink_flags = DL_FLAG_SYNC_STATE_ONLY;
-static int __init fw_devlink_setup(char *arg)
-{
-	if (!arg)
-		return -EINVAL;
-
-	if (strcmp(arg, "off") == 0) {
-		fw_devlink_flags = 0;
-	} else if (strcmp(arg, "permissive") == 0) {
-		fw_devlink_flags = DL_FLAG_SYNC_STATE_ONLY;
-	} else if (strcmp(arg, "on") == 0) {
-		fw_devlink_flags = DL_FLAG_AUTOPROBE_CONSUMER;
-	} else if (strcmp(arg, "rpm") == 0) {
-		fw_devlink_flags = DL_FLAG_AUTOPROBE_CONSUMER |
-				   DL_FLAG_PM_RUNTIME;
-	}
-	return 0;
-}
-early_param("fw_devlink", fw_devlink_setup);
-
-u32 fw_devlink_get_flags(void)
-{
-	return fw_devlink_flags;
-}
-
-static bool fw_devlink_is_permissive(void)
-{
-	return fw_devlink_flags == DL_FLAG_SYNC_STATE_ONLY;
-}
-
 /**
  * device_add - add device to device hierarchy.
  * @dev: device.
-- 
2.26.2.761.g0e0b3e54be-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 2/4] driver core: Look for waiting consumers only for a fwnode's primary device
  2020-05-15  5:34 [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
  2020-05-15  5:34 ` [PATCH v1 1/4] driver core: Move code to the right part of the file Saravana Kannan
@ 2020-05-15  5:34 ` Saravana Kannan
  2020-05-15  5:34 ` [PATCH v1 3/4] driver core: fw_devlink: Add support for batching fwnode parsing Saravana Kannan
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-05-15  5:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown
  Cc: Saravana Kannan, kernel-team, linux-kernel, devicetree,
	linux-acpi, Ji Luo

Commit 4dbe191c046e ("driver core: Add device links from fwnode only for
the primary device") skipped linking a fwnode's secondary device to
the suppliers listed in its fwnode.

However, a fwnode's secondary device can't be found using
get_dev_from_fwnode(). So, there's no point in trying to see if devices
waiting for suppliers might want to link to a fwnode's secondary device.

This commit removes that unnecessary step for devices that aren't a
fwnode's primary device and also moves the code to a more appropriate
part of the file.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b454aae64b5..f585d92e09d0 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1173,6 +1173,21 @@ static bool fw_devlink_is_permissive(void)
 	return fw_devlink_flags == DL_FLAG_SYNC_STATE_ONLY;
 }
 
+static void fw_devlink_link_device(struct device *dev)
+{
+	int fw_ret;
+
+	device_link_add_missing_supplier_links();
+
+	if (fw_devlink_flags && fwnode_has_op(dev->fwnode, add_links)) {
+		fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
+		if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
+			device_link_wait_for_mandatory_supplier(dev);
+		else if (fw_ret)
+			device_link_wait_for_optional_supplier(dev);
+	}
+}
+
 /* Device links support end. */
 
 int (*platform_notify)(struct device *dev) = NULL;
@@ -2407,7 +2422,7 @@ int device_add(struct device *dev)
 	struct device *parent;
 	struct kobject *kobj;
 	struct class_interface *class_intf;
-	int error = -EINVAL, fw_ret;
+	int error = -EINVAL;
 	struct kobject *glue_dir = NULL;
 	bool is_fwnode_dev = false;
 
@@ -2524,16 +2539,8 @@ int device_add(struct device *dev)
 	 * waiting consumers can link to it before the driver is bound to the
 	 * device and the driver sync_state callback is called for this device.
 	 */
-	device_link_add_missing_supplier_links();
-
-	if (fw_devlink_flags && is_fwnode_dev &&
-	    fwnode_has_op(dev->fwnode, add_links)) {
-		fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
-		if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
-			device_link_wait_for_mandatory_supplier(dev);
-		else if (fw_ret)
-			device_link_wait_for_optional_supplier(dev);
-	}
+	if (is_fwnode_dev)
+		fw_devlink_link_device(dev);
 
 	bus_probe_device(dev);
 	if (parent)
-- 
2.26.2.761.g0e0b3e54be-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 3/4] driver core: fw_devlink: Add support for batching fwnode parsing
  2020-05-15  5:34 [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
  2020-05-15  5:34 ` [PATCH v1 1/4] driver core: Move code to the right part of the file Saravana Kannan
  2020-05-15  5:34 ` [PATCH v1 2/4] driver core: Look for waiting consumers only for a fwnode's primary device Saravana Kannan
@ 2020-05-15  5:34 ` Saravana Kannan
  2020-05-15  5:35 ` [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices Saravana Kannan
  2020-05-15  8:52 ` [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
  4 siblings, 0 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-05-15  5:34 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown
  Cc: Saravana Kannan, kernel-team, linux-kernel, devicetree,
	linux-acpi, Ji Luo

The amount of time spent parsing fwnodes of devices can become really
high if the devices are added in an non-ideal order. Worst case can be
O(N^2) when N devices are added. But this can be optimized to O(N) by
adding all the devices and then parsing all their fwnodes in one batch.

This commit adds fw_devlink_pause() and fw_devlink_resume() to allow
doing this.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/base.h    |   1 +
 drivers/base/core.c    | 116 ++++++++++++++++++++++++++++++++++++++---
 drivers/base/dd.c      |   8 +++
 include/linux/fwnode.h |   2 +
 4 files changed, 120 insertions(+), 7 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 40fb069a8a7e..95c22c0f9036 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -153,6 +153,7 @@ extern char *make_class_name(const char *name, struct kobject *kobj);
 extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
+extern void driver_deferred_probe_force_trigger(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index f585d92e09d0..84c569726d75 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -49,6 +49,9 @@ static LIST_HEAD(wait_for_suppliers);
 static DEFINE_MUTEX(wfs_lock);
 static LIST_HEAD(deferred_sync);
 static unsigned int defer_sync_state_count = 1;
+static unsigned int defer_fw_devlink_count;
+static DEFINE_MUTEX(defer_fw_devlink_lock);
+static bool fw_devlink_is_permissive(void);
 
 #ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
@@ -527,7 +530,7 @@ static void device_link_add_missing_supplier_links(void)
 		int ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
 		if (!ret)
 			list_del_init(&dev->links.needs_suppliers);
-		else if (ret != -ENODEV)
+		else if (ret != -ENODEV || fw_devlink_is_permissive())
 			dev->links.need_for_probe = false;
 	}
 	mutex_unlock(&wfs_lock);
@@ -1177,17 +1180,116 @@ static void fw_devlink_link_device(struct device *dev)
 {
 	int fw_ret;
 
-	device_link_add_missing_supplier_links();
+	if (!fw_devlink_flags)
+		return;
+
+	mutex_lock(&defer_fw_devlink_lock);
+	if (!defer_fw_devlink_count)
+		device_link_add_missing_supplier_links();
+
+	/*
+	 * The device's fwnode not having add_links() doesn't affect if other
+	 * consumers can find this device as a supplier.  So, this check is
+	 * intentionally placed after device_link_add_missing_supplier_links().
+	 */
+	if (!fwnode_has_op(dev->fwnode, add_links))
+		goto out;
 
-	if (fw_devlink_flags && fwnode_has_op(dev->fwnode, add_links)) {
+	/*
+	 * If fw_devlink is being deferred, assume all devices have mandatory
+	 * suppliers they need to link to later. Then, when the fw_devlink is
+	 * resumed, all these devices will get a chance to try and link to any
+	 * suppliers they have.
+	 */
+	if (!defer_fw_devlink_count) {
 		fw_ret = fwnode_call_int_op(dev->fwnode, add_links, dev);
-		if (fw_ret == -ENODEV && !fw_devlink_is_permissive())
-			device_link_wait_for_mandatory_supplier(dev);
-		else if (fw_ret)
-			device_link_wait_for_optional_supplier(dev);
+		if (fw_ret == -ENODEV && fw_devlink_is_permissive())
+			fw_ret = -EAGAIN;
+	} else {
+		fw_ret = -ENODEV;
 	}
+
+	if (fw_ret == -ENODEV)
+		device_link_wait_for_mandatory_supplier(dev);
+	else if (fw_ret)
+		device_link_wait_for_optional_supplier(dev);
+
+out:
+	mutex_unlock(&defer_fw_devlink_lock);
 }
 
+/**
+ * fw_devlink_pause - Pause parsing of fwnode to create device links
+ *
+ * Calling this function defers any fwnode parsing to create device links until
+ * fw_devlink_resume() is called. Both these functions are ref counted and the
+ * caller needs to match the calls.
+ *
+ * While fw_devlink is paused:
+ * - Any device that is added won't have its fwnode parsed to create device
+ *   links.
+ * - The probe of the device will also be deferred during this period.
+ * - Any devices that were already added, but waiting for suppliers won't be
+ *   able to link to newly added devices.
+ *
+ * Once fw_devlink_resume():
+ * - All the fwnodes that was not parsed will be parsed.
+ * - All the devices that were deferred probing will be reattempted if they
+ *   aren't waiting for any more suppliers.
+ *
+ * This pair of functions, is mainly meant to optimize the parsing of fwnodes
+ * when a lot of devices that need to link to each other are added in a short
+ * interval of time. For example, adding all the top level devices in a system.
+ *
+ * For example, if N devices are added and:
+ * - All the consumers are added before their suppliers
+ * - All the suppliers of the N devices are part of the N devices
+ *
+ * Then:
+ *
+ * - With the use of fw_devlink_pause() and fw_devlink_resume(), each device
+ *   will only need one parsing of its fwnode because it is guaranteed to find
+ *   all the supplier devices already registered and ready to link to. It won't
+ *   have to do another pass later to find one or more suppliers it couldn't
+ *   find in the first parse of the fwnode. So, we'll only need O(N) fwnode
+ *   parses.
+ *
+ * - Without the use of fw_devlink_pause() and fw_devlink_resume(), we would
+ *   end up doing O(N^2) parses of fwnodes because every device that's added is
+ *   guaranteed to trigger a parse of the fwnode of every device added before
+ *   it. This O(N^2) parse is made worse by the fact that when a fwnode of a
+ *   device is parsed, all it descendant devices might need to have their
+ *   fwnodes parsed too (even if the devices themselves aren't added).
+ */
+void fw_devlink_pause(void)
+{
+	mutex_lock(&defer_fw_devlink_lock);
+	defer_fw_devlink_count++;
+	mutex_unlock(&defer_fw_devlink_lock);
+}
+
+/** fw_devlink_resume - Resume parsing of fwnode to create device links
+ *
+ * This function is used in conjunction with fw_devlink_pause() and is ref
+ * counted. See documentation for fw_devlink_pause() for more details.
+ */
+void fw_devlink_resume(void)
+{
+	mutex_lock(&defer_fw_devlink_lock);
+	if (!defer_fw_devlink_count) {
+		WARN(true, "Unmatched fw_devlink pause/resume!");
+		goto out;
+	}
+
+	defer_fw_devlink_count--;
+	if (defer_fw_devlink_count)
+		goto out;
+
+	device_link_add_missing_supplier_links();
+	driver_deferred_probe_force_trigger();
+out:
+	mutex_unlock(&defer_fw_devlink_lock);
+}
 /* Device links support end. */
 
 int (*platform_notify)(struct device *dev) = NULL;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 48ca81cb8ebc..63991d97adcc 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -160,6 +160,14 @@ static bool driver_deferred_probe_enable = false;
  * again.
  */
 static void driver_deferred_probe_trigger(void)
+{
+	if (!driver_deferred_probe_enable)
+		return;
+
+	driver_deferred_probe_force_trigger();
+}
+
+void driver_deferred_probe_force_trigger(void)
 {
 	if (!driver_deferred_probe_enable)
 		return;
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index e0abafbb17f8..9506f8ec0974 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -171,5 +171,7 @@ struct fwnode_operations {
 #define get_dev_from_fwnode(fwnode)	get_device((fwnode)->dev)
 
 extern u32 fw_devlink_get_flags(void);
+void fw_devlink_pause(void);
+void fw_devlink_resume(void);
 
 #endif
-- 
2.26.2.761.g0e0b3e54be-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-15  5:34 [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
                   ` (2 preceding siblings ...)
  2020-05-15  5:34 ` [PATCH v1 3/4] driver core: fw_devlink: Add support for batching fwnode parsing Saravana Kannan
@ 2020-05-15  5:35 ` Saravana Kannan
       [not found]   ` <CGME20200519062510eucas1p27bc59da66e1b77534855103a27f87452@eucas1p2.samsung.com>
  2020-06-17 12:20   ` Geert Uytterhoeven
  2020-05-15  8:52 ` [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
  4 siblings, 2 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-05-15  5:35 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown
  Cc: Saravana Kannan, kernel-team, linux-kernel, devicetree,
	linux-acpi, Ji Luo

The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
parsing of the device tree nodes when a lot of devices are added. This
will significantly cut down parsing time (as much a 1 second on some
systems). So, use them when adding devices for all the top level device
tree nodes in a system.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 3371e4a06248..55d719347810 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -538,7 +538,9 @@ static int __init of_platform_default_populate_init(void)
 	}
 
 	/* Populate everything else. */
+	fw_devlink_pause();
 	of_platform_default_populate(NULL, NULL, NULL);
+	fw_devlink_resume();
 
 	return 0;
 }
-- 
2.26.2.761.g0e0b3e54be-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/4] Optimize fw_devlink parsing
  2020-05-15  5:34 [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
                   ` (3 preceding siblings ...)
  2020-05-15  5:35 ` [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices Saravana Kannan
@ 2020-05-15  8:52 ` Saravana Kannan
  2020-05-15 14:36   ` Greg Kroah-Hartman
  4 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-05-15  8:52 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown
  Cc: Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo

On Thu, May 14, 2020 at 10:35 PM Saravana Kannan <saravanak@google.com> wrote:
>
> When fw_devlink is enabled on hardware with a large number of device
> tree nodes, the initial device addition done in
> of_platform_default_populate_init() can be very inefficient. This is
> because most devices will fail to find all their suppliers when they are
> added and will keep trying to parse their device tree nodes and link to
> any newly added devices
>
> This was an item on my TODO list that I'm finally getting around to. On
> hardware I'm testing on, this saved 1.216 _seconds_!

Correction. It went from 1.216 _seconds_ to 61 _milliseconds_! So
about 95% reduction in time.

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 0/4] Optimize fw_devlink parsing
  2020-05-15  8:52 ` [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
@ 2020-05-15 14:36   ` Greg Kroah-Hartman
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2020-05-15 14:36 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rafael J. Wysocki, Rob Herring, Frank Rowand, Len Brown,
	Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo

On Fri, May 15, 2020 at 01:52:37AM -0700, Saravana Kannan wrote:
> On Thu, May 14, 2020 at 10:35 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > When fw_devlink is enabled on hardware with a large number of device
> > tree nodes, the initial device addition done in
> > of_platform_default_populate_init() can be very inefficient. This is
> > because most devices will fail to find all their suppliers when they are
> > added and will keep trying to parse their device tree nodes and link to
> > any newly added devices
> >
> > This was an item on my TODO list that I'm finally getting around to. On
> > hardware I'm testing on, this saved 1.216 _seconds_!
> 
> Correction. It went from 1.216 _seconds_ to 61 _milliseconds_! So
> about 95% reduction in time.

Nice speedups!  All now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
       [not found]   ` <CGME20200519062510eucas1p27bc59da66e1b77534855103a27f87452@eucas1p2.samsung.com>
@ 2020-05-19  6:25     ` Marek Szyprowski
  2020-05-19  6:48       ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Marek Szyprowski @ 2020-05-19  6:25 UTC (permalink / raw)
  To: Saravana Kannan, Greg Kroah-Hartman, Rafael J. Wysocki,
	Rob Herring, Frank Rowand, Len Brown
  Cc: kernel-team, linux-kernel, devicetree, linux-acpi, Ji Luo,
	'Linux Samsung SOC'

Hi Saravana,

On 15.05.2020 07:35, Saravana Kannan wrote:
> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> parsing of the device tree nodes when a lot of devices are added. This
> will significantly cut down parsing time (as much a 1 second on some
> systems). So, use them when adding devices for all the top level device
> tree nodes in a system.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This patch recently landed in linux-next 20200518. Sadly, it causes 
regression on Samsung Exynos5433-based TM2e board:

s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel

Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 50 Comm: kworker/0:1 Not tainted 5.7.0-rc5+ #701
Hardware name: Samsung TM2E board (DT)
Workqueue: events deferred_probe_work_func
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : samsung_i2s_probe+0x768/0x8f0
lr : samsung_i2s_probe+0x688/0x8f0
...
Call trace:
  samsung_i2s_probe+0x768/0x8f0
  platform_drv_probe+0x50/0xa8
  really_probe+0x108/0x370
  driver_probe_device+0x54/0xb8
  __device_attach_driver+0x90/0xc0
  bus_for_each_drv+0x70/0xc8
  __device_attach+0xdc/0x140
  device_initial_probe+0x10/0x18
  bus_probe_device+0x94/0xa0
  deferred_probe_work_func+0x70/0xa8
  process_one_work+0x2a8/0x718
  worker_thread+0x48/0x470
  kthread+0x134/0x160
  ret_from_fork+0x10/0x1c
Code: 17ffffaf d503201f f94086c0 91003000 (88dffc00)
---[ end trace ccf721c9400ddbd6 ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
Kernel Offset: disabled
CPU features: 0x090002,24006087
Memory Limit: none

---[ end Kernel panic - not syncing: Fatal exception ]---

Both issues, the lack of DMA for SPI device and Synchronous abort in I2S 
probe are new after applying this patch. I'm trying to investigate which 
resources are missing and why. The latter issue means typically that the 
registers for the given device has been accessed without enabling the 
needed clocks or power domains.

> ---
>   drivers/of/platform.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 3371e4a06248..55d719347810 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -538,7 +538,9 @@ static int __init of_platform_default_populate_init(void)
>   	}
>   
>   	/* Populate everything else. */
> +	fw_devlink_pause();
>   	of_platform_default_populate(NULL, NULL, NULL);
> +	fw_devlink_resume();
>   
>   	return 0;
>   }

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-19  6:25     ` Marek Szyprowski
@ 2020-05-19  6:48       ` Saravana Kannan
  2020-05-19  7:11         ` Marek Szyprowski
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-05-19  6:48 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux Samsung SOC

On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi Saravana,
>
> On 15.05.2020 07:35, Saravana Kannan wrote:
> > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > parsing of the device tree nodes when a lot of devices are added. This
> > will significantly cut down parsing time (as much a 1 second on some
> > systems). So, use them when adding devices for all the top level device
> > tree nodes in a system.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This patch recently landed in linux-next 20200518. Sadly, it causes
> regression on Samsung Exynos5433-based TM2e board:
>
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
>
> Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 50 Comm: kworker/0:1 Not tainted 5.7.0-rc5+ #701
> Hardware name: Samsung TM2E board (DT)
> Workqueue: events deferred_probe_work_func
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : samsung_i2s_probe+0x768/0x8f0
> lr : samsung_i2s_probe+0x688/0x8f0
> ...
> Call trace:
>   samsung_i2s_probe+0x768/0x8f0
>   platform_drv_probe+0x50/0xa8
>   really_probe+0x108/0x370
>   driver_probe_device+0x54/0xb8
>   __device_attach_driver+0x90/0xc0
>   bus_for_each_drv+0x70/0xc8
>   __device_attach+0xdc/0x140
>   device_initial_probe+0x10/0x18
>   bus_probe_device+0x94/0xa0
>   deferred_probe_work_func+0x70/0xa8
>   process_one_work+0x2a8/0x718
>   worker_thread+0x48/0x470
>   kthread+0x134/0x160
>   ret_from_fork+0x10/0x1c
> Code: 17ffffaf d503201f f94086c0 91003000 (88dffc00)
> ---[ end trace ccf721c9400ddbd6 ]---
> Kernel panic - not syncing: Fatal exception
> SMP: stopping secondary CPUs
> Kernel Offset: disabled
> CPU features: 0x090002,24006087
> Memory Limit: none
>
> ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> Both issues, the lack of DMA for SPI device and Synchronous abort in I2S
> probe are new after applying this patch. I'm trying to investigate which
> resources are missing and why. The latter issue means typically that the
> registers for the given device has been accessed without enabling the
> needed clocks or power domains.

Did you try this copy-pasta fix that I sent later?
https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/

Not every system would need it (my test setup didn't), but it helps some cases.

If that fix doesn't help, then some tips for debugging the failing drivers.
What this pause/resume patch effectively (not explicitly) does is:
1. Doesn't immediately probe the devices as they are added in
of_platform_default_populate_init()
2. Adds them in order to the deferred probe list.
3. Then kicks off deferred probe on them in the order they were added.

These drivers are just not handling -EPROBE_DEFER correctly or
assuming probe order and that's causing these issues.

So, we can either fix that or you can try adding some code to flush
the deferred probe workqueue at the end of fw_devlink_resume().

Let me know how it goes.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-19  6:48       ` Saravana Kannan
@ 2020-05-19  7:11         ` Marek Szyprowski
  2020-05-19 10:32           ` Marek Szyprowski
  0 siblings, 1 reply; 22+ messages in thread
From: Marek Szyprowski @ 2020-05-19  7:11 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux Samsung SOC

Hi Saravana,

On 19.05.2020 08:48, Saravana Kannan wrote:
> On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 15.05.2020 07:35, Saravana Kannan wrote:
>>> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
>>> parsing of the device tree nodes when a lot of devices are added. This
>>> will significantly cut down parsing time (as much a 1 second on some
>>> systems). So, use them when adding devices for all the top level device
>>> tree nodes in a system.
>>>
>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>> This patch recently landed in linux-next 20200518. Sadly, it causes
>> regression on Samsung Exynos5433-based TM2e board:
>>
>> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
>> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
>> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
>> s3c64xx-spi 14d50000.spi: Failed to get RX DMA channel
>> s3c64xx-spi 14d30000.spi: Failed to get RX DMA channel
>>
>> Internal error: synchronous external abort: 96000210 [#1] PREEMPT SMP
>> Modules linked in:
>> CPU: 0 PID: 50 Comm: kworker/0:1 Not tainted 5.7.0-rc5+ #701
>> Hardware name: Samsung TM2E board (DT)
>> Workqueue: events deferred_probe_work_func
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pc : samsung_i2s_probe+0x768/0x8f0
>> lr : samsung_i2s_probe+0x688/0x8f0
>> ...
>> Call trace:
>>    samsung_i2s_probe+0x768/0x8f0
>>    platform_drv_probe+0x50/0xa8
>>    really_probe+0x108/0x370
>>    driver_probe_device+0x54/0xb8
>>    __device_attach_driver+0x90/0xc0
>>    bus_for_each_drv+0x70/0xc8
>>    __device_attach+0xdc/0x140
>>    device_initial_probe+0x10/0x18
>>    bus_probe_device+0x94/0xa0
>>    deferred_probe_work_func+0x70/0xa8
>>    process_one_work+0x2a8/0x718
>>    worker_thread+0x48/0x470
>>    kthread+0x134/0x160
>>    ret_from_fork+0x10/0x1c
>> Code: 17ffffaf d503201f f94086c0 91003000 (88dffc00)
>> ---[ end trace ccf721c9400ddbd6 ]---
>> Kernel panic - not syncing: Fatal exception
>> SMP: stopping secondary CPUs
>> Kernel Offset: disabled
>> CPU features: 0x090002,24006087
>> Memory Limit: none
>>
>> ---[ end Kernel panic - not syncing: Fatal exception ]---
>>
>> Both issues, the lack of DMA for SPI device and Synchronous abort in I2S
>> probe are new after applying this patch. I'm trying to investigate which
>> resources are missing and why. The latter issue means typically that the
>> registers for the given device has been accessed without enabling the
>> needed clocks or power domains.
> Did you try this copy-pasta fix that I sent later?
> https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/
>
> Not every system would need it (my test setup didn't), but it helps some cases.
>
> If that fix doesn't help, then some tips for debugging the failing drivers.
> What this pause/resume patch effectively (not explicitly) does is:
> 1. Doesn't immediately probe the devices as they are added in
> of_platform_default_populate_init()
> 2. Adds them in order to the deferred probe list.
> 3. Then kicks off deferred probe on them in the order they were added.
>
> These drivers are just not handling -EPROBE_DEFER correctly or
> assuming probe order and that's causing these issues.
>
> So, we can either fix that or you can try adding some code to flush
> the deferred probe workqueue at the end of fw_devlink_resume().
>
> Let me know how it goes.

So far it looks that your patch revealed a hidden issue in exynos5433 
clocks configuration, because adding clk_ignore_unused parameter to 
kernel command line fixes the boot. I'm still investigating it, so 
probable you can ignore my regression report. I will let you know asap I 
finish checking it.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-19  7:11         ` Marek Szyprowski
@ 2020-05-19 10:32           ` Marek Szyprowski
  2020-05-19 18:02             ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Marek Szyprowski @ 2020-05-19 10:32 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux Samsung SOC

Hi

On 19.05.2020 09:11, Marek Szyprowski wrote:
> On 19.05.2020 08:48, Saravana Kannan wrote:
>> On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
>> <m.szyprowski@samsung.com> wrote:
>>> On 15.05.2020 07:35, Saravana Kannan wrote:
>>>> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
>>>> parsing of the device tree nodes when a lot of devices are added. This
>>>> will significantly cut down parsing time (as much a 1 second on some
>>>> systems). So, use them when adding devices for all the top level 
>>>> device
>>>> tree nodes in a system.
>>>>
>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>> This patch recently landed in linux-next 20200518. Sadly, it causes
>>> regression on Samsung Exynos5433-based TM2e board:
>>>
>>> ...
>>>
>>> Both issues, the lack of DMA for SPI device and Synchronous abort in 
>>> I2S
>>> probe are new after applying this patch. I'm trying to investigate 
>>> which
>>> resources are missing and why. The latter issue means typically that 
>>> the
>>> registers for the given device has been accessed without enabling the
>>> needed clocks or power domains.
>> Did you try this copy-pasta fix that I sent later?
>> https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/ 
>>
>>
>> Not every system would need it (my test setup didn't), but it helps 
>> some cases.
>>
>> If that fix doesn't help, then some tips for debugging the failing 
>> drivers.
>> What this pause/resume patch effectively (not explicitly) does is:
>> 1. Doesn't immediately probe the devices as they are added in
>> of_platform_default_populate_init()
>> 2. Adds them in order to the deferred probe list.
>> 3. Then kicks off deferred probe on them in the order they were added.
>>
>> These drivers are just not handling -EPROBE_DEFER correctly or
>> assuming probe order and that's causing these issues.
>>
>> So, we can either fix that or you can try adding some code to flush
>> the deferred probe workqueue at the end of fw_devlink_resume().
>>
>> Let me know how it goes.
>
> So far it looks that your patch revealed a hidden issue in exynos5433 
> clocks configuration, because adding clk_ignore_unused parameter to 
> kernel command line fixes the boot. I'm still investigating it, so 
> probable you can ignore my regression report. I will let you know asap 
> I finish checking it.
>
Okay, I confirm that the issue is in the Exynos I2S driver and 
Exynos5433 clock provider. I've posted a quick workaround. I'm sorry for 
the noise, your patch is fine.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-19 10:32           ` Marek Szyprowski
@ 2020-05-19 18:02             ` Saravana Kannan
  2020-05-20  4:21               ` Marek Szyprowski
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-05-19 18:02 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux Samsung SOC

On Tue, May 19, 2020 at 3:32 AM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> Hi
>
> On 19.05.2020 09:11, Marek Szyprowski wrote:
> > On 19.05.2020 08:48, Saravana Kannan wrote:
> >> On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
> >> <m.szyprowski@samsung.com> wrote:
> >>> On 15.05.2020 07:35, Saravana Kannan wrote:
> >>>> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> >>>> parsing of the device tree nodes when a lot of devices are added. This
> >>>> will significantly cut down parsing time (as much a 1 second on some
> >>>> systems). So, use them when adding devices for all the top level
> >>>> device
> >>>> tree nodes in a system.
> >>>>
> >>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>> This patch recently landed in linux-next 20200518. Sadly, it causes
> >>> regression on Samsung Exynos5433-based TM2e board:
> >>>
> >>> ...
> >>>
> >>> Both issues, the lack of DMA for SPI device and Synchronous abort in
> >>> I2S
> >>> probe are new after applying this patch. I'm trying to investigate
> >>> which
> >>> resources are missing and why. The latter issue means typically that
> >>> the
> >>> registers for the given device has been accessed without enabling the
> >>> needed clocks or power domains.
> >> Did you try this copy-pasta fix that I sent later?
> >> https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/
> >>
> >>
> >> Not every system would need it (my test setup didn't), but it helps
> >> some cases.
> >>
> >> If that fix doesn't help, then some tips for debugging the failing
> >> drivers.
> >> What this pause/resume patch effectively (not explicitly) does is:
> >> 1. Doesn't immediately probe the devices as they are added in
> >> of_platform_default_populate_init()
> >> 2. Adds them in order to the deferred probe list.
> >> 3. Then kicks off deferred probe on them in the order they were added.
> >>
> >> These drivers are just not handling -EPROBE_DEFER correctly or
> >> assuming probe order and that's causing these issues.
> >>
> >> So, we can either fix that or you can try adding some code to flush
> >> the deferred probe workqueue at the end of fw_devlink_resume().
> >>
> >> Let me know how it goes.
> >
> > So far it looks that your patch revealed a hidden issue in exynos5433
> > clocks configuration, because adding clk_ignore_unused parameter to
> > kernel command line fixes the boot. I'm still investigating it, so
> > probable you can ignore my regression report. I will let you know asap
> > I finish checking it.
> >
> Okay, I confirm that the issue is in the Exynos I2S driver and
> Exynos5433 clock provider. I've posted a quick workaround. I'm sorry for
> the noise, your patch is fine.

Thanks for debugging and finding the real issue. I tried finding your
patches, but couldn't. Can you point me to a lore.kernel.org link? I'm
just curious to see what the issue was.

I'm guessing you didn't need to pick up this one?
https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-19 18:02             ` Saravana Kannan
@ 2020-05-20  4:21               ` Marek Szyprowski
  0 siblings, 0 replies; 22+ messages in thread
From: Marek Szyprowski @ 2020-05-20  4:21 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, LKML,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux Samsung SOC

Hi Saravana,

On 19.05.2020 20:02, Saravana Kannan wrote:
> On Tue, May 19, 2020 at 3:32 AM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 19.05.2020 09:11, Marek Szyprowski wrote:
>>> On 19.05.2020 08:48, Saravana Kannan wrote:
>>>> On Mon, May 18, 2020 at 11:25 PM Marek Szyprowski
>>>> <m.szyprowski@samsung.com> wrote:
>>>>> On 15.05.2020 07:35, Saravana Kannan wrote:
>>>>>> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
>>>>>> parsing of the device tree nodes when a lot of devices are added. This
>>>>>> will significantly cut down parsing time (as much a 1 second on some
>>>>>> systems). So, use them when adding devices for all the top level
>>>>>> device
>>>>>> tree nodes in a system.
>>>>>>
>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>> This patch recently landed in linux-next 20200518. Sadly, it causes
>>>>> regression on Samsung Exynos5433-based TM2e board:
>>>>>
>>>>> ...
>>>>>
>>>>> Both issues, the lack of DMA for SPI device and Synchronous abort in
>>>>> I2S
>>>>> probe are new after applying this patch. I'm trying to investigate
>>>>> which
>>>>> resources are missing and why. The latter issue means typically that
>>>>> the
>>>>> registers for the given device has been accessed without enabling the
>>>>> needed clocks or power domains.
>>>> Did you try this copy-pasta fix that I sent later?
>>>> https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/
>>>>
>>>>
>>>> Not every system would need it (my test setup didn't), but it helps
>>>> some cases.
>>>>
>>>> If that fix doesn't help, then some tips for debugging the failing
>>>> drivers.
>>>> What this pause/resume patch effectively (not explicitly) does is:
>>>> 1. Doesn't immediately probe the devices as they are added in
>>>> of_platform_default_populate_init()
>>>> 2. Adds them in order to the deferred probe list.
>>>> 3. Then kicks off deferred probe on them in the order they were added.
>>>>
>>>> These drivers are just not handling -EPROBE_DEFER correctly or
>>>> assuming probe order and that's causing these issues.
>>>>
>>>> So, we can either fix that or you can try adding some code to flush
>>>> the deferred probe workqueue at the end of fw_devlink_resume().
>>>>
>>>> Let me know how it goes.
>>> So far it looks that your patch revealed a hidden issue in exynos5433
>>> clocks configuration, because adding clk_ignore_unused parameter to
>>> kernel command line fixes the boot. I'm still investigating it, so
>>> probable you can ignore my regression report. I will let you know asap
>>> I finish checking it.
>>>
>> Okay, I confirm that the issue is in the Exynos I2S driver and
>> Exynos5433 clock provider. I've posted a quick workaround. I'm sorry for
>> the noise, your patch is fine.
> Thanks for debugging and finding the real issue. I tried finding your
> patches, but couldn't. Can you point me to a lore.kernel.org link? I'm
> just curious to see what the issue was.

https://lore.kernel.org/linux-samsung-soc/f67db8c1-453b-4c70-67b9-59762ac34f64@kernel.org/T/#t

It looks that one more clock has to be enabled to properly read init 
configuration. So far it worked, because that device was probed much 
earlier, before the unused clocks are turned off. Your patch changed the 
probe order, so that device is probed later.

> I'm guessing you didn't need to pick up this one?
> https://lore.kernel.org/lkml/20200517173453.157703-1-saravanak@google.com/

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-05-15  5:35 ` [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices Saravana Kannan
       [not found]   ` <CGME20200519062510eucas1p27bc59da66e1b77534855103a27f87452@eucas1p2.samsung.com>
@ 2020-06-17 12:20   ` Geert Uytterhoeven
  2020-06-17 18:36     ` Saravana Kannan
  1 sibling, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-06-17 12:20 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> parsing of the device tree nodes when a lot of devices are added. This
> will significantly cut down parsing time (as much a 1 second on some
> systems). So, use them when adding devices for all the top level device
> tree nodes in a system.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>

This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
when adding all top level devices") in v5.8-rc1, and I have bisected a
regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
no longer be woken up from s2ram by a GPIO key. Reverting the commit
fixes the issue.

On these systems, the GPIO/PFC block has its interrupt lines connected
to intermediate interrupt controllers (Renesas INTC), which are in turn
connected to the main interrupt controller (ARM GIC).  The INTC block is
part of a power and clock domain.  Hence if a GPIO is enabled as a
wake-up source, the INTC is part of the wake-up path, and thus must be
kept enabled when entering s2ram.

While this commit has no impact on probe order for me (unlike in Marek's
case), it does have an impact on suspend order:
  - Before this commit:
      1. The keyboard (gpio-keys) is suspended, and calls
         enable_irq_wake() to inform the upstream interrupt controller
         (INTC) that it is part of the wake-up path,
      2. INTC is suspended, and calls device_set_wakeup_path() to inform
         the device core that it must be kept enabled,
      3. The system is woken by pressing a wake-up key.

  - After this commit:
      1. INTC is suspended, and is not aware it is part of the wake-up
         path, so it is disabled by the device core,
      2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
      3. Pressing a wake-up key has no effect, as INTC is disabled, and
         the interrupt does not come through.

It looks like no device links are involved, as both gpio-keys and INTC have
no links.
Do you have a clue?

Thanks!

> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -538,7 +538,9 @@ static int __init of_platform_default_populate_init(void)
>         }
>
>         /* Populate everything else. */
> +       fw_devlink_pause();
>         of_platform_default_populate(NULL, NULL, NULL);
> +       fw_devlink_resume();
>
>         return 0;
>  }

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-17 12:20   ` Geert Uytterhoeven
@ 2020-06-17 18:36     ` Saravana Kannan
  2020-06-18  7:31       ` Geert Uytterhoeven
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-17 18:36 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Geert,

On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > parsing of the device tree nodes when a lot of devices are added. This
> > will significantly cut down parsing time (as much a 1 second on some
> > systems). So, use them when adding devices for all the top level device
> > tree nodes in a system.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
>
> This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> when adding all top level devices") in v5.8-rc1, and I have bisected a
> regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> no longer be woken up from s2ram by a GPIO key. Reverting the commit
> fixes the issue.
>
> On these systems, the GPIO/PFC block has its interrupt lines connected
> to intermediate interrupt controllers (Renesas INTC), which are in turn
> connected to the main interrupt controller (ARM GIC).  The INTC block is
> part of a power and clock domain.  Hence if a GPIO is enabled as a
> wake-up source, the INTC is part of the wake-up path, and thus must be
> kept enabled when entering s2ram.
>
> While this commit has no impact on probe order for me (unlike in Marek's
> case), it does have an impact on suspend order:
>   - Before this commit:
>       1. The keyboard (gpio-keys) is suspended, and calls
>          enable_irq_wake() to inform the upstream interrupt controller
>          (INTC) that it is part of the wake-up path,
>       2. INTC is suspended, and calls device_set_wakeup_path() to inform
>          the device core that it must be kept enabled,
>       3. The system is woken by pressing a wake-up key.
>
>   - After this commit:
>       1. INTC is suspended, and is not aware it is part of the wake-up
>          path, so it is disabled by the device core,
>       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
>       3. Pressing a wake-up key has no effect, as INTC is disabled, and
>          the interrupt does not come through.
>
> It looks like no device links are involved, as both gpio-keys and INTC have
> no links.
> Do you have a clue?
>
> Thanks!

That patch of mine defers probe on all devices added by the
of_platform_default_populate() call, and then once the call returns,
it immediately triggers a deferred probe.

So all these devices are being probed in parallel in the deferred
probe workqueue while the main "initcall thread" continues down to
further initcalls. It looks like some of the drivers in subsequent
initcalls are assuming that devices in the earlier initcalls always
probe and can't be deferred?

There are two options.
1. Fix these drivers.
2. Add a "flush deferred workqueue" in fw_devlink_resume()

I'd rather we fix the drivers so that they handle deferred probes
correctly. Thoughts?

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-17 18:36     ` Saravana Kannan
@ 2020-06-18  7:31       ` Geert Uytterhoeven
  2020-06-18 23:00         ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-06-18  7:31 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Wed, Jun 17, 2020 at 8:36 PM Saravana Kannan <saravanak@google.com> wrote:
> On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > > parsing of the device tree nodes when a lot of devices are added. This
> > > will significantly cut down parsing time (as much a 1 second on some
> > > systems). So, use them when adding devices for all the top level device
> > > tree nodes in a system.
> > >
> > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> >
> > This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> > when adding all top level devices") in v5.8-rc1, and I have bisected a
> > regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> > no longer be woken up from s2ram by a GPIO key. Reverting the commit
> > fixes the issue.
> >
> > On these systems, the GPIO/PFC block has its interrupt lines connected
> > to intermediate interrupt controllers (Renesas INTC), which are in turn
> > connected to the main interrupt controller (ARM GIC).  The INTC block is
> > part of a power and clock domain.  Hence if a GPIO is enabled as a
> > wake-up source, the INTC is part of the wake-up path, and thus must be
> > kept enabled when entering s2ram.
> >
> > While this commit has no impact on probe order for me (unlike in Marek's
> > case), it does have an impact on suspend order:
> >   - Before this commit:
> >       1. The keyboard (gpio-keys) is suspended, and calls
> >          enable_irq_wake() to inform the upstream interrupt controller
> >          (INTC) that it is part of the wake-up path,
> >       2. INTC is suspended, and calls device_set_wakeup_path() to inform
> >          the device core that it must be kept enabled,
> >       3. The system is woken by pressing a wake-up key.
> >
> >   - After this commit:
> >       1. INTC is suspended, and is not aware it is part of the wake-up
> >          path, so it is disabled by the device core,
> >       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
> >       3. Pressing a wake-up key has no effect, as INTC is disabled, and
> >          the interrupt does not come through.
> >
> > It looks like no device links are involved, as both gpio-keys and INTC have
> > no links.
> > Do you have a clue?
> >
> > Thanks!
>
> That patch of mine defers probe on all devices added by the
> of_platform_default_populate() call, and then once the call returns,
> it immediately triggers a deferred probe.
>
> So all these devices are being probed in parallel in the deferred
> probe workqueue while the main "initcall thread" continues down to
> further initcalls. It looks like some of the drivers in subsequent
> initcalls are assuming that devices in the earlier initcalls always
> probe and can't be deferred?
>
> There are two options.
> 1. Fix these drivers.
> 2. Add a "flush deferred workqueue" in fw_devlink_resume()
>
> I'd rather we fix the drivers so that they handle deferred probes
> correctly. Thoughts?

While the affected drivers should handle deferred probe fine, none of
the affected drivers is subject to deferred probing: they all probe
successfully on first try (I had added debug prints to
platform_drv_probe() to be sure).
The affected drivers are still probed in the same order (INTC is one of
the earliest drivers probed, gpio-keys is the last). However, during
system suspend, gpio-keys is suspended before INTC, which is wrong, as
gpio-keys uses an interrupt provided by INTC.

Perhaps the "in parallel" is the real culprit, and there is a race
condition somewhere?

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-18  7:31       ` Geert Uytterhoeven
@ 2020-06-18 23:00         ` Saravana Kannan
  2020-06-19 12:24           ` Geert Uytterhoeven
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-18 23:00 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

On Thu, Jun 18, 2020 at 12:32 AM Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Wed, Jun 17, 2020 at 8:36 PM Saravana Kannan <saravanak@google.com> wrote:
> > On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > > > parsing of the device tree nodes when a lot of devices are added. This
> > > > will significantly cut down parsing time (as much a 1 second on some
> > > > systems). So, use them when adding devices for all the top level device
> > > > tree nodes in a system.
> > > >
> > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > >
> > > This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> > > when adding all top level devices") in v5.8-rc1, and I have bisected a
> > > regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> > > no longer be woken up from s2ram by a GPIO key. Reverting the commit
> > > fixes the issue.
> > >
> > > On these systems, the GPIO/PFC block has its interrupt lines connected
> > > to intermediate interrupt controllers (Renesas INTC), which are in turn
> > > connected to the main interrupt controller (ARM GIC).  The INTC block is
> > > part of a power and clock domain.  Hence if a GPIO is enabled as a
> > > wake-up source, the INTC is part of the wake-up path, and thus must be
> > > kept enabled when entering s2ram.
> > >
> > > While this commit has no impact on probe order for me (unlike in Marek's
> > > case), it does have an impact on suspend order:
> > >   - Before this commit:
> > >       1. The keyboard (gpio-keys) is suspended, and calls
> > >          enable_irq_wake() to inform the upstream interrupt controller
> > >          (INTC) that it is part of the wake-up path,
> > >       2. INTC is suspended, and calls device_set_wakeup_path() to inform
> > >          the device core that it must be kept enabled,
> > >       3. The system is woken by pressing a wake-up key.
> > >
> > >   - After this commit:
> > >       1. INTC is suspended, and is not aware it is part of the wake-up
> > >          path, so it is disabled by the device core,
> > >       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
> > >       3. Pressing a wake-up key has no effect, as INTC is disabled, and
> > >          the interrupt does not come through.
> > >
> > > It looks like no device links are involved, as both gpio-keys and INTC have
> > > no links.
> > > Do you have a clue?
> > >
> > > Thanks!
> >
> > That patch of mine defers probe on all devices added by the
> > of_platform_default_populate() call, and then once the call returns,
> > it immediately triggers a deferred probe.
> >
> > So all these devices are being probed in parallel in the deferred
> > probe workqueue while the main "initcall thread" continues down to
> > further initcalls. It looks like some of the drivers in subsequent
> > initcalls are assuming that devices in the earlier initcalls always
> > probe and can't be deferred?
> >
> > There are two options.
> > 1. Fix these drivers.
> > 2. Add a "flush deferred workqueue" in fw_devlink_resume()
> >
> > I'd rather we fix the drivers so that they handle deferred probes
> > correctly. Thoughts?
>
> While the affected drivers should handle deferred probe fine, none of
> the affected drivers is subject to deferred probing: they all probe
> successfully on first try (I had added debug prints to
> platform_drv_probe() to be sure).
> The affected drivers are still probed in the same order (INTC is one of
> the earliest drivers probed, gpio-keys is the last).

Thanks, this is useful info. Now I know that my patch isn't somehow
reordering devices that would have probed as soon as
of_platform_default_populate_init() added them.

When you say the "The affected drivers are still probed in the same
order", are you only referring to the devices that would have probed
before of_platform_default_populate_init() returns? Or ALL devices in
the system are probing in the same order?

I assume gpio-keys gets probed in the "normal init thread" and not by
the deferred probe workqueue? I'm guessing this because gpio_keys
driver seems to register during late_initcall() whereas
of_platform_default_populate_init() runs as an arch_initcall_sync().

> However, during
> system suspend, gpio-keys is suspended before INTC, which is wrong, as
> gpio-keys uses an interrupt provided by INTC.
>
> Perhaps the "in parallel" is the real culprit, and there is a race
> condition somewhere?

I tried digging into the gpio_keys driver code to see how it interacts
with INTC and if gpio-keys defers probe if INTC hasn't probed yet. But
it seems like a rabbit hole that'd be easier to figure out when you
have the device. Can you check if gpio-keys is probing before INTC in
the "bad" case?

Also, in general, can you see if there's a difference in the probe
order between all the devices in the system? Adding a log to
really_probe() would be better in case non-platform devices are
getting reordered (my change affects all devices that are created from
DT, not just platform devices).

I want to make sure we understand the real issue before we try to fix it.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-18 23:00         ` Saravana Kannan
@ 2020-06-19 12:24           ` Geert Uytterhoeven
  2020-06-19 20:07             ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-06-19 12:24 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Fri, Jun 19, 2020 at 1:01 AM Saravana Kannan <saravanak@google.com> wrote:
> On Thu, Jun 18, 2020 at 12:32 AM Geert Uytterhoeven
> <geert@linux-m68k.org> wrote:
> > On Wed, Jun 17, 2020 at 8:36 PM Saravana Kannan <saravanak@google.com> wrote:
> > > On Wed, Jun 17, 2020 at 5:20 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> > > > On Fri, May 15, 2020 at 7:38 AM Saravana Kannan <saravanak@google.com> wrote:
> > > > > The fw_devlink_pause() and fw_devlink_resume() APIs allow batching the
> > > > > parsing of the device tree nodes when a lot of devices are added. This
> > > > > will significantly cut down parsing time (as much a 1 second on some
> > > > > systems). So, use them when adding devices for all the top level device
> > > > > tree nodes in a system.
> > > > >
> > > > > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > > >
> > > > This is now commit 93d2e4322aa74c1a ("of: platform: Batch fwnode parsing
> > > > when adding all top level devices") in v5.8-rc1, and I have bisected a
> > > > regression to it: on r8a7740/armadillo and sh73a0/kzm9g, the system can
> > > > no longer be woken up from s2ram by a GPIO key. Reverting the commit
> > > > fixes the issue.
> > > >
> > > > On these systems, the GPIO/PFC block has its interrupt lines connected
> > > > to intermediate interrupt controllers (Renesas INTC), which are in turn
> > > > connected to the main interrupt controller (ARM GIC).  The INTC block is
> > > > part of a power and clock domain.  Hence if a GPIO is enabled as a
> > > > wake-up source, the INTC is part of the wake-up path, and thus must be
> > > > kept enabled when entering s2ram.
> > > >
> > > > While this commit has no impact on probe order for me (unlike in Marek's
> > > > case), it does have an impact on suspend order:
> > > >   - Before this commit:
> > > >       1. The keyboard (gpio-keys) is suspended, and calls
> > > >          enable_irq_wake() to inform the upstream interrupt controller
> > > >          (INTC) that it is part of the wake-up path,
> > > >       2. INTC is suspended, and calls device_set_wakeup_path() to inform
> > > >          the device core that it must be kept enabled,
> > > >       3. The system is woken by pressing a wake-up key.
> > > >
> > > >   - After this commit:
> > > >       1. INTC is suspended, and is not aware it is part of the wake-up
> > > >          path, so it is disabled by the device core,
> > > >       2. gpio-keys is suspended, and calls enable_irq_wake() in vain,
> > > >       3. Pressing a wake-up key has no effect, as INTC is disabled, and
> > > >          the interrupt does not come through.
> > > >
> > > > It looks like no device links are involved, as both gpio-keys and INTC have
> > > > no links.
> > > > Do you have a clue?
> > > >
> > > > Thanks!
> > >
> > > That patch of mine defers probe on all devices added by the
> > > of_platform_default_populate() call, and then once the call returns,
> > > it immediately triggers a deferred probe.
> > >
> > > So all these devices are being probed in parallel in the deferred
> > > probe workqueue while the main "initcall thread" continues down to
> > > further initcalls. It looks like some of the drivers in subsequent
> > > initcalls are assuming that devices in the earlier initcalls always
> > > probe and can't be deferred?
> > >
> > > There are two options.
> > > 1. Fix these drivers.
> > > 2. Add a "flush deferred workqueue" in fw_devlink_resume()
> > >
> > > I'd rather we fix the drivers so that they handle deferred probes
> > > correctly. Thoughts?
> >
> > While the affected drivers should handle deferred probe fine, none of
> > the affected drivers is subject to deferred probing: they all probe
> > successfully on first try (I had added debug prints to
> > platform_drv_probe() to be sure).
> > The affected drivers are still probed in the same order (INTC is one of
> > the earliest drivers probed, gpio-keys is the last).
>
> Thanks, this is useful info. Now I know that my patch isn't somehow
> reordering devices that would have probed as soon as
> of_platform_default_populate_init() added them.
>
> When you say the "The affected drivers are still probed in the same
> order", are you only referring to the devices that would have probed
> before of_platform_default_populate_init() returns? Or ALL devices in
> the system are probing in the same order?

I was referring to all platform devices (based on a debug print added to
platform_drv_probe()). See more below.

> I assume gpio-keys gets probed in the "normal init thread" and not by
> the deferred probe workqueue? I'm guessing this because gpio_keys
> driver seems to register during late_initcall() whereas
> of_platform_default_populate_init() runs as an arch_initcall_sync().

After adding a WARN(1, ...) to gpio_keys_probe(), the backtrace shows it
is called directly from do_one_initcall(), in both the good and the bad
case.

> > However, during
> > system suspend, gpio-keys is suspended before INTC, which is wrong, as
> > gpio-keys uses an interrupt provided by INTC.
> >
> > Perhaps the "in parallel" is the real culprit, and there is a race
> > condition somewhere?
>
> I tried digging into the gpio_keys driver code to see how it interacts
> with INTC and if gpio-keys defers probe if INTC hasn't probed yet. But
> it seems like a rabbit hole that'd be easier to figure out when you
> have the device. Can you check if gpio-keys is probing before INTC in
> the "bad" case?

It is not, gpio-keys is always probed very late.

Hence for testing, I moved gpio-keys initialization just before INTC, so
it is probed before INTC. Then gpio-keys is deferred, as expected, and
reprobes successfully later.
Interestingly, that fixes my wake-up issue, too?!?

> Also, in general, can you see if there's a difference in the probe
> order between all the devices in the system? Adding a log to
> really_probe() would be better in case non-platform devices are
> getting reordered (my change affects all devices that are created from
> DT, not just platform devices).
>
> I want to make sure we understand the real issue before we try to fix it.

Enabling all debug prints in really_probe(), comparing the output
before/after the bad commit, and filtering out all noise, I get:

     bus: 'platform': really_probe: probing driver reg-dummy with
device reg-dummy
B    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e6900000.interrupt-controller
     bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e6900004.interrupt-controller
A   +Workqueue: events deferred_probe_work_func
     bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e6900008.interrupt-controller
C    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
with device e690000c.interrupt-controller
     bus: 'platform': really_probe: probing driver sh-pfc with device
e6050000.pin-controller
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-1p8v
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-3p3v
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-vmmc-sdhi0
     bus: 'platform': really_probe: probing driver reg-fixed-voltage
with device regulator-vmmc-sdhi2
     bus: 'platform': really_probe: probing driver i2c-sh_mobile with
device e6820000.i2c
     bus: 'i2c': really_probe: probing driver as3711 with device 0-0040
     bus: 'platform': really_probe: probing driver as3711-regulator
with device as3711-regulator
     bus: 'platform': really_probe: probing driver i2c-sh_mobile with
device e6822000.i2c
     bus: 'platform': really_probe: probing driver i2c-sh_mobile with
device e6826000.i2c
     bus: 'i2c': really_probe: probing driver pcf857x with device 2-0020
     bus: 'platform': really_probe: probing driver sh_cmt with device
e6138000.timer
     bus: 'platform': really_probe: probing driver armv7-pmu with device pmu
     bus: 'platform': really_probe: probing driver simple-pm-bus with
device fec10000.bus
     bus: 'platform': really_probe: probing driver as3711-backlight
with device as3711-backlight
     bus: 'platform': really_probe: probing driver sh-sci with device
e6c80000.serial
     bus: 'platform': really_probe: probing driver smsc911x with
device 10000000.ethernet
     bus: 'i2c': really_probe: probing driver st1232-ts with device 1-0055
     bus: 'i2c': really_probe: probing driver adxl34x with device 0-001d
     bus: 'i2c': really_probe: probing driver rtc-rs5c372 with device 0-0032
     bus: 'platform': really_probe: probing driver rmobile_reset with
device e6180000.system-controller
     bus: 'platform': really_probe: probing driver cpufreq-dt with
device cpufreq-dt
     bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
device ee100000.sd
     bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
device ee140000.sd
     bus: 'platform': really_probe: probing driver sh_mmcif with
device e6bd0000.mmc
     bus: 'platform': really_probe: probing driver leds-gpio with device leds
     bus: 'i2c': really_probe: probing driver ak8975 with device 0-000c
     bus: 'platform': really_probe: probing driver snd-soc-dummy with
device snd-soc-dummy
     bus: 'i2c': really_probe: probing driver ak4642-codec with device 0-0012
     bus: 'platform': really_probe: probing driver asoc-simple-card
with device sound
     bus: 'platform': really_probe: probing driver fsi-pcm-audio with
device ec230000.sound
     bus: 'platform': really_probe: probing driver asoc-simple-card
with device sound
     bus: 'platform': really_probe: probing driver gpio-keys with
device keyboard"
     bus: 'mmc': really_probe: probing driver mmcblk with device mmc2:0001

So all devices are probed in the exact same order.
A: Note the addition of the message "Workqueue: events
   deferred_probe_work_func", which might give a clue?
B,C: "e6900000.interrupt-controller" and "e6900008.interrupt-controller"
     are the two devices that are suspended later in the wrong order.
     One of them is probed before A, one after, so A may be a red herring?

I'm still not much wiser, though....

BTW, r8a7740/armadillo is single CPU, while sh73a0/kzm9g is dual-CPU.
So both UP and SMP are affected.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-19 12:24           ` Geert Uytterhoeven
@ 2020-06-19 20:07             ` Saravana Kannan
  2020-06-20  2:32               ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-19 20:07 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Snipped a bunch of old context.

On Fri, Jun 19, 2020 at 5:24 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> On Fri, Jun 19, 2020 at 1:01 AM Saravana Kannan <saravanak@google.com> wrote:
> >
> > When you say the "The affected drivers are still probed in the same
> > order", are you only referring to the devices that would have probed
> > before of_platform_default_populate_init() returns? Or ALL devices in
> > the system are probing in the same order?
>
> I was referring to all platform devices (based on a debug print added to
> platform_drv_probe()). See more below.
>
> > I assume gpio-keys gets probed in the "normal init thread" and not by
> > the deferred probe workqueue? I'm guessing this because gpio_keys
> > driver seems to register during late_initcall() whereas
> > of_platform_default_populate_init() runs as an arch_initcall_sync().
>
> After adding a WARN(1, ...) to gpio_keys_probe(), the backtrace shows it
> is called directly from do_one_initcall(), in both the good and the bad
> case.
>
> > > However, during
> > > system suspend, gpio-keys is suspended before INTC, which is wrong, as
> > > gpio-keys uses an interrupt provided by INTC.
> > >
> > > Perhaps the "in parallel" is the real culprit, and there is a race
> > > condition somewhere?
> >
> > I tried digging into the gpio_keys driver code to see how it interacts
> > with INTC and if gpio-keys defers probe if INTC hasn't probed yet. But
> > it seems like a rabbit hole that'd be easier to figure out when you
> > have the device. Can you check if gpio-keys is probing before INTC in
> > the "bad" case?
>
> It is not, gpio-keys is always probed very late.
>
> Hence for testing, I moved gpio-keys initialization just before INTC, so
> it is probed before INTC. Then gpio-keys is deferred, as expected, and
> reprobes successfully later.
> Interestingly, that fixes my wake-up issue, too?!?
>
> > Also, in general, can you see if there's a difference in the probe
> > order between all the devices in the system? Adding a log to
> > really_probe() would be better in case non-platform devices are
> > getting reordered (my change affects all devices that are created from
> > DT, not just platform devices).
> >
> > I want to make sure we understand the real issue before we try to fix it.
>
> Enabling all debug prints in really_probe(), comparing the output
> before/after the bad commit, and filtering out all noise, I get:
>
>      bus: 'platform': really_probe: probing driver reg-dummy with
> device reg-dummy
> B    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e6900000.interrupt-controller
>      bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e6900004.interrupt-controller
> A   +Workqueue: events deferred_probe_work_func
>      bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e6900008.interrupt-controller
> C    bus: 'platform': really_probe: probing driver renesas_intc_irqpin
> with device e690000c.interrupt-controller
>      bus: 'platform': really_probe: probing driver sh-pfc with device
> e6050000.pin-controller
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-1p8v
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-3p3v
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-vmmc-sdhi0
>      bus: 'platform': really_probe: probing driver reg-fixed-voltage
> with device regulator-vmmc-sdhi2
>      bus: 'platform': really_probe: probing driver i2c-sh_mobile with
> device e6820000.i2c
>      bus: 'i2c': really_probe: probing driver as3711 with device 0-0040
>      bus: 'platform': really_probe: probing driver as3711-regulator
> with device as3711-regulator
>      bus: 'platform': really_probe: probing driver i2c-sh_mobile with
> device e6822000.i2c
>      bus: 'platform': really_probe: probing driver i2c-sh_mobile with
> device e6826000.i2c
>      bus: 'i2c': really_probe: probing driver pcf857x with device 2-0020
>      bus: 'platform': really_probe: probing driver sh_cmt with device
> e6138000.timer
>      bus: 'platform': really_probe: probing driver armv7-pmu with device pmu
>      bus: 'platform': really_probe: probing driver simple-pm-bus with
> device fec10000.bus
>      bus: 'platform': really_probe: probing driver as3711-backlight
> with device as3711-backlight
>      bus: 'platform': really_probe: probing driver sh-sci with device
> e6c80000.serial
>      bus: 'platform': really_probe: probing driver smsc911x with
> device 10000000.ethernet
>      bus: 'i2c': really_probe: probing driver st1232-ts with device 1-0055
>      bus: 'i2c': really_probe: probing driver adxl34x with device 0-001d
>      bus: 'i2c': really_probe: probing driver rtc-rs5c372 with device 0-0032
>      bus: 'platform': really_probe: probing driver rmobile_reset with
> device e6180000.system-controller
>      bus: 'platform': really_probe: probing driver cpufreq-dt with
> device cpufreq-dt
>      bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
> device ee100000.sd
>      bus: 'platform': really_probe: probing driver sh_mobile_sdhi with
> device ee140000.sd
>      bus: 'platform': really_probe: probing driver sh_mmcif with
> device e6bd0000.mmc
>      bus: 'platform': really_probe: probing driver leds-gpio with device leds
>      bus: 'i2c': really_probe: probing driver ak8975 with device 0-000c
>      bus: 'platform': really_probe: probing driver snd-soc-dummy with
> device snd-soc-dummy
>      bus: 'i2c': really_probe: probing driver ak4642-codec with device 0-0012
>      bus: 'platform': really_probe: probing driver asoc-simple-card
> with device sound
>      bus: 'platform': really_probe: probing driver fsi-pcm-audio with
> device ec230000.sound
>      bus: 'platform': really_probe: probing driver asoc-simple-card
> with device sound
>      bus: 'platform': really_probe: probing driver gpio-keys with
> device keyboard"
>      bus: 'mmc': really_probe: probing driver mmcblk with device mmc2:0001
>
> So all devices are probed in the exact same order.
> A: Note the addition of the message "Workqueue: events
>    deferred_probe_work_func", which might give a clue?
> B,C: "e6900000.interrupt-controller" and "e6900008.interrupt-controller"
>      are the two devices that are suspended later in the wrong order.
>      One of them is probed before A, one after, so A may be a red herring?
>

Thanks! Your logs and the tip above helped!

*face palm*/Sigh... (for what is happening)

In the absence of other info (like device links, parent-child
relationship, etc) the device_add() order is used as an implicit order
for suspend (in reverse) and resume.
device_add() -> device_pm_add() -> list_add_tail(..., &dpm_list)

But the deferred probe code reorders the deferred devices:
deferred_probe_work_func() -> device_pm_move_to_tail()

So, this is what is happening:
1. of_platform_default_populate() adds all the top level devices in DT order.
2. So dpm_list is: ... INTC, ..., gpio-keys
3. Only devices with drivers already registered by this time get added
to the deferred probe list (because that only happens when a driver
matches)
4. INTC is one of them.
5. So when deferred_probe_work_func() gets to INTC, it moves it to the
end of the dpm_list
6. So dpm_list is now: ..., gpio-keys, ... INTC

Things fail.

Obviously, my patch is causing this unintended behavior and I could
try to fix it. But I'm not really sure if the actual bug is in my
patch...

Deferred probe is supposed to allow a driver to defer probe without
worrying something else is going to get screwed up. Also, order of
devices in DT isn't supposed to affect functionality. Both of these
are not true with the current behavior.

I think instead of deferred_probe_work_func() moving the device to the
end of the dpm_list, I think the device probing successfully is what
should move it to the end of the dpm_list. That way, the dpm_list is
actually ordered by when the devices become functional and not the
random order in DT or random probe order which can get pretty
convoluted with multiple deferred probes. This feels right and will
make suspend/resume more robust against DT ordering -- but I'm not
sure what other wide ranging impact this has for other platforms.

Greg/Rafael, do you agree with the paragraph above? If yes, I can send
a patch out that orders dpm_list by probe success order.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-19 20:07             ` Saravana Kannan
@ 2020-06-20  2:32               ` Saravana Kannan
  2020-06-22 15:49                 ` Geert Uytterhoeven
  0 siblings, 1 reply; 22+ messages in thread
From: Saravana Kannan @ 2020-06-20  2:32 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

On Fri, Jun 19, 2020 at 1:07 PM Saravana Kannan <saravanak@google.com> wrote:
>
> I think instead of deferred_probe_work_func() moving the device to the
> end of the dpm_list, I think the device probing successfully is what
> should move it to the end of the dpm_list. That way, the dpm_list is
> actually ordered by when the devices become functional and not the
> random order in DT or random probe order which can get pretty
> convoluted with multiple deferred probes. This feels right and will
> make suspend/resume more robust against DT ordering -- but I'm not
> sure what other wide ranging impact this has for other platforms.

Geert,

If you want to play around with a potential fix to test my hypothesis,
I think it's just adding this one line to driver_bound():
============
klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
device_links_driver_bound(dev);
+device_pm_move_to_tail(dev);

device_pm_check_callbacks(dev);
============

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-20  2:32               ` Saravana Kannan
@ 2020-06-22 15:49                 ` Geert Uytterhoeven
  2020-06-24 23:22                   ` Saravana Kannan
  0 siblings, 1 reply; 22+ messages in thread
From: Geert Uytterhoeven @ 2020-06-22 15:49 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

Hi Saravana,

On Sat, Jun 20, 2020 at 4:33 AM Saravana Kannan <saravanak@google.com> wrote:
> On Fri, Jun 19, 2020 at 1:07 PM Saravana Kannan <saravanak@google.com> wrote:
> > I think instead of deferred_probe_work_func() moving the device to the
> > end of the dpm_list, I think the device probing successfully is what
> > should move it to the end of the dpm_list. That way, the dpm_list is
> > actually ordered by when the devices become functional and not the
> > random order in DT or random probe order which can get pretty
> > convoluted with multiple deferred probes. This feels right and will
> > make suspend/resume more robust against DT ordering -- but I'm not
> > sure what other wide ranging impact this has for other platforms.
>
> If you want to play around with a potential fix to test my hypothesis,
> I think it's just adding this one line to driver_bound():
> ============
> klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> device_links_driver_bound(dev);
> +device_pm_move_to_tail(dev);
>
> device_pm_check_callbacks(dev);
> ============

Thanks, that seems to fix the issue for me, on both affected systems!
Note that this has quite some impact on the order devices are suspended,
but this seems harmless.

Will try on more systems later...

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices
  2020-06-22 15:49                 ` Geert Uytterhoeven
@ 2020-06-24 23:22                   ` Saravana Kannan
  0 siblings, 0 replies; 22+ messages in thread
From: Saravana Kannan @ 2020-06-24 23:22 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Greg Kroah-Hartman, Rafael J. Wysocki, Rob Herring, Frank Rowand,
	Len Brown, Android Kernel Team, Linux Kernel Mailing List,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	ACPI Devel Maling List, Ji Luo, Linux-Renesas, Marek Szyprowski

On Mon, Jun 22, 2020 at 8:49 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
>
> Hi Saravana,
>
> On Sat, Jun 20, 2020 at 4:33 AM Saravana Kannan <saravanak@google.com> wrote:
> > On Fri, Jun 19, 2020 at 1:07 PM Saravana Kannan <saravanak@google.com> wrote:
> > > I think instead of deferred_probe_work_func() moving the device to the
> > > end of the dpm_list, I think the device probing successfully is what
> > > should move it to the end of the dpm_list. That way, the dpm_list is
> > > actually ordered by when the devices become functional and not the
> > > random order in DT or random probe order which can get pretty
> > > convoluted with multiple deferred probes. This feels right and will
> > > make suspend/resume more robust against DT ordering -- but I'm not
> > > sure what other wide ranging impact this has for other platforms.
> >
> > If you want to play around with a potential fix to test my hypothesis,
> > I think it's just adding this one line to driver_bound():
> > ============
> > klist_add_tail(&dev->p->knode_driver, &dev->driver->p->klist_devices);
> > device_links_driver_bound(dev);
> > +device_pm_move_to_tail(dev);
> >
> > device_pm_check_callbacks(dev);
> > ============
>
> Thanks, that seems to fix the issue for me, on both affected systems!
> Note that this has quite some impact on the order devices are suspended,
> but this seems harmless.
>
> Will try on more systems later...

Thanks for testing. Maybe I should just send that change as a patch
and see what Greg/Rafael have to say to that.

It's a general fix anyway. So, might as well send it out.

-Saravana

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2020-06-24 23:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-15  5:34 [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
2020-05-15  5:34 ` [PATCH v1 1/4] driver core: Move code to the right part of the file Saravana Kannan
2020-05-15  5:34 ` [PATCH v1 2/4] driver core: Look for waiting consumers only for a fwnode's primary device Saravana Kannan
2020-05-15  5:34 ` [PATCH v1 3/4] driver core: fw_devlink: Add support for batching fwnode parsing Saravana Kannan
2020-05-15  5:35 ` [PATCH v1 4/4] of: platform: Batch fwnode parsing when adding all top level devices Saravana Kannan
     [not found]   ` <CGME20200519062510eucas1p27bc59da66e1b77534855103a27f87452@eucas1p2.samsung.com>
2020-05-19  6:25     ` Marek Szyprowski
2020-05-19  6:48       ` Saravana Kannan
2020-05-19  7:11         ` Marek Szyprowski
2020-05-19 10:32           ` Marek Szyprowski
2020-05-19 18:02             ` Saravana Kannan
2020-05-20  4:21               ` Marek Szyprowski
2020-06-17 12:20   ` Geert Uytterhoeven
2020-06-17 18:36     ` Saravana Kannan
2020-06-18  7:31       ` Geert Uytterhoeven
2020-06-18 23:00         ` Saravana Kannan
2020-06-19 12:24           ` Geert Uytterhoeven
2020-06-19 20:07             ` Saravana Kannan
2020-06-20  2:32               ` Saravana Kannan
2020-06-22 15:49                 ` Geert Uytterhoeven
2020-06-24 23:22                   ` Saravana Kannan
2020-05-15  8:52 ` [PATCH v1 0/4] Optimize fw_devlink parsing Saravana Kannan
2020-05-15 14:36   ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).