linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
@ 2019-06-04  0:32 Saravana Kannan
  2019-06-04  0:32 ` [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node() Saravana Kannan
                   ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-04  0:32 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, David Collins, devicetree, linux-kernel, kernel-team

Add a generic "depends-on" property that allows specifying mandatory
functional dependencies between devices. Add device-links after the
devices are created (but before they are probed) by looking at this
"depends-on" property.

This property is used instead of existing DT properties that specify
phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
is because not all resources referred to by existing DT properties are
mandatory functional dependencies. Some devices/drivers might be able
to operate with reduced functionality when some of the resources
aren't available. For example, a device could operate in polling mode
if no IRQ is available, a device could skip doing power management if
clock or voltage control isn't available and they are left on, etc.

So, adding mandatory functional dependency links between devices by
looking at referred phandles in DT properties won't work as it would
prevent probing devices that could be probed. By having an explicit
depends-on property, we can handle these cases correctly.

Having functional dependencies explicitly called out in DT and
automatically added before the devices are probed, provides the
following benefits:

- Optimizes device probe order and avoids the useless work of
  attempting probes of devices that will not probe successfully
  (because their suppliers aren't present or haven't probed yet).

  For example, in a commonly available mobile SoC, registering just
  one consumer device's driver at an initcall level earlier than the
  supplier device's driver causes 11 failed probe attempts before the
  consumer device probes successfully. This was with a kernel with all
  the drivers statically compiled in. This problem gets a lot worse if
  all the drivers are loaded as modules without direct symbol
  dependencies.

- Supplier devices like clock providers, regulators providers, etc
  need to keep the resources they provide active and at a particular
  state(s) during boot up even if their current set of consumers don't
  request the resource to be active. This is because the rest of the
  consumers might not have probed yet and turning off the resource
  before all the consumers have probed could lead to a hang or
  undesired user experience.

  Some frameworks (Eg: regulator) handle this today by turning off
  "unused" resources at late_initcall_sync and hoping all the devices
  have probed by then. This is not a valid assumption for systems with
  loadable modules. Other frameworks (Eg: clock) just don't handle
  this due to the lack of a clear signal for when they can turn off
  resources. This leads to downstream hacks to handle cases like this
  that can easily be solved in the upstream kernel.

  By linking devices before they are probed, we give suppliers a clear
  count of the number of dependent consumers. Once all of the
  consumers are active, the suppliers can turn off the unused
  resources without making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links are needed, it is left to the consumer/supplier
devices to change the link when they probe.
 

Saravana Kannan (5):
  of/platform: Speed up of_find_device_by_node()
  driver core: Add device links support for pending links to suppliers
  dt-bindings: Add depends-on property
  of/platform: Add functional dependency link from "depends-on" property
  driver core: Add sync_state driver/bus callback

 .../devicetree/bindings/depends-on.txt        |  26 +++++
 drivers/base/core.c                           | 106 ++++++++++++++++++
 drivers/of/platform.c                         |  75 ++++++++++++-
 include/linux/device.h                        |  24 ++++
 include/linux/of.h                            |   3 +
 5 files changed, 233 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/depends-on.txt

-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
@ 2019-06-04  0:32 ` Saravana Kannan
  2019-06-10 17:36   ` Rob Herring
  2019-06-04  0:32 ` [RESEND PATCH v1 2/5] driver core: Add device links support for pending links to suppliers Saravana Kannan
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Saravana Kannan @ 2019-06-04  0:32 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, David Collins, devicetree, linux-kernel, kernel-team

Add a pointer from device tree node to the device created from it.
This allows us to find the device corresponding to a device tree node
without having to loop through all the platform devices.

However, fallback to looping through the platform devices to handle
any devices that might set their own of_node.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 20 +++++++++++++++++++-
 include/linux/of.h    |  3 +++
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 04ad312fd85b..1115a8d80a33 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -42,6 +42,8 @@ static int of_dev_node_match(struct device *dev, void *data)
 	return dev->of_node == data;
 }
 
+static DEFINE_SPINLOCK(of_dev_lock);
+
 /**
  * of_find_device_by_node - Find the platform_device associated with a node
  * @np: Pointer to device tree node
@@ -55,7 +57,18 @@ struct platform_device *of_find_device_by_node(struct device_node *np)
 {
 	struct device *dev;
 
-	dev = bus_find_device(&platform_bus_type, NULL, np, of_dev_node_match);
+	/*
+	 * Spinlock needed to make sure np->dev doesn't get freed between NULL
+	 * check inside and kref count increment inside get_device(). This is
+	 * achieved by grabbing the spinlock before setting np->dev = NULL in
+	 * of_platform_device_destroy().
+	 */
+	spin_lock(&of_dev_lock);
+	dev = get_device(np->dev);
+	spin_unlock(&of_dev_lock);
+	if (!dev)
+		dev = bus_find_device(&platform_bus_type, NULL, np,
+				      of_dev_node_match);
 	return dev ? to_platform_device(dev) : NULL;
 }
 EXPORT_SYMBOL(of_find_device_by_node);
@@ -196,6 +209,7 @@ static struct platform_device *of_platform_device_create_pdata(
 		platform_device_put(dev);
 		goto err_clear_flag;
 	}
+	np->dev = &dev->dev;
 
 	return dev;
 
@@ -556,6 +570,10 @@ int of_platform_device_destroy(struct device *dev, void *data)
 	if (of_node_check_flag(dev->of_node, OF_POPULATED_BUS))
 		device_for_each_child(dev, NULL, of_platform_device_destroy);
 
+	/* Spinlock is needed for of_find_device_by_node() to work */
+	spin_lock(&of_dev_lock);
+	dev->of_node->dev = NULL;
+	spin_unlock(&of_dev_lock);
 	of_node_clear_flag(dev->of_node, OF_POPULATED);
 	of_node_clear_flag(dev->of_node, OF_POPULATED_BUS);
 
diff --git a/include/linux/of.h b/include/linux/of.h
index 0cf857012f11..f2b4912cbca1 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -48,6 +48,8 @@ struct property {
 struct of_irq_controller;
 #endif
 
+struct device;
+
 struct device_node {
 	const char *name;
 	phandle phandle;
@@ -68,6 +70,7 @@ struct device_node {
 	unsigned int unique_id;
 	struct of_irq_controller *irq_trans;
 #endif
+	struct device *dev;		/* Device created from this node */
 };
 
 #define MAX_PHANDLE_ARGS 16
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RESEND PATCH v1 2/5] driver core: Add device links support for pending links to suppliers
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
  2019-06-04  0:32 ` [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node() Saravana Kannan
@ 2019-06-04  0:32 ` Saravana Kannan
  2019-06-04  0:32 ` [RESEND PATCH v1 3/5] dt-bindings: Add depends-on property Saravana Kannan
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-04  0:32 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, David Collins, devicetree, linux-kernel, kernel-team

When consumer devices are added, they might not have a supplier device
to link to despite needing mandatory resources/functionality from one
or more suppliers. Add a waiting_for_suppliers list to track such
consumers and add helper functions to manage the list.

Marking/unmarking a consumer device as waiting for suppliers is
generally expected to be done by the entity that's creating the
device.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 67 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/device.h |  5 ++++
 2 files changed, 72 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index fd7511e04e62..9ab6782dda1c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
 #endif
 
 /* Device links support. */
+static LIST_HEAD(wait_for_suppliers);
+static DEFINE_MUTEX(wfs_lock);
 
 #ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
@@ -401,6 +403,53 @@ struct device_link *device_link_add(struct device *consumer,
 }
 EXPORT_SYMBOL_GPL(device_link_add);
 
+/**
+ * device_link_wait_for_supplier - Mark device as waiting for supplier
+ * @consumer: Consumer device
+ *
+ * Marks the consumer device as waiting for suppliers to become available. The
+ * consumer device will never be probed until it's unmarked as waiting for
+ * suppliers. The caller is responsible for adding the link to the supplier
+ * once the supplier device is present.
+ *
+ * This function is NOT meant to be called from the probe function of the
+ * consumer but rather from code that creates the consumer device.
+ */
+void device_link_wait_for_supplier(struct device *consumer)
+{
+	mutex_lock(&wfs_lock);
+	list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
+/**
+ * device_link_check_waiting_consumers - Try to unmark waiting consumers
+ * @add_suppliers: Callback function to add suppliers to waiting consumer
+ *
+ * Loops through all consumers waiting on suppliers and tries to add all their
+ * supplier links. If that succeeds, the consumer device is unmarked as waiting
+ * for suppliers. Otherwise, they are left marked as waiting on suppliers,
+ *
+ * The add_suppliers callback is expected to return 0 if it has found and added
+ * all the supplier links for the consumer device. It should return an error if
+ * it isn't able to do so.
+ *
+ * The caller of device_link_wait_for_supplier() is expected to call this once
+ * it's aware of potential suppliers becoming available.
+ */
+void device_link_check_waiting_consumers(
+		int (*add_suppliers)(struct device *consumer))
+{
+	struct device *dev, *tmp;
+
+	mutex_lock(&wfs_lock);
+	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
+				 links.needs_suppliers)
+		if (!add_suppliers(dev))
+			list_del_init(&dev->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
 static void device_link_free(struct device_link *link)
 {
 	while (refcount_dec_not_one(&link->rpm_active))
@@ -535,6 +584,19 @@ int device_links_check_suppliers(struct device *dev)
 	struct device_link *link;
 	int ret = 0;
 
+	/*
+	 * If a device is waiting for one or more suppliers (in
+	 * wait_for_suppliers list), it is not ready to probe yet. So just
+	 * return -EPROBE_DEFER without having to check the links with existing
+	 * suppliers.
+	 */
+	mutex_lock(&wfs_lock);
+	if (!list_empty(&dev->links.needs_suppliers)) {
+		mutex_unlock(&wfs_lock);
+		return -EPROBE_DEFER;
+	}
+	mutex_unlock(&wfs_lock);
+
 	device_links_write_lock();
 
 	list_for_each_entry(link, &dev->links.suppliers, c_node) {
@@ -812,6 +874,10 @@ static void device_links_purge(struct device *dev)
 {
 	struct device_link *link, *ln;
 
+	mutex_lock(&wfs_lock);
+	list_del(&dev->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+
 	/*
 	 * Delete all of the remaining links from this device to any other
 	 * devices (either consumers or suppliers).
@@ -1673,6 +1739,7 @@ void device_initialize(struct device *dev)
 #endif
 	INIT_LIST_HEAD(&dev->links.consumers);
 	INIT_LIST_HEAD(&dev->links.suppliers);
+	INIT_LIST_HEAD(&dev->links.needs_suppliers);
 	dev->links.status = DL_DEV_NO_DRIVER;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/include/linux/device.h b/include/linux/device.h
index e85264fb6616..4e71e5386aae 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -887,11 +887,13 @@ enum dl_dev_state {
  * struct dev_links_info - Device data related to device links.
  * @suppliers: List of links to supplier devices.
  * @consumers: List of links to consumer devices.
+ * @needs_suppliers: Hook to global list of devices waiting for suppliers.
  * @status: Driver status information.
  */
 struct dev_links_info {
 	struct list_head suppliers;
 	struct list_head consumers;
+	struct list_head needs_suppliers;
 	enum dl_dev_state status;
 };
 
@@ -1395,6 +1397,9 @@ struct device_link *device_link_add(struct device *consumer,
 				    struct device *supplier, u32 flags);
 void device_link_del(struct device_link *link);
 void device_link_remove(void *consumer, struct device *supplier);
+void device_link_wait_for_supplier(struct device *consumer);
+void device_link_check_waiting_consumers(
+		int (*add_suppliers)(struct device *consumer));
 
 #ifndef dev_fmt
 #define dev_fmt(fmt) fmt
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RESEND PATCH v1 3/5] dt-bindings: Add depends-on property
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
  2019-06-04  0:32 ` [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node() Saravana Kannan
  2019-06-04  0:32 ` [RESEND PATCH v1 2/5] driver core: Add device links support for pending links to suppliers Saravana Kannan
@ 2019-06-04  0:32 ` Saravana Kannan
  2019-06-12 14:45   ` Sudeep Holla
  2019-06-04  0:32 ` [RESEND PATCH v1 4/5] of/platform: Add functional dependency link from "depends-on" property Saravana Kannan
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: Saravana Kannan @ 2019-06-04  0:32 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, David Collins, devicetree, linux-kernel, kernel-team

The depends-on property is used to list the mandatory functional
dependencies of a consumer device on zero or more supplier devices.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 .../devicetree/bindings/depends-on.txt        | 26 +++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/depends-on.txt

diff --git a/Documentation/devicetree/bindings/depends-on.txt b/Documentation/devicetree/bindings/depends-on.txt
new file mode 100644
index 000000000000..1cbddd11cf17
--- /dev/null
+++ b/Documentation/devicetree/bindings/depends-on.txt
@@ -0,0 +1,26 @@
+Functional dependency linking
+=============================
+
+Apart from parent-child relationships, devices (consumers) often have
+functional dependencies on other devices (suppliers). Common examples of
+suppliers are clock, regulators, pinctrl, etc. However not all of them are
+dependencies with well defined devicetree bindings. Also, not all functional
+dependencies are mandatory as the device might be able to operate in a limited
+mode without some of the dependencies.
+
+The depends-on property allows marking these mandatory functional dependencies
+between one or more devices. The depends-on property is used by the consumer
+device to point to all its mandatory supplier devices.
+
+Optional properties:
+- depends-on:	A list of phandles to mandatory suppliers of the device.
+
+
+Examples:
+Here, the device is dependent on only 2 of the 3 clock providers
+dev@40031000 {
+	      compatible = "name";
+	      reg = <0x40031000 0x1000>;
+	      clocks = <&osc_1 1>, <&osc_2 7> <&osc_3 5>;
+	      depends-on = <&osc_1>, <&osc_3>;
+};
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RESEND PATCH v1 4/5] of/platform: Add functional dependency link from "depends-on" property
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (2 preceding siblings ...)
  2019-06-04  0:32 ` [RESEND PATCH v1 3/5] dt-bindings: Add depends-on property Saravana Kannan
@ 2019-06-04  0:32 ` Saravana Kannan
  2019-06-04  0:32 ` [RESEND PATCH v1 5/5] driver core: Add sync_state driver/bus callback Saravana Kannan
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-04  0:32 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, David Collins, devicetree, linux-kernel, kernel-team

Add device-links after the devices are created (but before they are
probed) by looking at the "depends-on" DT property that allows
specifying mandatory functional dependencies between devices.

This property is used instead of existing DT properties that specify
phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
is because not all resources referred to by existing DT properties are
mandatory functional dependencies. Some devices/drivers might be able
to operate with reduced functionality when some of the resources
aren't available. For example, a device could operate in polling mode
if no IRQ is available, a device could skip doing power management if
clock or voltage control isn't available and they are left on, etc.

Automatically adding device-links for functional dependencies at the
framework level provides the following benefits:

- Optimizes device probe order and avoids the useless work of
  attempting probes of devices that will not probe successfully
  (because their suppliers aren't present or haven't probed yet).

  For example, in a commonly available mobile SoC, registering just one
  consumer device's driver at an initcall level earlier than the
  supplier device's driver causes 11 failed probe attempts before the
  consumer device probes successfully. This was with a kernel with all
  the drivers statically compiled in. This problem gets a lot worse if
  all the drivers are loaded as modules without direct symbol
  dependencies.

- Supplier devices like clock providers, regulators providers, etc
  need to keep the resources they provide active and at a particular
  state(s) during boot up even if their current set of consumers don't
  request the resource to be active. This is because the rest of the
  consumers might not have probed yet and turning off the resource
  before all the consumers have probed could lead to a hang or
  undesired user experience.

  Some frameworks (Eg: regulator) handle this today by turning off
  "unused" resources at late_initcall_sync and hoping all the devices
  have probed by then. This is not a valid assumption for systems with
  loadable modules. Other frameworks (Eg: clock) just don't handle
  this due to the lack of a clear signal for when they can turn off
  resources. This leads to downstream hacks to handle cases like this
  that can easily be solved in the upstream kernel.

  By linking devices before they are probed, we give suppliers a clear
  count of the number of dependent consumers. Once all of the consumers
  are active, the suppliers can turn off the unused resources without
  making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links are needed, it is left to the consumer/supplier devices
to change the link when they probe.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 46 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 1115a8d80a33..da1aa52b310a 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -74,6 +74,45 @@ struct platform_device *of_find_device_by_node(struct device_node *np)
 EXPORT_SYMBOL(of_find_device_by_node);
 
 #ifdef CONFIG_OF_ADDRESS
+static int of_link_to_suppliers(struct device *dev)
+{
+	struct device_node *sup_node;
+	struct platform_device *sup_dev;
+	unsigned int i = 0, links = 0;
+	u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
+
+	if (unlikely(!dev->of_node))
+		return 0;
+
+	while ((sup_node = of_parse_phandle(dev->of_node, "depends-on", i))) {
+		i++;
+		sup_dev = of_find_device_by_node(sup_node);
+		if (!sup_dev)
+			continue;
+		if (device_link_add(dev, &sup_dev->dev, dl_flags))
+			links++;
+		put_device(&sup_dev->dev);
+	}
+	if (links < i)
+		return -ENODEV;
+	return 0;
+}
+
+static void link_waiting_consumers_func(struct work_struct *work)
+{
+	device_link_check_waiting_consumers(of_link_to_suppliers);
+}
+static DECLARE_WORK(link_waiting_consumers_work, link_waiting_consumers_func);
+
+static bool link_waiting_consumers_enable;
+static void link_waiting_consumers_trigger(void)
+{
+	if (!link_waiting_consumers_enable)
+		return;
+
+	schedule_work(&link_waiting_consumers_work);
+}
+
 /*
  * The following routines scan a subtree and registers a device for
  * each applicable node.
@@ -205,11 +244,14 @@ static struct platform_device *of_platform_device_create_pdata(
 	dev->dev.platform_data = platform_data;
 	of_msi_configure(&dev->dev, dev->dev.of_node);
 
+	if (of_link_to_suppliers(&dev->dev))
+		device_link_wait_for_supplier(&dev->dev);
 	if (of_device_add(dev) != 0) {
 		platform_device_put(dev);
 		goto err_clear_flag;
 	}
 	np->dev = &dev->dev;
+	link_waiting_consumers_trigger();
 
 	return dev;
 
@@ -555,6 +597,10 @@ static int __init of_platform_default_populate_init(void)
 	/* Populate everything else. */
 	of_platform_default_populate(NULL, NULL, NULL);
 
+	/* Make the device-links between suppliers and consumers */
+	link_waiting_consumers_enable = true;
+	device_link_check_waiting_consumers(of_link_to_suppliers);
+
 	return 0;
 }
 arch_initcall_sync(of_platform_default_populate_init);
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [RESEND PATCH v1 5/5] driver core: Add sync_state driver/bus callback
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (3 preceding siblings ...)
  2019-06-04  0:32 ` [RESEND PATCH v1 4/5] of/platform: Add functional dependency link from "depends-on" property Saravana Kannan
@ 2019-06-04  0:32 ` Saravana Kannan
  2019-06-12 21:21 ` [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Frank Rowand
  2019-06-24 22:37 ` Sandeep Patil
  6 siblings, 0 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-04  0:32 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, David Collins, devicetree, linux-kernel, kernel-team

This sync_state driver/bus callback is called once all the consumers
of a supplier have probed successfully.

This allows the supplier device's driver/bus to sync the supplier
device's state to the software state with the guarantee that all the
consumers are actively managing the resources provided by the supplier
device.

To maintain backwards compatibility and ease transition from existing
frameworks and resource cleanup schemes, late_initcall_sync is the
earliest when the sync_state callback might be called.

There is no upper bound on the time by which the sync_state callback
has to be called. This is because if a consumer device never probes,
the supplier has to maintain its resources in the state left by the
bootloader. For example, if the bootloader leaves the display
backlight at a fixed voltage and the backlight driver is never probed,
you don't want the backlight to ever be turned off after boot up.

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/of/platform.c  |  9 +++++++++
 include/linux/device.h | 19 +++++++++++++++++++
 3 files changed, 67 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 9ab6782dda1c..7a8777a33e8c 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -46,6 +46,7 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
 /* Device links support. */
 static LIST_HEAD(wait_for_suppliers);
 static DEFINE_MUTEX(wfs_lock);
+static bool supplier_sync_state_enabled;
 
 #ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
@@ -616,6 +617,41 @@ int device_links_check_suppliers(struct device *dev)
 	return ret;
 }
 
+static void __device_links_supplier_sync_state(struct device *dev)
+{
+	struct device_link *link;
+
+	if (dev->state_synced)
+		return;
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+		if (link->status != DL_STATE_ACTIVE)
+			return;
+	}
+
+	if (dev->bus->sync_state)
+		dev->bus->sync_state(dev);
+	else if (dev->driver && dev->driver->sync_state)
+		dev->driver->sync_state(dev);
+
+	dev->state_synced = true;
+}
+
+int device_links_supplier_sync_state(struct device *dev, void *data)
+{
+	device_links_write_lock();
+	__device_links_supplier_sync_state(dev);
+	device_links_write_unlock();
+	return 0;
+}
+
+void device_links_supplier_sync_state_enable(void)
+{
+	supplier_sync_state_enabled = true;
+}
+
 /**
  * device_links_driver_bound - Update device links after probing its driver.
  * @dev: Device to update the links for.
@@ -660,6 +696,9 @@ void device_links_driver_bound(struct device *dev)
 
 		WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
 		WRITE_ONCE(link->status, DL_STATE_ACTIVE);
+
+		if (supplier_sync_state_enabled)
+			__device_links_supplier_sync_state(link->supplier);
 	}
 
 	dev->links.status = DL_DEV_DRIVER_BOUND;
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index da1aa52b310a..b5cce0c2496a 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -604,6 +604,15 @@ static int __init of_platform_default_populate_init(void)
 	return 0;
 }
 arch_initcall_sync(of_platform_default_populate_init);
+
+static int __init of_platform_sync_state_init(void)
+{
+	device_links_supplier_sync_state_enable();
+	bus_for_each_dev(&platform_bus_type, NULL, NULL,
+			 device_links_supplier_sync_state);
+	return 0;
+}
+late_initcall_sync(of_platform_sync_state_init);
 #endif
 
 int of_platform_device_destroy(struct device *dev, void *data)
diff --git a/include/linux/device.h b/include/linux/device.h
index 4e71e5386aae..f6d5bba098df 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -79,6 +79,13 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
  *		that generate uevents to add the environment variables.
  * @probe:	Called when a new device or driver add to this bus, and callback
  *		the specific driver's probe to initial the matched device.
+ * @sync_state:	Called to sync device state to software state after all the
+ *		state tracking consumers linked to this device (present at
+ *		the time of late_initcall) have successfully bound to a
+ *		driver. If the device has no consumers, this function will
+ *		be called at late_initcall_sync level. If the device has
+ *		consumers that are never bound to a driver, this function
+ *		will never get called until they do.
  * @remove:	Called when a device removed from this bus.
  * @shutdown:	Called at shut-down time to quiesce the device.
  *
@@ -122,6 +129,7 @@ struct bus_type {
 	int (*match)(struct device *dev, struct device_driver *drv);
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
 	int (*probe)(struct device *dev);
+	void (*sync_state)(struct device *dev);
 	int (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
 
@@ -251,6 +259,13 @@ enum probe_type {
  * @probe:	Called to query the existence of a specific device,
  *		whether this driver can work with it, and bind the driver
  *		to a specific device.
+ * @sync_state:	Called to sync device state to software state after all the
+ *		state tracking consumers linked to this device (present at
+ *		the time of late_initcall) have successfully bound to a
+ *		driver. If the device has no consumers, this function will
+ *		be called at late_initcall_sync level. If the device has
+ *		consumers that are never bound to a driver, this function
+ *		will never get called until they do.
  * @remove:	Called when the device is removed from the system to
  *		unbind a device from this driver.
  * @shutdown:	Called at shut-down time to quiesce the device.
@@ -288,6 +303,7 @@ struct device_driver {
 	const struct acpi_device_id	*acpi_match_table;
 
 	int (*probe) (struct device *dev);
+	void (*sync_state)(struct device *dev);
 	int (*remove) (struct device *dev);
 	void (*shutdown) (struct device *dev);
 	int (*suspend) (struct device *dev, pm_message_t state);
@@ -1058,6 +1074,7 @@ struct device {
 	bool			offline_disabled:1;
 	bool			offline:1;
 	bool			of_node_reused:1;
+	bool			state_synced:1;
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
@@ -1400,6 +1417,8 @@ void device_link_remove(void *consumer, struct device *supplier);
 void device_link_wait_for_supplier(struct device *consumer);
 void device_link_check_waiting_consumers(
 		int (*add_suppliers)(struct device *consumer));
+int device_links_supplier_sync_state(struct device *dev, void *data);
+void device_links_supplier_sync_state_enable(void);
 
 #ifndef dev_fmt
 #define dev_fmt(fmt) fmt
-- 
2.22.0.rc1.257.g3120a18244-goog


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-04  0:32 ` [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node() Saravana Kannan
@ 2019-06-10 17:36   ` Rob Herring
       [not found]     ` <CAGETcx_Kb3+fFYHhBUdVbCSNTk4v5j1Ket=Et2ZQY0SHUgLLMw@mail.gmail.com>
  2019-06-11 15:18     ` Frank Rowand
  0 siblings, 2 replies; 35+ messages in thread
From: Rob Herring @ 2019-06-10 17:36 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	Android Kernel Team

Why are you resending this rather than replying to Frank's last
comments on the original?

On Mon, Jun 3, 2019 at 6:32 PM Saravana Kannan <saravanak@google.com> wrote:
>
> Add a pointer from device tree node to the device created from it.
> This allows us to find the device corresponding to a device tree node
> without having to loop through all the platform devices.
>
> However, fallback to looping through the platform devices to handle
> any devices that might set their own of_node.
>
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/of/platform.c | 20 +++++++++++++++++++-
>  include/linux/of.h    |  3 +++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 04ad312fd85b..1115a8d80a33 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -42,6 +42,8 @@ static int of_dev_node_match(struct device *dev, void *data)
>         return dev->of_node == data;
>  }
>
> +static DEFINE_SPINLOCK(of_dev_lock);
> +
>  /**
>   * of_find_device_by_node - Find the platform_device associated with a node
>   * @np: Pointer to device tree node
> @@ -55,7 +57,18 @@ struct platform_device *of_find_device_by_node(struct device_node *np)
>  {
>         struct device *dev;
>
> -       dev = bus_find_device(&platform_bus_type, NULL, np, of_dev_node_match);
> +       /*
> +        * Spinlock needed to make sure np->dev doesn't get freed between NULL
> +        * check inside and kref count increment inside get_device(). This is
> +        * achieved by grabbing the spinlock before setting np->dev = NULL in
> +        * of_platform_device_destroy().
> +        */
> +       spin_lock(&of_dev_lock);
> +       dev = get_device(np->dev);
> +       spin_unlock(&of_dev_lock);
> +       if (!dev)
> +               dev = bus_find_device(&platform_bus_type, NULL, np,
> +                                     of_dev_node_match);
>         return dev ? to_platform_device(dev) : NULL;
>  }
>  EXPORT_SYMBOL(of_find_device_by_node);
> @@ -196,6 +209,7 @@ static struct platform_device *of_platform_device_create_pdata(
>                 platform_device_put(dev);
>                 goto err_clear_flag;
>         }
> +       np->dev = &dev->dev;
>
>         return dev;
>
> @@ -556,6 +570,10 @@ int of_platform_device_destroy(struct device *dev, void *data)
>         if (of_node_check_flag(dev->of_node, OF_POPULATED_BUS))
>                 device_for_each_child(dev, NULL, of_platform_device_destroy);
>
> +       /* Spinlock is needed for of_find_device_by_node() to work */
> +       spin_lock(&of_dev_lock);
> +       dev->of_node->dev = NULL;
> +       spin_unlock(&of_dev_lock);
>         of_node_clear_flag(dev->of_node, OF_POPULATED);
>         of_node_clear_flag(dev->of_node, OF_POPULATED_BUS);
>
> diff --git a/include/linux/of.h b/include/linux/of.h
> index 0cf857012f11..f2b4912cbca1 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -48,6 +48,8 @@ struct property {
>  struct of_irq_controller;
>  #endif
>
> +struct device;
> +
>  struct device_node {
>         const char *name;
>         phandle phandle;
> @@ -68,6 +70,7 @@ struct device_node {
>         unsigned int unique_id;
>         struct of_irq_controller *irq_trans;
>  #endif
> +       struct device *dev;             /* Device created from this node */
>  };
>
>  #define MAX_PHANDLE_ARGS 16
> --
> 2.22.0.rc1.257.g3120a18244-goog
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
       [not found]     ` <CAGETcx_Kb3+fFYHhBUdVbCSNTk4v5j1Ket=Et2ZQY0SHUgLLMw@mail.gmail.com>
@ 2019-06-10 21:07       ` Rob Herring
  0 siblings, 0 replies; 35+ messages in thread
From: Rob Herring @ 2019-06-10 21:07 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Mon, Jun 10, 2019 at 1:15 PM Saravana Kannan <saravanak@google.com> wrote:
>
> I did. And I didn't get a response. Folks suggested this might be one of the preferred ways instead of sending "bump" emails.

It's summer time and people go on vacation.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-10 17:36   ` Rob Herring
       [not found]     ` <CAGETcx_Kb3+fFYHhBUdVbCSNTk4v5j1Ket=Et2ZQY0SHUgLLMw@mail.gmail.com>
@ 2019-06-11 15:18     ` Frank Rowand
  2019-06-11 20:56       ` Saravana Kannan
  1 sibling, 1 reply; 35+ messages in thread
From: Frank Rowand @ 2019-06-11 15:18 UTC (permalink / raw)
  To: Rob Herring, Saravana Kannan
  Cc: Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, Android Kernel Team

Hi Saravana,

On 6/10/19 10:36 AM, Rob Herring wrote:
> Why are you resending this rather than replying to Frank's last
> comments on the original?

Adding on a different aspect...  The independent replies from three different
maintainers (Rob, Mark, myself) pointed out architectural issues with the
patch series.  There were also some implementation issues brought out.
(Although I refrained from bringing up most of my implementation issues
as they are not relevant until architecture issues are resolved.)

When three maintainers say the architecture has issues, you should step
back and think hard.  (Not to say maintainers are always correct...)

My suggestion at this point is that you need to go back to the drawing board
and re-think how to address the use case.

-Frank

> 
> On Mon, Jun 3, 2019 at 6:32 PM Saravana Kannan <saravanak@google.com> wrote:
>>
>> Add a pointer from device tree node to the device created from it.
>> This allows us to find the device corresponding to a device tree node
>> without having to loop through all the platform devices.
>>
>> However, fallback to looping through the platform devices to handle
>> any devices that might set their own of_node.
>>
>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>> ---
>>  drivers/of/platform.c | 20 +++++++++++++++++++-
>>  include/linux/of.h    |  3 +++
>>  2 files changed, 22 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
>> index 04ad312fd85b..1115a8d80a33 100644
>> --- a/drivers/of/platform.c
>> +++ b/drivers/of/platform.c
>> @@ -42,6 +42,8 @@ static int of_dev_node_match(struct device *dev, void *data)
>>         return dev->of_node == data;
>>  }
>>
>> +static DEFINE_SPINLOCK(of_dev_lock);
>> +
>>  /**
>>   * of_find_device_by_node - Find the platform_device associated with a node
>>   * @np: Pointer to device tree node
>> @@ -55,7 +57,18 @@ struct platform_device *of_find_device_by_node(struct device_node *np)
>>  {
>>         struct device *dev;
>>
>> -       dev = bus_find_device(&platform_bus_type, NULL, np, of_dev_node_match);
>> +       /*
>> +        * Spinlock needed to make sure np->dev doesn't get freed between NULL
>> +        * check inside and kref count increment inside get_device(). This is
>> +        * achieved by grabbing the spinlock before setting np->dev = NULL in
>> +        * of_platform_device_destroy().
>> +        */
>> +       spin_lock(&of_dev_lock);
>> +       dev = get_device(np->dev);
>> +       spin_unlock(&of_dev_lock);
>> +       if (!dev)
>> +               dev = bus_find_device(&platform_bus_type, NULL, np,
>> +                                     of_dev_node_match);
>>         return dev ? to_platform_device(dev) : NULL;
>>  }
>>  EXPORT_SYMBOL(of_find_device_by_node);
>> @@ -196,6 +209,7 @@ static struct platform_device *of_platform_device_create_pdata(
>>                 platform_device_put(dev);
>>                 goto err_clear_flag;
>>         }
>> +       np->dev = &dev->dev;
>>
>>         return dev;
>>
>> @@ -556,6 +570,10 @@ int of_platform_device_destroy(struct device *dev, void *data)
>>         if (of_node_check_flag(dev->of_node, OF_POPULATED_BUS))
>>                 device_for_each_child(dev, NULL, of_platform_device_destroy);
>>
>> +       /* Spinlock is needed for of_find_device_by_node() to work */
>> +       spin_lock(&of_dev_lock);
>> +       dev->of_node->dev = NULL;
>> +       spin_unlock(&of_dev_lock);
>>         of_node_clear_flag(dev->of_node, OF_POPULATED);
>>         of_node_clear_flag(dev->of_node, OF_POPULATED_BUS);
>>
>> diff --git a/include/linux/of.h b/include/linux/of.h
>> index 0cf857012f11..f2b4912cbca1 100644
>> --- a/include/linux/of.h
>> +++ b/include/linux/of.h
>> @@ -48,6 +48,8 @@ struct property {
>>  struct of_irq_controller;
>>  #endif
>>
>> +struct device;
>> +
>>  struct device_node {
>>         const char *name;
>>         phandle phandle;
>> @@ -68,6 +70,7 @@ struct device_node {
>>         unsigned int unique_id;
>>         struct of_irq_controller *irq_trans;
>>  #endif
>> +       struct device *dev;             /* Device created from this node */
>>  };
>>
>>  #define MAX_PHANDLE_ARGS 16
>> --
>> 2.22.0.rc1.257.g3120a18244-goog
>>
> .
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-11 15:18     ` Frank Rowand
@ 2019-06-11 20:56       ` Saravana Kannan
  2019-06-11 21:52         ` Sandeep Patil
  0 siblings, 1 reply; 35+ messages in thread
From: Saravana Kannan @ 2019-06-11 20:56 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, Android Kernel Team

On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
>
> Hi Saravana,
>
> On 6/10/19 10:36 AM, Rob Herring wrote:
> > Why are you resending this rather than replying to Frank's last
> > comments on the original?
>
> Adding on a different aspect...  The independent replies from three different
> maintainers (Rob, Mark, myself) pointed out architectural issues with the
> patch series.  There were also some implementation issues brought out.
> (Although I refrained from bringing up most of my implementation issues
> as they are not relevant until architecture issues are resolved.)

Right, I'm not too worried about the implementation issues before we
settle on the architectural issues. Those are easy to fix.

Honestly, the main points that the maintainers raised are:
1) This is a configuration property and not describing the device.
Just use the implicit dependencies coming from existing bindings.

I gave a bunch of reasons for why I think it isn't an OS configuration
property. But even if that's not something the maintainers can agree
to, I gave a concrete example (cyclic dependencies between clock
provider hardware) where the implicit dependencies would prevent one
of the devices from probing till the end of time. So even if the
maintainers don't agree we should always look at "depends-on" to
decide the dependencies, we still need some means to override the
implicit dependencies where they don't match the real dependency. Can
we use depends-on as an override when the implicit dependencies aren't
correct?

2) This doesn't need to be solved because this is just optimizing
probing or saving power ("we should get rid of this auto disabling"):

I explained why this patch series is not just about optimizing probe
ordering or saving power. And why we can't ignore auto disabling
(because it's more than just auto disabling). The kernel is currently
broken when trying to use modules in ARM SoCs (probably in other
systems/archs too, but I can't speak for those).

3) Concerns about backwards compatibility

I pointed out why the current scheme (depends-on being the only source
of dependency) doesn't break compatibility. And if we go with
"depends-on" as an override what we could do to keep backwards
compatibility. Happy to hear more thoughts or discuss options.

4) How the "sync_state" would work for a device that supplies multiple
functionalities but a limited driver.

I explained how it would work.

> When three maintainers say the architecture has issues, you should step
> back and think hard.  (Not to say maintainers are always correct...)

Yes, the 3 maintainers brought up their concerns and I replied to
explain my viewpoint on their concerns. So I'm waiting for a response
from them before I proceed further. Even if the response is going to
be "we still don't agree with this architecture", I'd prefer at least
some direction on what the maintainers think is the right
architecture. There are a million different ways to solve this. It's
not reasonable to expect people submitting patches to shoot in the
dark and hope it hits what the maintainers have in their mind as
"acceptable solutions". I even proposed some alternatives. So, it'd be
nice to hear the maintainers' thoughts on my responses, alternative
designs or some alternate proposals.

There's a real problem that needs to be solved. Just saying "I don't
like this solution" without suggesting alternatives isn't a
constructive way to improve the kernel.

> My suggestion at this point is that you need to go back to the drawing board
> and re-think how to address the use case.

I'd be happy to and I've been thinking about other ways, but I'd like
some direction from the maintainers before writing a lot of code
that's just going to be completely rejected.

To summarize the options:
1) Use depends-on as the only source of dependency (this patch series)
- Easy to keep backward compatible
- Handles all cases
- Arguable that this is OS configuration

2) Use existing bindings for dependencies, but use "depends-on" for
overriding cases that are incorrect.
- New kernel not backward compatible with old DT. So devices will stop
booting or won't have full functionality.
- Needs some scheme to disabling all dependency tracking to be
backward compatible.
  a) Can have the dependency linking code under CONFIG_XXXXX
  b) Can have a kernel commandline
  c) NEW alternative to CONFIG_XXXX or commandline. Can have an
additional "implicit-dependencies-are-explicit-too" DT property that
can be added to the "choosen" DT node to tell the OS that the implicit
dependencies are valid real dependencies too. Then if they need
overrides, depends-on can be used. So old DT blobs would effectively
have this feature disabled.
- Handles all cases
- Only (c) above could be argued as OS configuration. But I'd argue
that it's clarifying the DT dependencies to be more explicit.

Thanks,
Saravana

> -Frank
>
> >
> > On Mon, Jun 3, 2019 at 6:32 PM Saravana Kannan <saravanak@google.com> wrote:
> >>
> >> Add a pointer from device tree node to the device created from it.
> >> This allows us to find the device corresponding to a device tree node
> >> without having to loop through all the platform devices.
> >>
> >> However, fallback to looping through the platform devices to handle
> >> any devices that might set their own of_node.
> >>
> >> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >> ---
> >>  drivers/of/platform.c | 20 +++++++++++++++++++-
> >>  include/linux/of.h    |  3 +++
> >>  2 files changed, 22 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> >> index 04ad312fd85b..1115a8d80a33 100644
> >> --- a/drivers/of/platform.c
> >> +++ b/drivers/of/platform.c
> >> @@ -42,6 +42,8 @@ static int of_dev_node_match(struct device *dev, void *data)
> >>         return dev->of_node == data;
> >>  }
> >>
> >> +static DEFINE_SPINLOCK(of_dev_lock);
> >> +
> >>  /**
> >>   * of_find_device_by_node - Find the platform_device associated with a node
> >>   * @np: Pointer to device tree node
> >> @@ -55,7 +57,18 @@ struct platform_device *of_find_device_by_node(struct device_node *np)
> >>  {
> >>         struct device *dev;
> >>
> >> -       dev = bus_find_device(&platform_bus_type, NULL, np, of_dev_node_match);
> >> +       /*
> >> +        * Spinlock needed to make sure np->dev doesn't get freed between NULL
> >> +        * check inside and kref count increment inside get_device(). This is
> >> +        * achieved by grabbing the spinlock before setting np->dev = NULL in
> >> +        * of_platform_device_destroy().
> >> +        */
> >> +       spin_lock(&of_dev_lock);
> >> +       dev = get_device(np->dev);
> >> +       spin_unlock(&of_dev_lock);
> >> +       if (!dev)
> >> +               dev = bus_find_device(&platform_bus_type, NULL, np,
> >> +                                     of_dev_node_match);
> >>         return dev ? to_platform_device(dev) : NULL;
> >>  }
> >>  EXPORT_SYMBOL(of_find_device_by_node);
> >> @@ -196,6 +209,7 @@ static struct platform_device *of_platform_device_create_pdata(
> >>                 platform_device_put(dev);
> >>                 goto err_clear_flag;
> >>         }
> >> +       np->dev = &dev->dev;
> >>
> >>         return dev;
> >>
> >> @@ -556,6 +570,10 @@ int of_platform_device_destroy(struct device *dev, void *data)
> >>         if (of_node_check_flag(dev->of_node, OF_POPULATED_BUS))
> >>                 device_for_each_child(dev, NULL, of_platform_device_destroy);
> >>
> >> +       /* Spinlock is needed for of_find_device_by_node() to work */
> >> +       spin_lock(&of_dev_lock);
> >> +       dev->of_node->dev = NULL;
> >> +       spin_unlock(&of_dev_lock);
> >>         of_node_clear_flag(dev->of_node, OF_POPULATED);
> >>         of_node_clear_flag(dev->of_node, OF_POPULATED_BUS);
> >>
> >> diff --git a/include/linux/of.h b/include/linux/of.h
> >> index 0cf857012f11..f2b4912cbca1 100644
> >> --- a/include/linux/of.h
> >> +++ b/include/linux/of.h
> >> @@ -48,6 +48,8 @@ struct property {
> >>  struct of_irq_controller;
> >>  #endif
> >>
> >> +struct device;
> >> +
> >>  struct device_node {
> >>         const char *name;
> >>         phandle phandle;
> >> @@ -68,6 +70,7 @@ struct device_node {
> >>         unsigned int unique_id;
> >>         struct of_irq_controller *irq_trans;
> >>  #endif
> >> +       struct device *dev;             /* Device created from this node */
> >>  };
> >>
> >>  #define MAX_PHANDLE_ARGS 16
> >> --
> >> 2.22.0.rc1.257.g3120a18244-goog
> >>
> > .
> >
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-11 20:56       ` Saravana Kannan
@ 2019-06-11 21:52         ` Sandeep Patil
  2019-06-12 13:53           ` Rob Herring
  0 siblings, 1 reply; 35+ messages in thread
From: Sandeep Patil @ 2019-06-11 21:52 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Frank Rowand, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >
> > Hi Saravana,
> >
> > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > Why are you resending this rather than replying to Frank's last
> > > comments on the original?
> >
> > Adding on a different aspect...  The independent replies from three different
> > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > patch series.  There were also some implementation issues brought out.
> > (Although I refrained from bringing up most of my implementation issues
> > as they are not relevant until architecture issues are resolved.)
> 
> Right, I'm not too worried about the implementation issues before we
> settle on the architectural issues. Those are easy to fix.
> 
> Honestly, the main points that the maintainers raised are:
> 1) This is a configuration property and not describing the device.
> Just use the implicit dependencies coming from existing bindings.
> 
> I gave a bunch of reasons for why I think it isn't an OS configuration
> property. But even if that's not something the maintainers can agree
> to, I gave a concrete example (cyclic dependencies between clock
> provider hardware) where the implicit dependencies would prevent one
> of the devices from probing till the end of time. So even if the
> maintainers don't agree we should always look at "depends-on" to
> decide the dependencies, we still need some means to override the
> implicit dependencies where they don't match the real dependency. Can
> we use depends-on as an override when the implicit dependencies aren't
> correct?
> 
> 2) This doesn't need to be solved because this is just optimizing
> probing or saving power ("we should get rid of this auto disabling"):
> 
> I explained why this patch series is not just about optimizing probe
> ordering or saving power. And why we can't ignore auto disabling
> (because it's more than just auto disabling). The kernel is currently
> broken when trying to use modules in ARM SoCs (probably in other
> systems/archs too, but I can't speak for those).
> 
> 3) Concerns about backwards compatibility
> 
> I pointed out why the current scheme (depends-on being the only source
> of dependency) doesn't break compatibility. And if we go with
> "depends-on" as an override what we could do to keep backwards
> compatibility. Happy to hear more thoughts or discuss options.
> 
> 4) How the "sync_state" would work for a device that supplies multiple
> functionalities but a limited driver.

<snip>
To be clear, all of above are _real_ problems that stops us from efficiently
load device drivers as modules for Android.

So, if 'depends-on' doesn't seem like the right approach and "going back to
the drawing board" is the ask, could you please point us in the right
direction?

- ssp

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-11 21:52         ` Sandeep Patil
@ 2019-06-12 13:53           ` Rob Herring
  2019-06-12 14:21             ` Greg Kroah-Hartman
                               ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Rob Herring @ 2019-06-12 13:53 UTC (permalink / raw)
  To: Sandeep Patil, Saravana Kannan
  Cc: Frank Rowand, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
>
> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > >
> > > Hi Saravana,
> > >
> > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > Why are you resending this rather than replying to Frank's last
> > > > comments on the original?
> > >
> > > Adding on a different aspect...  The independent replies from three different
> > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > patch series.  There were also some implementation issues brought out.
> > > (Although I refrained from bringing up most of my implementation issues
> > > as they are not relevant until architecture issues are resolved.)
> >
> > Right, I'm not too worried about the implementation issues before we
> > settle on the architectural issues. Those are easy to fix.
> >
> > Honestly, the main points that the maintainers raised are:
> > 1) This is a configuration property and not describing the device.
> > Just use the implicit dependencies coming from existing bindings.
> >
> > I gave a bunch of reasons for why I think it isn't an OS configuration
> > property. But even if that's not something the maintainers can agree
> > to, I gave a concrete example (cyclic dependencies between clock
> > provider hardware) where the implicit dependencies would prevent one
> > of the devices from probing till the end of time. So even if the
> > maintainers don't agree we should always look at "depends-on" to
> > decide the dependencies, we still need some means to override the
> > implicit dependencies where they don't match the real dependency. Can
> > we use depends-on as an override when the implicit dependencies aren't
> > correct?
> >
> > 2) This doesn't need to be solved because this is just optimizing
> > probing or saving power ("we should get rid of this auto disabling"):
> >
> > I explained why this patch series is not just about optimizing probe
> > ordering or saving power. And why we can't ignore auto disabling
> > (because it's more than just auto disabling). The kernel is currently
> > broken when trying to use modules in ARM SoCs (probably in other
> > systems/archs too, but I can't speak for those).
> >
> > 3) Concerns about backwards compatibility
> >
> > I pointed out why the current scheme (depends-on being the only source
> > of dependency) doesn't break compatibility. And if we go with
> > "depends-on" as an override what we could do to keep backwards
> > compatibility. Happy to hear more thoughts or discuss options.
> >
> > 4) How the "sync_state" would work for a device that supplies multiple
> > functionalities but a limited driver.
>
> <snip>
> To be clear, all of above are _real_ problems that stops us from efficiently
> load device drivers as modules for Android.
>
> So, if 'depends-on' doesn't seem like the right approach and "going back to
> the drawing board" is the ask, could you please point us in the right
> direction?

Use the dependencies which are already there in DT. That's clocks,
pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
going to accept duplicating all those dependencies in DT. The downside
for the kernel is you have to address these one by one and can't have
a generic property the driver core code can parse. After that's in
place, then maybe we can consider handling any additional dependencies
not already captured in DT. Once all that is in place, we can probably
sort device and/or driver lists to optimize the probe order (maybe the
driver core already does that now?).

Get rid of the auto disabling of clocks and regulators in
late_initcall. It's simply not a valid marker that boot is done when
modules are involved. We probably can't get rid of it as lot's of
platforms rely on that, so it will have to be opt out. Make it the
platform's responsibility for ensuring a consistent state.

Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
userspace in order to make progress if dependencies are missing. Or
maybe just some timeout would be sufficient. I think this is probably
more useful for development than in a shipping product. Even if you
could fallback to polling mode instead of interrupts for example, I
doubt you would want to in a product.

You should also keep in mind that everything needed for a console has
to be built in. Maybe Android can say the console isn't needed, but in
general we can't.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 13:53           ` Rob Herring
@ 2019-06-12 14:21             ` Greg Kroah-Hartman
  2019-06-12 16:53               ` Rob Herring
  2019-06-12 17:03               ` Frank Rowand
  2019-06-12 16:07             ` Frank Rowand
  2019-06-12 19:03             ` Frank Rowand
  2 siblings, 2 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-12 14:21 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sandeep Patil, Saravana Kannan, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> >
> > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > >
> > > > Hi Saravana,
> > > >
> > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > Why are you resending this rather than replying to Frank's last
> > > > > comments on the original?
> > > >
> > > > Adding on a different aspect...  The independent replies from three different
> > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > patch series.  There were also some implementation issues brought out.
> > > > (Although I refrained from bringing up most of my implementation issues
> > > > as they are not relevant until architecture issues are resolved.)
> > >
> > > Right, I'm not too worried about the implementation issues before we
> > > settle on the architectural issues. Those are easy to fix.
> > >
> > > Honestly, the main points that the maintainers raised are:
> > > 1) This is a configuration property and not describing the device.
> > > Just use the implicit dependencies coming from existing bindings.
> > >
> > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > property. But even if that's not something the maintainers can agree
> > > to, I gave a concrete example (cyclic dependencies between clock
> > > provider hardware) where the implicit dependencies would prevent one
> > > of the devices from probing till the end of time. So even if the
> > > maintainers don't agree we should always look at "depends-on" to
> > > decide the dependencies, we still need some means to override the
> > > implicit dependencies where they don't match the real dependency. Can
> > > we use depends-on as an override when the implicit dependencies aren't
> > > correct?
> > >
> > > 2) This doesn't need to be solved because this is just optimizing
> > > probing or saving power ("we should get rid of this auto disabling"):
> > >
> > > I explained why this patch series is not just about optimizing probe
> > > ordering or saving power. And why we can't ignore auto disabling
> > > (because it's more than just auto disabling). The kernel is currently
> > > broken when trying to use modules in ARM SoCs (probably in other
> > > systems/archs too, but I can't speak for those).
> > >
> > > 3) Concerns about backwards compatibility
> > >
> > > I pointed out why the current scheme (depends-on being the only source
> > > of dependency) doesn't break compatibility. And if we go with
> > > "depends-on" as an override what we could do to keep backwards
> > > compatibility. Happy to hear more thoughts or discuss options.
> > >
> > > 4) How the "sync_state" would work for a device that supplies multiple
> > > functionalities but a limited driver.
> >
> > <snip>
> > To be clear, all of above are _real_ problems that stops us from efficiently
> > load device drivers as modules for Android.
> >
> > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > the drawing board" is the ask, could you please point us in the right
> > direction?
> 
> Use the dependencies which are already there in DT. That's clocks,
> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> going to accept duplicating all those dependencies in DT. The downside
> for the kernel is you have to address these one by one and can't have
> a generic property the driver core code can parse. After that's in
> place, then maybe we can consider handling any additional dependencies
> not already captured in DT. Once all that is in place, we can probably
> sort device and/or driver lists to optimize the probe order (maybe the
> driver core already does that now?).
> 
> Get rid of the auto disabling of clocks and regulators in
> late_initcall. It's simply not a valid marker that boot is done when
> modules are involved. We probably can't get rid of it as lot's of
> platforms rely on that, so it will have to be opt out. Make it the
> platform's responsibility for ensuring a consistent state.
> 
> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> userspace in order to make progress if dependencies are missing.

People have tried to do this multiple times, and you never really know
when "boot is done" due to busses that have discoverable devices and
async probing of other busses.

You do know "something" when you pivot to a new boot disk, and when you
try to load init, but given initramfs and the fact that modules are
usually included on them, that's not really a good indication that
anything is "finished".

I don't want userspace to be responsible for telling the kernel, "hey
you should be finished now!", as that's an async notification that is
going to be ripe for problems.

I really like the "depends-on" information, as it shows a topology that
DT doesn't seem to be able to show today, yet we rely on it in the
kernel with the whole deferred probing mess.  To me, there doesn't seem
to be any other way to properly "know" this.

> Or
> maybe just some timeout would be sufficient. I think this is probably
> more useful for development than in a shipping product. Even if you
> could fallback to polling mode instead of interrupts for example, I
> doubt you would want to in a product.

timeouts suck.  And do not work for shipping products.  I want a device
with 100 modules that relys on DT to be able to boot just as fast as a
laptop with 100 modules that has all of the needed dependancies
described today in their bus topologies, because they used sane hardware
(i.e. PCI and ACPI).  Why hurt embedded people just because their
hardware relies on a system where you have to loop for long periods of
time because DT can not show the topology correctly?

> You should also keep in mind that everything needed for a console has
> to be built in. Maybe Android can say the console isn't needed, but in
> general we can't.

What does a console have to do with any of this?

confused,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 3/5] dt-bindings: Add depends-on property
  2019-06-04  0:32 ` [RESEND PATCH v1 3/5] dt-bindings: Add depends-on property Saravana Kannan
@ 2019-06-12 14:45   ` Sudeep Holla
  0 siblings, 0 replies; 35+ messages in thread
From: Sudeep Holla @ 2019-06-12 14:45 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	kernel-team, Sudeep Holla

On Mon, Jun 03, 2019 at 05:32:16PM -0700, Saravana Kannan wrote:
> The depends-on property is used to list the mandatory functional
> dependencies of a consumer device on zero or more supplier devices.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  .../devicetree/bindings/depends-on.txt        | 26 +++++++++++++++++++
>  1 file changed, 26 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> 
> diff --git a/Documentation/devicetree/bindings/depends-on.txt b/Documentation/devicetree/bindings/depends-on.txt
> new file mode 100644
> index 000000000000..1cbddd11cf17
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/depends-on.txt
> @@ -0,0 +1,26 @@
> +Functional dependency linking
> +=============================
> +
> +Apart from parent-child relationships, devices (consumers) often have
> +functional dependencies on other devices (suppliers). Common examples of
> +suppliers are clock, regulators, pinctrl, etc. However not all of them are
> +dependencies with well defined devicetree bindings. Also, not all functional
> +dependencies are mandatory as the device might be able to operate in a limited
> +mode without some of the dependencies.
> +
> +The depends-on property allows marking these mandatory functional dependencies
> +between one or more devices. The depends-on property is used by the consumer
> +device to point to all its mandatory supplier devices.
> +
> +Optional properties:
> +- depends-on:	A list of phandles to mandatory suppliers of the device.
> +
> +
> +Examples:
> +Here, the device is dependent on only 2 of the 3 clock providers
> +dev@40031000 {
> +	      compatible = "name";
> +	      reg = <0x40031000 0x1000>;
> +	      clocks = <&osc_1 1>, <&osc_2 7> <&osc_3 5>;
> +	      depends-on = <&osc_1>, <&osc_3>;

Why is this not implicit from clocks property and why is there need for
explicit mention of this ?

What happens if there's a cyclic dependency ? From hardware perspective,
cyclic dependency is quite feasible, so IMO it's hard to define this
property unambiguously.

And being optional, we need to deal with the absence of these, and if we
fallback to marking clock, regulators, pinctrl suppliers as depends-on,
this may end up redundant.

You may need to show examples(preferably with DTs in mainline just to
help understand it quickly and easily) that are beyond these standard
suppliers that are not implicit to add this property.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 13:53           ` Rob Herring
  2019-06-12 14:21             ` Greg Kroah-Hartman
@ 2019-06-12 16:07             ` Frank Rowand
  2019-06-12 16:47               ` Frank Rowand
  2019-06-12 19:03             ` Frank Rowand
  2 siblings, 1 reply; 35+ messages in thread
From: Frank Rowand @ 2019-06-12 16:07 UTC (permalink / raw)
  To: Rob Herring, Sandeep Patil, Saravana Kannan
  Cc: Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, Android Kernel Team

On 6/12/19 6:53 AM, Rob Herring wrote:
> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
>>
>> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
>>> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> Hi Saravana,
>>>>
>>>> On 6/10/19 10:36 AM, Rob Herring wrote:
>>>>> Why are you resending this rather than replying to Frank's last
>>>>> comments on the original?
>>>>
>>>> Adding on a different aspect...  The independent replies from three different
>>>> maintainers (Rob, Mark, myself) pointed out architectural issues with the
>>>> patch series.  There were also some implementation issues brought out.
>>>> (Although I refrained from bringing up most of my implementation issues
>>>> as they are not relevant until architecture issues are resolved.)
>>>
>>> Right, I'm not too worried about the implementation issues before we
>>> settle on the architectural issues. Those are easy to fix.
>>>
>>> Honestly, the main points that the maintainers raised are:
>>> 1) This is a configuration property and not describing the device.
>>> Just use the implicit dependencies coming from existing bindings.
>>>
>>> I gave a bunch of reasons for why I think it isn't an OS configuration
>>> property. But even if that's not something the maintainers can agree
>>> to, I gave a concrete example (cyclic dependencies between clock
>>> provider hardware) where the implicit dependencies would prevent one
>>> of the devices from probing till the end of time. So even if the
>>> maintainers don't agree we should always look at "depends-on" to
>>> decide the dependencies, we still need some means to override the
>>> implicit dependencies where they don't match the real dependency. Can
>>> we use depends-on as an override when the implicit dependencies aren't
>>> correct?
>>>
>>> 2) This doesn't need to be solved because this is just optimizing
>>> probing or saving power ("we should get rid of this auto disabling"):
>>>
>>> I explained why this patch series is not just about optimizing probe
>>> ordering or saving power. And why we can't ignore auto disabling
>>> (because it's more than just auto disabling). The kernel is currently
>>> broken when trying to use modules in ARM SoCs (probably in other
>>> systems/archs too, but I can't speak for those).
>>>
>>> 3) Concerns about backwards compatibility
>>>
>>> I pointed out why the current scheme (depends-on being the only source
>>> of dependency) doesn't break compatibility. And if we go with
>>> "depends-on" as an override what we could do to keep backwards
>>> compatibility. Happy to hear more thoughts or discuss options.
>>>
>>> 4) How the "sync_state" would work for a device that supplies multiple
>>> functionalities but a limited driver.
>>
>> <snip>
>> To be clear, all of above are _real_ problems that stops us from efficiently
>> load device drivers as modules for Android.
>>
>> So, if 'depends-on' doesn't seem like the right approach and "going back to
>> the drawing board" is the ask, could you please point us in the right
>> direction?
> 
> Use the dependencies which are already there in DT. That's clocks,
> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> going to accept duplicating all those dependencies in DT. The downside
> for the kernel is you have to address these one by one and can't have
> a generic property the driver core code can parse. After that's in
> place, then maybe we can consider handling any additional dependencies
> not already captured in DT. Once all that is in place, we can probably
> sort device and/or driver lists to optimize the probe order (maybe the
> driver core already does that now?).
> 
> Get rid of the auto disabling of clocks and regulators in
> late_initcall. It's simply not a valid marker that boot is done when
> modules are involved. We probably can't get rid of it as lot's of
> platforms rely on that, so it will have to be opt out. Make it the
> platform's responsibility for ensuring a consistent state.

Setting aside modules for one moment, why is there any auto disabling
of clocks and regulators in late initcall????  Deferred probe processing
does not begin until deferred_probe_initcall() sets
driver_deferred_probe_enable to true.  No late_initcall function
should ever depend on ordering with respect to any other late_initcall.
(And yes, I know that among various initcall levels, there have been
games played to get a certain amount of ordering, but that is at
best fragile.)

In addition to modules, devicetree overlays need to be considered.

Just as modules can result in a driver appearing after boot finishes,
overlays can result in new devicetree nodes (and thus dependencies)
appearing after boot finishes.

-Frank

> 
> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> userspace in order to make progress if dependencies are missing. Or
> maybe just some timeout would be sufficient. I think this is probably
> more useful for development than in a shipping product. Even if you
> could fallback to polling mode instead of interrupts for example, I
> doubt you would want to in a product.
> 
> You should also keep in mind that everything needed for a console has
> to be built in. Maybe Android can say the console isn't needed, but in
> general we can't.
> 
> Rob
> .
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 16:07             ` Frank Rowand
@ 2019-06-12 16:47               ` Frank Rowand
  0 siblings, 0 replies; 35+ messages in thread
From: Frank Rowand @ 2019-06-12 16:47 UTC (permalink / raw)
  To: Rob Herring, Sandeep Patil, Saravana Kannan
  Cc: Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, Android Kernel Team

On 6/12/19 9:07 AM, Frank Rowand wrote:
> On 6/12/19 6:53 AM, Rob Herring wrote:
>> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
>>>
>>> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
>>>> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>>
>>>>> Hi Saravana,
>>>>>
>>>>> On 6/10/19 10:36 AM, Rob Herring wrote:
>>>>>> Why are you resending this rather than replying to Frank's last
>>>>>> comments on the original?
>>>>>
>>>>> Adding on a different aspect...  The independent replies from three different
>>>>> maintainers (Rob, Mark, myself) pointed out architectural issues with the
>>>>> patch series.  There were also some implementation issues brought out.
>>>>> (Although I refrained from bringing up most of my implementation issues
>>>>> as they are not relevant until architecture issues are resolved.)
>>>>
>>>> Right, I'm not too worried about the implementation issues before we
>>>> settle on the architectural issues. Those are easy to fix.
>>>>
>>>> Honestly, the main points that the maintainers raised are:
>>>> 1) This is a configuration property and not describing the device.
>>>> Just use the implicit dependencies coming from existing bindings.
>>>>
>>>> I gave a bunch of reasons for why I think it isn't an OS configuration
>>>> property. But even if that's not something the maintainers can agree
>>>> to, I gave a concrete example (cyclic dependencies between clock
>>>> provider hardware) where the implicit dependencies would prevent one
>>>> of the devices from probing till the end of time. So even if the
>>>> maintainers don't agree we should always look at "depends-on" to
>>>> decide the dependencies, we still need some means to override the
>>>> implicit dependencies where they don't match the real dependency. Can
>>>> we use depends-on as an override when the implicit dependencies aren't
>>>> correct?
>>>>
>>>> 2) This doesn't need to be solved because this is just optimizing
>>>> probing or saving power ("we should get rid of this auto disabling"):
>>>>
>>>> I explained why this patch series is not just about optimizing probe
>>>> ordering or saving power. And why we can't ignore auto disabling
>>>> (because it's more than just auto disabling). The kernel is currently
>>>> broken when trying to use modules in ARM SoCs (probably in other
>>>> systems/archs too, but I can't speak for those).
>>>>
>>>> 3) Concerns about backwards compatibility
>>>>
>>>> I pointed out why the current scheme (depends-on being the only source
>>>> of dependency) doesn't break compatibility. And if we go with
>>>> "depends-on" as an override what we could do to keep backwards
>>>> compatibility. Happy to hear more thoughts or discuss options.
>>>>
>>>> 4) How the "sync_state" would work for a device that supplies multiple
>>>> functionalities but a limited driver.
>>>
>>> <snip>
>>> To be clear, all of above are _real_ problems that stops us from efficiently
>>> load device drivers as modules for Android.
>>>
>>> So, if 'depends-on' doesn't seem like the right approach and "going back to
>>> the drawing board" is the ask, could you please point us in the right
>>> direction?
>>
>> Use the dependencies which are already there in DT. That's clocks,
>> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
>> going to accept duplicating all those dependencies in DT. The downside
>> for the kernel is you have to address these one by one and can't have
>> a generic property the driver core code can parse. After that's in
>> place, then maybe we can consider handling any additional dependencies
>> not already captured in DT. Once all that is in place, we can probably
>> sort device and/or driver lists to optimize the probe order (maybe the
>> driver core already does that now?).
>>
>> Get rid of the auto disabling of clocks and regulators in
>> late_initcall. It's simply not a valid marker that boot is done when
>> modules are involved. We probably can't get rid of it as lot's of
>> platforms rely on that, so it will have to be opt out. Make it the
>> platform's responsibility for ensuring a consistent state.
> 
> Setting aside modules for one moment, why is there any auto disabling
> of clocks and regulators in late initcall????  Deferred probe processing
> does not begin until deferred_probe_initcall() sets

I failed to mention that deferred_probe_initcall() is a late_initcall.

-Frank

> driver_deferred_probe_enable to true.  No late_initcall function
> should ever depend on ordering with respect to any other late_initcall.
> (And yes, I know that among various initcall levels, there have been
> games played to get a certain amount of ordering, but that is at
> best fragile.)
> 
> In addition to modules, devicetree overlays need to be considered.
> 
> Just as modules can result in a driver appearing after boot finishes,
> overlays can result in new devicetree nodes (and thus dependencies)
> appearing after boot finishes.
> 
> -Frank
> 
>>
>> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
>> userspace in order to make progress if dependencies are missing. Or
>> maybe just some timeout would be sufficient. I think this is probably
>> more useful for development than in a shipping product. Even if you
>> could fallback to polling mode instead of interrupts for example, I
>> doubt you would want to in a product.
>>
>> You should also keep in mind that everything needed for a console has
>> to be built in. Maybe Android can say the console isn't needed, but in
>> general we can't.
>>
>> Rob
>> .
>>
> 
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 14:21             ` Greg Kroah-Hartman
@ 2019-06-12 16:53               ` Rob Herring
  2019-06-12 17:08                 ` Greg Kroah-Hartman
  2019-06-12 17:03               ` Frank Rowand
  1 sibling, 1 reply; 35+ messages in thread
From: Rob Herring @ 2019-06-12 16:53 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sandeep Patil, Saravana Kannan, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > >
> > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > >
> > > > > Hi Saravana,
> > > > >
> > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > comments on the original?
> > > > >
> > > > > Adding on a different aspect...  The independent replies from three different
> > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > patch series.  There were also some implementation issues brought out.
> > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > as they are not relevant until architecture issues are resolved.)
> > > >
> > > > Right, I'm not too worried about the implementation issues before we
> > > > settle on the architectural issues. Those are easy to fix.
> > > >
> > > > Honestly, the main points that the maintainers raised are:
> > > > 1) This is a configuration property and not describing the device.
> > > > Just use the implicit dependencies coming from existing bindings.
> > > >
> > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > property. But even if that's not something the maintainers can agree
> > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > provider hardware) where the implicit dependencies would prevent one
> > > > of the devices from probing till the end of time. So even if the
> > > > maintainers don't agree we should always look at "depends-on" to
> > > > decide the dependencies, we still need some means to override the
> > > > implicit dependencies where they don't match the real dependency. Can
> > > > we use depends-on as an override when the implicit dependencies aren't
> > > > correct?
> > > >
> > > > 2) This doesn't need to be solved because this is just optimizing
> > > > probing or saving power ("we should get rid of this auto disabling"):
> > > >
> > > > I explained why this patch series is not just about optimizing probe
> > > > ordering or saving power. And why we can't ignore auto disabling
> > > > (because it's more than just auto disabling). The kernel is currently
> > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > systems/archs too, but I can't speak for those).
> > > >
> > > > 3) Concerns about backwards compatibility
> > > >
> > > > I pointed out why the current scheme (depends-on being the only source
> > > > of dependency) doesn't break compatibility. And if we go with
> > > > "depends-on" as an override what we could do to keep backwards
> > > > compatibility. Happy to hear more thoughts or discuss options.
> > > >
> > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > functionalities but a limited driver.
> > >
> > > <snip>
> > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > load device drivers as modules for Android.
> > >
> > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > the drawing board" is the ask, could you please point us in the right
> > > direction?
> >
> > Use the dependencies which are already there in DT. That's clocks,
> > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > going to accept duplicating all those dependencies in DT. The downside
> > for the kernel is you have to address these one by one and can't have
> > a generic property the driver core code can parse. After that's in
> > place, then maybe we can consider handling any additional dependencies
> > not already captured in DT. Once all that is in place, we can probably
> > sort device and/or driver lists to optimize the probe order (maybe the
> > driver core already does that now?).
> >
> > Get rid of the auto disabling of clocks and regulators in
> > late_initcall. It's simply not a valid marker that boot is done when
> > modules are involved. We probably can't get rid of it as lot's of
> > platforms rely on that, so it will have to be opt out. Make it the
> > platform's responsibility for ensuring a consistent state.
> >
> > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > userspace in order to make progress if dependencies are missing.
>
> People have tried to do this multiple times, and you never really know
> when "boot is done" due to busses that have discoverable devices and
> async probing of other busses.

Yes, I know which is why I proposed the second name with more limited
meaning/function.

> You do know "something" when you pivot to a new boot disk, and when you
> try to load init, but given initramfs and the fact that modules are
> usually included on them, that's not really a good indication that
> anything is "finished".
>
> I don't want userspace to be responsible for telling the kernel, "hey
> you should be finished now!", as that's an async notification that is
> going to be ripe for problems.

The usecase I care about here is when the DT has the dependency
information, but the kernel doesn't have the driver and the dependency
is never resolved. The same problem has to be solved with a
'depends-on' property. This easily happens with a new DT with added
dependencies like pinctrl and an old kernel that doesn't have the
"new" driver. Another example is IOMMUs. We need some way to say stop
waiting for dependencies. It is really just a debug option (of course,
how to prevent a debug option from being used in production?). This
works now for built-in cases with the same late_initcall abuse.

Using late_initcall_sync as an indicator has all the same problems
with userspace indicating boot finished. We should get rid of the
late_initcall_sync abuses and stop trying to work around them.

> I really like the "depends-on" information, as it shows a topology that
> DT doesn't seem to be able to show today, yet we rely on it in the
> kernel with the whole deferred probing mess.  To me, there doesn't seem
> to be any other way to properly "know" this.

As I said, DT *does* have this dependency information already. The
problem is the kernel probing doesn't use it. Fix that and then we can
discuss dependencies the DT doesn't provide that the kernel needs.

> > Or
> > maybe just some timeout would be sufficient. I think this is probably
> > more useful for development than in a shipping product. Even if you
> > could fallback to polling mode instead of interrupts for example, I
> > doubt you would want to in a product.
>
> timeouts suck.  And do not work for shipping products.  I want a device
> with 100 modules that relys on DT to be able to boot just as fast as a
> laptop with 100 modules that has all of the needed dependancies
> described today in their bus topologies, because they used sane hardware
> (i.e. PCI and ACPI).  Why hurt embedded people just because their
> hardware relies on a system where you have to loop for long periods of
> time because DT can not show the topology correctly?

I failed to list buses, but those too are already described in DT. Bus
dependencies are handled already.

> > You should also keep in mind that everything needed for a console has
> > to be built in. Maybe Android can say the console isn't needed, but in
> > general we can't.
>
> What does a console have to do with any of this?

Do you want to move all the SoC support to modules and have a console?
That doesn't work. Dependencies for the console have to be resolved
before initcalls are done. I don't recall the exact details, but I did
hit that issue when working on handling the missing dependency issues
(25b4e70dcce9 driver core: allow stopping deferred probe after init).

The point is that moving things to modules does introduce issues
beyond just dependency tracking.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 14:21             ` Greg Kroah-Hartman
  2019-06-12 16:53               ` Rob Herring
@ 2019-06-12 17:03               ` Frank Rowand
  1 sibling, 0 replies; 35+ messages in thread
From: Frank Rowand @ 2019-06-12 17:03 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Rob Herring
  Cc: Sandeep Patil, Saravana Kannan, Mark Rutland, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, Android Kernel Team

On 6/12/19 7:21 AM, Greg Kroah-Hartman wrote:
> On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
>> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
>>>
>>> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
>>>> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>>
>>>>> Hi Saravana,
>>>>>
>>>>> On 6/10/19 10:36 AM, Rob Herring wrote:
>>>>>> Why are you resending this rather than replying to Frank's last
>>>>>> comments on the original?
>>>>>
>>>>> Adding on a different aspect...  The independent replies from three different
>>>>> maintainers (Rob, Mark, myself) pointed out architectural issues with the
>>>>> patch series.  There were also some implementation issues brought out.
>>>>> (Although I refrained from bringing up most of my implementation issues
>>>>> as they are not relevant until architecture issues are resolved.)
>>>>
>>>> Right, I'm not too worried about the implementation issues before we
>>>> settle on the architectural issues. Those are easy to fix.
>>>>
>>>> Honestly, the main points that the maintainers raised are:
>>>> 1) This is a configuration property and not describing the device.
>>>> Just use the implicit dependencies coming from existing bindings.
>>>>
>>>> I gave a bunch of reasons for why I think it isn't an OS configuration
>>>> property. But even if that's not something the maintainers can agree
>>>> to, I gave a concrete example (cyclic dependencies between clock
>>>> provider hardware) where the implicit dependencies would prevent one
>>>> of the devices from probing till the end of time. So even if the
>>>> maintainers don't agree we should always look at "depends-on" to
>>>> decide the dependencies, we still need some means to override the
>>>> implicit dependencies where they don't match the real dependency. Can
>>>> we use depends-on as an override when the implicit dependencies aren't
>>>> correct?
>>>>
>>>> 2) This doesn't need to be solved because this is just optimizing
>>>> probing or saving power ("we should get rid of this auto disabling"):
>>>>
>>>> I explained why this patch series is not just about optimizing probe
>>>> ordering or saving power. And why we can't ignore auto disabling
>>>> (because it's more than just auto disabling). The kernel is currently
>>>> broken when trying to use modules in ARM SoCs (probably in other
>>>> systems/archs too, but I can't speak for those).
>>>>
>>>> 3) Concerns about backwards compatibility
>>>>
>>>> I pointed out why the current scheme (depends-on being the only source
>>>> of dependency) doesn't break compatibility. And if we go with
>>>> "depends-on" as an override what we could do to keep backwards
>>>> compatibility. Happy to hear more thoughts or discuss options.
>>>>
>>>> 4) How the "sync_state" would work for a device that supplies multiple
>>>> functionalities but a limited driver.
>>>
>>> <snip>
>>> To be clear, all of above are _real_ problems that stops us from efficiently
>>> load device drivers as modules for Android.
>>>
>>> So, if 'depends-on' doesn't seem like the right approach and "going back to
>>> the drawing board" is the ask, could you please point us in the right
>>> direction?
>>
>> Use the dependencies which are already there in DT. That's clocks,
>> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
>> going to accept duplicating all those dependencies in DT. The downside
>> for the kernel is you have to address these one by one and can't have
>> a generic property the driver core code can parse. After that's in
>> place, then maybe we can consider handling any additional dependencies
>> not already captured in DT. Once all that is in place, we can probably
>> sort device and/or driver lists to optimize the probe order (maybe the
>> driver core already does that now?).
>>
>> Get rid of the auto disabling of clocks and regulators in
>> late_initcall. It's simply not a valid marker that boot is done when
>> modules are involved. We probably can't get rid of it as lot's of
>> platforms rely on that, so it will have to be opt out. Make it the
>> platform's responsibility for ensuring a consistent state.
>>
>> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
>> userspace in order to make progress if dependencies are missing.
> 
> People have tried to do this multiple times, and you never really know
> when "boot is done" due to busses that have discoverable devices and
> async probing of other busses.
> 
> You do know "something" when you pivot to a new boot disk, and when you
> try to load init, but given initramfs and the fact that modules are
> usually included on them, that's not really a good indication that
> anything is "finished".
> 
> I don't want userspace to be responsible for telling the kernel, "hey
> you should be finished now!", as that's an async notification that is
> going to be ripe for problems.
> 
> I really like the "depends-on" information, as it shows a topology that
> DT doesn't seem to be able to show today, yet we rely on it in the
> kernel with the whole deferred probing mess.  To me, there doesn't seem
> to be any other way to properly "know" this.

Rob pointed out (above) the dependencies that are already contained in
devicetree.  Those are real hardware descriptions that have to be
correct or the devices will not use the hardware that they "depend on"
(thus an incorrect hardware description _also_ results in the resource
that is not described not actually being a dependency).

If you add a second set of properties purely to list dependencies then
the second set of dependencies is likely to have latent errors that are
not initially detected (when the property is first introduced) because
the proper resources just happen to be available by sheer chance.  Then
a later change in boot order resulting from a code change or configuration
change can expose that latent error.

A second set of dependency properties is making boot more fragile.

-Frank

> 
>> Or
>> maybe just some timeout would be sufficient. I think this is probably
>> more useful for development than in a shipping product. Even if you
>> could fallback to polling mode instead of interrupts for example, I
>> doubt you would want to in a product.
> 
> timeouts suck.  And do not work for shipping products.  I want a device
> with 100 modules that relys on DT to be able to boot just as fast as a
> laptop with 100 modules that has all of the needed dependancies
> described today in their bus topologies, because they used sane hardware
> (i.e. PCI and ACPI).  Why hurt embedded people just because their
> hardware relies on a system where you have to loop for long periods of
> time because DT can not show the topology correctly?
> 
>> You should also keep in mind that everything needed for a console has
>> to be built in. Maybe Android can say the console isn't needed, but in
>> general we can't.
> 
> What does a console have to do with any of this?
> 
> confused,
> 
> greg k-h
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 16:53               ` Rob Herring
@ 2019-06-12 17:08                 ` Greg Kroah-Hartman
  2019-06-12 18:19                   ` Rob Herring
  0 siblings, 1 reply; 35+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-12 17:08 UTC (permalink / raw)
  To: Rob Herring
  Cc: Sandeep Patil, Saravana Kannan, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > >
> > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > >
> > > > > > Hi Saravana,
> > > > > >
> > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > comments on the original?
> > > > > >
> > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > as they are not relevant until architecture issues are resolved.)
> > > > >
> > > > > Right, I'm not too worried about the implementation issues before we
> > > > > settle on the architectural issues. Those are easy to fix.
> > > > >
> > > > > Honestly, the main points that the maintainers raised are:
> > > > > 1) This is a configuration property and not describing the device.
> > > > > Just use the implicit dependencies coming from existing bindings.
> > > > >
> > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > property. But even if that's not something the maintainers can agree
> > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > of the devices from probing till the end of time. So even if the
> > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > decide the dependencies, we still need some means to override the
> > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > correct?
> > > > >
> > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > >
> > > > > I explained why this patch series is not just about optimizing probe
> > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > systems/archs too, but I can't speak for those).
> > > > >
> > > > > 3) Concerns about backwards compatibility
> > > > >
> > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > "depends-on" as an override what we could do to keep backwards
> > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > >
> > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > functionalities but a limited driver.
> > > >
> > > > <snip>
> > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > load device drivers as modules for Android.
> > > >
> > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > the drawing board" is the ask, could you please point us in the right
> > > > direction?
> > >
> > > Use the dependencies which are already there in DT. That's clocks,
> > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > going to accept duplicating all those dependencies in DT. The downside
> > > for the kernel is you have to address these one by one and can't have
> > > a generic property the driver core code can parse. After that's in
> > > place, then maybe we can consider handling any additional dependencies
> > > not already captured in DT. Once all that is in place, we can probably
> > > sort device and/or driver lists to optimize the probe order (maybe the
> > > driver core already does that now?).
> > >
> > > Get rid of the auto disabling of clocks and regulators in
> > > late_initcall. It's simply not a valid marker that boot is done when
> > > modules are involved. We probably can't get rid of it as lot's of
> > > platforms rely on that, so it will have to be opt out. Make it the
> > > platform's responsibility for ensuring a consistent state.
> > >
> > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > userspace in order to make progress if dependencies are missing.
> >
> > People have tried to do this multiple times, and you never really know
> > when "boot is done" due to busses that have discoverable devices and
> > async probing of other busses.
> 
> Yes, I know which is why I proposed the second name with more limited
> meaning/function.

I still don't want to have the kernel have to rely on this.

> > You do know "something" when you pivot to a new boot disk, and when you
> > try to load init, but given initramfs and the fact that modules are
> > usually included on them, that's not really a good indication that
> > anything is "finished".
> >
> > I don't want userspace to be responsible for telling the kernel, "hey
> > you should be finished now!", as that's an async notification that is
> > going to be ripe for problems.
> 
> The usecase I care about here is when the DT has the dependency
> information, but the kernel doesn't have the driver and the dependency
> is never resolved.

Then we have the same situation as today and nothing different happens,
right?

> The same problem has to be solved with a
> 'depends-on' property. This easily happens with a new DT with added
> dependencies like pinctrl and an old kernel that doesn't have the
> "new" driver. Another example is IOMMUs. We need some way to say stop
> waiting for dependencies. It is really just a debug option (of course,
> how to prevent a debug option from being used in production?). This
> works now for built-in cases with the same late_initcall abuse.

What is a debug option?  We need something "for real".

> Using late_initcall_sync as an indicator has all the same problems
> with userspace indicating boot finished. We should get rid of the
> late_initcall_sync abuses and stop trying to work around them.

I agree, but that's not the issue here.

> > I really like the "depends-on" information, as it shows a topology that
> > DT doesn't seem to be able to show today, yet we rely on it in the
> > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > to be any other way to properly "know" this.
> 
> As I said, DT *does* have this dependency information already. The
> problem is the kernel probing doesn't use it. Fix that and then we can
> discuss dependencies the DT doesn't provide that the kernel needs.

Where can the kernel probing be fixed to use it?  What am I missing that
can be done instead of what this patchset does?

> > > Or
> > > maybe just some timeout would be sufficient. I think this is probably
> > > more useful for development than in a shipping product. Even if you
> > > could fallback to polling mode instead of interrupts for example, I
> > > doubt you would want to in a product.
> >
> > timeouts suck.  And do not work for shipping products.  I want a device
> > with 100 modules that relys on DT to be able to boot just as fast as a
> > laptop with 100 modules that has all of the needed dependancies
> > described today in their bus topologies, because they used sane hardware
> > (i.e. PCI and ACPI).  Why hurt embedded people just because their
> > hardware relies on a system where you have to loop for long periods of
> > time because DT can not show the topology correctly?
> 
> I failed to list buses, but those too are already described in DT. Bus
> dependencies are handled already.

That's not my point, I know sane busses are described in DT :)

> > > You should also keep in mind that everything needed for a console has
> > > to be built in. Maybe Android can say the console isn't needed, but in
> > > general we can't.
> >
> > What does a console have to do with any of this?
> 
> Do you want to move all the SoC support to modules and have a console?

That's not the issue/point here at all.  I really could care less about
a console, the issue is that we need a way to take a "generic" arm64
kernel, with a load of modules (i.e. allmodconfig) and have it boot in a
sane amount of time.

Other arches do this today, why can't arm64?  That is the issue here,
not console or late init calls.

> That doesn't work. Dependencies for the console have to be resolved
> before initcalls are done. I don't recall the exact details, but I did
> hit that issue when working on handling the missing dependency issues
> (25b4e70dcce9 driver core: allow stopping deferred probe after init).
> 
> The point is that moving things to modules does introduce issues
> beyond just dependency tracking.

I totally agree, and yet here we both are typing on Linux machines where
this all "just works" today.  Now to get that to work on a DT-based
system :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 17:08                 ` Greg Kroah-Hartman
@ 2019-06-12 18:19                   ` Rob Herring
  2019-06-12 19:29                     ` Saravana Kannan
  0 siblings, 1 reply; 35+ messages in thread
From: Rob Herring @ 2019-06-12 18:19 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sandeep Patil, Saravana Kannan, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > > >
> > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Saravana,
> > > > > > >
> > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > comments on the original?
> > > > > > >
> > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > >
> > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > >
> > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > 1) This is a configuration property and not describing the device.
> > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > >
> > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > property. But even if that's not something the maintainers can agree
> > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > of the devices from probing till the end of time. So even if the
> > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > decide the dependencies, we still need some means to override the
> > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > correct?
> > > > > >
> > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > >
> > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > systems/archs too, but I can't speak for those).
> > > > > >
> > > > > > 3) Concerns about backwards compatibility
> > > > > >
> > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > >
> > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > functionalities but a limited driver.
> > > > >
> > > > > <snip>
> > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > load device drivers as modules for Android.
> > > > >
> > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > the drawing board" is the ask, could you please point us in the right
> > > > > direction?
> > > >
> > > > Use the dependencies which are already there in DT. That's clocks,
> > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > going to accept duplicating all those dependencies in DT. The downside
> > > > for the kernel is you have to address these one by one and can't have
> > > > a generic property the driver core code can parse. After that's in
> > > > place, then maybe we can consider handling any additional dependencies
> > > > not already captured in DT. Once all that is in place, we can probably
> > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > driver core already does that now?).
> > > >
> > > > Get rid of the auto disabling of clocks and regulators in
> > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > modules are involved. We probably can't get rid of it as lot's of
> > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > platform's responsibility for ensuring a consistent state.
> > > >
> > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > userspace in order to make progress if dependencies are missing.
> > >
> > > People have tried to do this multiple times, and you never really know
> > > when "boot is done" due to busses that have discoverable devices and
> > > async probing of other busses.
> >
> > Yes, I know which is why I proposed the second name with more limited
> > meaning/function.
>
> I still don't want to have the kernel have to rely on this.
>
> > > You do know "something" when you pivot to a new boot disk, and when you
> > > try to load init, but given initramfs and the fact that modules are
> > > usually included on them, that's not really a good indication that
> > > anything is "finished".
> > >
> > > I don't want userspace to be responsible for telling the kernel, "hey
> > > you should be finished now!", as that's an async notification that is
> > > going to be ripe for problems.
> >
> > The usecase I care about here is when the DT has the dependency
> > information, but the kernel doesn't have the driver and the dependency
> > is never resolved.
>
> Then we have the same situation as today and nothing different happens,
> right?

Huh?

This works today, but not for modules.

>
> > The same problem has to be solved with a
> > 'depends-on' property. This easily happens with a new DT with added
> > dependencies like pinctrl and an old kernel that doesn't have the
> > "new" driver. Another example is IOMMUs. We need some way to say stop
> > waiting for dependencies. It is really just a debug option (of course,
> > how to prevent a debug option from being used in production?). This
> > works now for built-in cases with the same late_initcall abuse.
>
> What is a debug option?  We need something "for real".
>
> > Using late_initcall_sync as an indicator has all the same problems
> > with userspace indicating boot finished. We should get rid of the
> > late_initcall_sync abuses and stop trying to work around them.
>
> I agree, but that's not the issue here.

It is because the cover letter mentions it and downstream work around it.

> > > I really like the "depends-on" information, as it shows a topology that
> > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > to be any other way to properly "know" this.
> >
> > As I said, DT *does* have this dependency information already. The
> > problem is the kernel probing doesn't use it. Fix that and then we can
> > discuss dependencies the DT doesn't provide that the kernel needs.
>
> Where can the kernel probing be fixed to use it?  What am I missing that
> can be done instead of what this patchset does?

Somewhere, either in each subsystem or in the DT or core code creating
struct devices, you need to iterate thru the dependencies. Take clocks
as an example:

for each node:
  for each 'clocks' phandle
    Lookup struct device from clock phandle
    Add the clock provider struct device to node's struct device links

Now, repeat this for regulators, interrupts, etc.

This series is pretty much doing the same thing, you just have to
parse each provider rather than only 'depends-on'.

One issue is the struct device for the dependency may not be created
yet. I think this series would have the same issue, but haven't dug
into how it avoids that or whether it just ignores it and falls back
to deferring probe.

I'm also not clear on how you create struct devices and add
dependencies before probing gets attempted. If a driver is already
registered, probe is going to be attempted before any dependencies are
added. I guess the issue is avoided with drivers being modules, but
any solution should work for built-in too.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 13:53           ` Rob Herring
  2019-06-12 14:21             ` Greg Kroah-Hartman
  2019-06-12 16:07             ` Frank Rowand
@ 2019-06-12 19:03             ` Frank Rowand
  2 siblings, 0 replies; 35+ messages in thread
From: Frank Rowand @ 2019-06-12 19:03 UTC (permalink / raw)
  To: Rob Herring, Sandeep Patil, Saravana Kannan
  Cc: Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, Android Kernel Team

On 6/12/19 6:53 AM, Rob Herring wrote:
> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
>>
>> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
>>> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> Hi Saravana,
>>>>
>>>> On 6/10/19 10:36 AM, Rob Herring wrote:
>>>>> Why are you resending this rather than replying to Frank's last
>>>>> comments on the original?
>>>>
>>>> Adding on a different aspect...  The independent replies from three different
>>>> maintainers (Rob, Mark, myself) pointed out architectural issues with the
>>>> patch series.  There were also some implementation issues brought out.
>>>> (Although I refrained from bringing up most of my implementation issues
>>>> as they are not relevant until architecture issues are resolved.)
>>>
>>> Right, I'm not too worried about the implementation issues before we
>>> settle on the architectural issues. Those are easy to fix.
>>>
>>> Honestly, the main points that the maintainers raised are:
>>> 1) This is a configuration property and not describing the device.
>>> Just use the implicit dependencies coming from existing bindings.
>>>
>>> I gave a bunch of reasons for why I think it isn't an OS configuration
>>> property. But even if that's not something the maintainers can agree
>>> to, I gave a concrete example (cyclic dependencies between clock
>>> provider hardware) where the implicit dependencies would prevent one
>>> of the devices from probing till the end of time. So even if the
>>> maintainers don't agree we should always look at "depends-on" to
>>> decide the dependencies, we still need some means to override the
>>> implicit dependencies where they don't match the real dependency. Can
>>> we use depends-on as an override when the implicit dependencies aren't
>>> correct?
>>>
>>> 2) This doesn't need to be solved because this is just optimizing
>>> probing or saving power ("we should get rid of this auto disabling"):
>>>
>>> I explained why this patch series is not just about optimizing probe
>>> ordering or saving power. And why we can't ignore auto disabling
>>> (because it's more than just auto disabling). The kernel is currently
>>> broken when trying to use modules in ARM SoCs (probably in other
>>> systems/archs too, but I can't speak for those).
>>>
>>> 3) Concerns about backwards compatibility
>>>
>>> I pointed out why the current scheme (depends-on being the only source
>>> of dependency) doesn't break compatibility. And if we go with
>>> "depends-on" as an override what we could do to keep backwards
>>> compatibility. Happy to hear more thoughts or discuss options.
>>>
>>> 4) How the "sync_state" would work for a device that supplies multiple
>>> functionalities but a limited driver.
>>
>> <snip>
>> To be clear, all of above are _real_ problems that stops us from efficiently
>> load device drivers as modules for Android.
>>
>> So, if 'depends-on' doesn't seem like the right approach and "going back to
>> the drawing board" is the ask, could you please point us in the right
>> direction?
> 


> Use the dependencies which are already there in DT. That's clocks,
> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> going to accept duplicating all those dependencies in DT. The downside

Just in case it isn't obvious from my other comments, I'm fully in
agreement with Rob on this.

-Frank

> for the kernel is you have to address these one by one and can't have
> a generic property the driver core code can parse. After that's in
> place, then maybe we can consider handling any additional dependencies
> not already captured in DT. Once all that is in place, we can probably
> sort device and/or driver lists to optimize the probe order (maybe the
> driver core already does that now?).
> 
> Get rid of the auto disabling of clocks and regulators in
> late_initcall. It's simply not a valid marker that boot is done when
> modules are involved. We probably can't get rid of it as lot's of
> platforms rely on that, so it will have to be opt out. Make it the
> platform's responsibility for ensuring a consistent state.
> 
> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> userspace in order to make progress if dependencies are missing. Or
> maybe just some timeout would be sufficient. I think this is probably
> more useful for development than in a shipping product. Even if you
> could fallback to polling mode instead of interrupts for example, I
> doubt you would want to in a product.
> 
> You should also keep in mind that everything needed for a console has
> to be built in. Maybe Android can say the console isn't needed, but in
> general we can't.
> 
> Rob
> .
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 18:19                   ` Rob Herring
@ 2019-06-12 19:29                     ` Saravana Kannan
  2019-06-12 20:23                       ` Frank Rowand
                                         ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-12 19:29 UTC (permalink / raw)
  To: Rob Herring
  Cc: Greg Kroah-Hartman, Sandeep Patil, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > > > >
> > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Saravana,
> > > > > > > >
> > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > > comments on the original?
> > > > > > > >
> > > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > > >
> > > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > >
> > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > 1) This is a configuration property and not describing the device.
> > > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > > >
> > > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > > property. But even if that's not something the maintainers can agree
> > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > decide the dependencies, we still need some means to override the
> > > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > > correct?
> > > > > > >
> > > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > > >
> > > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > > systems/archs too, but I can't speak for those).
> > > > > > >
> > > > > > > 3) Concerns about backwards compatibility
> > > > > > >
> > > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > > >
> > > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > > functionalities but a limited driver.
> > > > > >
> > > > > > <snip>
> > > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > > load device drivers as modules for Android.
> > > > > >
> > > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > > the drawing board" is the ask, could you please point us in the right
> > > > > > direction?
> > > > >
> > > > > Use the dependencies which are already there in DT. That's clocks,
> > > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > > going to accept duplicating all those dependencies in DT. The downside
> > > > > for the kernel is you have to address these one by one and can't have
> > > > > a generic property the driver core code can parse. After that's in
> > > > > place, then maybe we can consider handling any additional dependencies
> > > > > not already captured in DT. Once all that is in place, we can probably
> > > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > > driver core already does that now?).
> > > > >
> > > > > Get rid of the auto disabling of clocks and regulators in
> > > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > > modules are involved. We probably can't get rid of it as lot's of
> > > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > > platform's responsibility for ensuring a consistent state.
> > > > >
> > > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > > userspace in order to make progress if dependencies are missing.
> > > >
> > > > People have tried to do this multiple times, and you never really know
> > > > when "boot is done" due to busses that have discoverable devices and
> > > > async probing of other busses.
> > >
> > > Yes, I know which is why I proposed the second name with more limited
> > > meaning/function.
> >
> > I still don't want to have the kernel have to rely on this.
> >
> > > > You do know "something" when you pivot to a new boot disk, and when you
> > > > try to load init, but given initramfs and the fact that modules are
> > > > usually included on them, that's not really a good indication that
> > > > anything is "finished".
> > > >
> > > > I don't want userspace to be responsible for telling the kernel, "hey
> > > > you should be finished now!", as that's an async notification that is
> > > > going to be ripe for problems.
> > >
> > > The usecase I care about here is when the DT has the dependency
> > > information, but the kernel doesn't have the driver and the dependency
> > > is never resolved.
> >
> > Then we have the same situation as today and nothing different happens,
> > right?
>
> Huh?
>
> This works today, but not for modules.

Replying to this a bit further down.

>
> >
> > > The same problem has to be solved with a
> > > 'depends-on' property. This easily happens with a new DT with added
> > > dependencies like pinctrl and an old kernel that doesn't have the
> > > "new" driver.

Isn't this the perfect example of an "implicit dependency" in a DT
node not being a mandatory dependency? The old kernel worked fine with
older DT without the added pinctrl dependency, so treating it as a
mandatory dependency seems wrong for that particular device?
depends-on avoids all this because the older kernel won't parse
depends-on. And for newer kernels, it'll parse only what depends-on
says are dependencies and not make wrong assumptions.

> > > Another example is IOMMUs. We need some way to say stop
> > > waiting for dependencies. It is really just a debug option (of course,
> > > how to prevent a debug option from being used in production?). This
> > > works now for built-in cases with the same late_initcall abuse.
> >
> > What is a debug option?  We need something "for real".
> >
> > > Using late_initcall_sync as an indicator has all the same problems
> > > with userspace indicating boot finished. We should get rid of the
> > > late_initcall_sync abuses and stop trying to work around them.
> >
> > I agree, but that's not the issue here.
>
> It is because the cover letter mentions it and downstream work around it.

This patch series is trying to remove the use of late_initcall_sync
and instead relying on dependency information coming from DT. So, you
are agreeing with the patchset.

> > > > I really like the "depends-on" information, as it shows a topology that
> > > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > > to be any other way to properly "know" this.
> > >
> > > As I said, DT *does* have this dependency information already. The
> > > problem is the kernel probing doesn't use it. Fix that and then we can
> > > discuss dependencies the DT doesn't provide that the kernel needs.
> >
> > Where can the kernel probing be fixed to use it?  What am I missing that
> > can be done instead of what this patchset does?
>
> Somewhere, either in each subsystem or in the DT or core code creating
> struct devices, you need to iterate thru the dependencies. Take clocks
> as an example:
>
> for each node:
>   for each 'clocks' phandle
>     Lookup struct device from clock phandle
>     Add the clock provider struct device to node's struct device links
>
> Now, repeat this for regulators, interrupts, etc.

I'm more than happy to do this if the maintainers can accept this as a
solution, but then we need to agree that we need an override property
if the implicit dependency isn't a mandatory dependency. We also need
to agree on how we do this without breaking backwards compatibility.
Either as a config option for this feature or have a property in the
"chosen" node to say it's okay to assume existing bindings imply
mandatory dependencies (it's just describing the DT more explicitly
and the kernel will use it to enable this feature).

Although regulator binding are a "problem" because the kernel will
have to parse every property in a node to check if it ends with
-supply and then treat it as if it's a regulator binding (even though
it might not be). So regulators will need some kind of "opt out" in
the kernel (not DT).

> This series is pretty much doing the same thing, you just have to
> parse each provider rather than only 'depends-on'.
>
> One issue is the struct device for the dependency may not be created
> yet. I think this series would have the same issue, but haven't dug
> into how it avoids that or whether it just ignores it and falls back
> to deferring probe.

The patch series handles this properly and doesn't fall back to
deferred probing.

> I'm also not clear on how you create struct devices and add
> dependencies before probing gets attempted. If a driver is already
> registered, probe is going to be attempted before any dependencies are
> added. I guess the issue is avoided with drivers being modules, but
> any solution should work for built-in too.

This is also handled properly in the series.

I've actually boot tested both these scenarios you call out and the
patch series handles them properly.

But you are missing the main point here. The goal isn't to just
eliminate deferred probing (it's a great side effect even it it just
stops 99% of them), but also remove the bad assumption that
late_initcall_sync() means all the devices are probed. The suppliers
need a better signal (which the patch series provides) to tell when
they can "unfreeze" the resources left on at boot.

It's true that device tree overlays can be added after userspace comes
up, but in those cases whoever is adding the device tree nodes can make
sure that the resources needed by the "to be added overlay devices" are
kept at the right level. It's also unlikely that the bootloader is
leaving resources
on for these overlay devices because they might never be added.

And even if it doesn't work perfectly for instances with overlays
(neither does the
current kernel), it's still better to fix it for the next
million/billion devices that'll use
ARM without post boot overlays.

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 19:29                     ` Saravana Kannan
@ 2019-06-12 20:23                       ` Frank Rowand
  2019-06-12 21:12                       ` Rob Herring
  2019-06-18 20:47                       ` Sandeep Patil
  2 siblings, 0 replies; 35+ messages in thread
From: Frank Rowand @ 2019-06-12 20:23 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring
  Cc: Greg Kroah-Hartman, Sandeep Patil, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On 6/12/19 12:29 PM, Saravana Kannan wrote:
> On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@kernel.org> wrote:
>>
>> On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
>> <gregkh@linuxfoundation.org> wrote:
>>>
>>> On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
>>>> On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
>>>> <gregkh@linuxfoundation.org> wrote:
>>>>>
>>>>> On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
>>>>>> On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
>>>>>>>
>>>>>>> On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
>>>>>>>> On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Saravana,
>>>>>>>>>
>>>>>>>>> On 6/10/19 10:36 AM, Rob Herring wrote:
>>>>>>>>>> Why are you resending this rather than replying to Frank's last
>>>>>>>>>> comments on the original?
>>>>>>>>>
>>>>>>>>> Adding on a different aspect...  The independent replies from three different
>>>>>>>>> maintainers (Rob, Mark, myself) pointed out architectural issues with the
>>>>>>>>> patch series.  There were also some implementation issues brought out.
>>>>>>>>> (Although I refrained from bringing up most of my implementation issues
>>>>>>>>> as they are not relevant until architecture issues are resolved.)
>>>>>>>>
>>>>>>>> Right, I'm not too worried about the implementation issues before we
>>>>>>>> settle on the architectural issues. Those are easy to fix.
>>>>>>>>
>>>>>>>> Honestly, the main points that the maintainers raised are:
>>>>>>>> 1) This is a configuration property and not describing the device.
>>>>>>>> Just use the implicit dependencies coming from existing bindings.
>>>>>>>>
>>>>>>>> I gave a bunch of reasons for why I think it isn't an OS configuration
>>>>>>>> property. But even if that's not something the maintainers can agree
>>>>>>>> to, I gave a concrete example (cyclic dependencies between clock
>>>>>>>> provider hardware) where the implicit dependencies would prevent one
>>>>>>>> of the devices from probing till the end of time. So even if the
>>>>>>>> maintainers don't agree we should always look at "depends-on" to
>>>>>>>> decide the dependencies, we still need some means to override the
>>>>>>>> implicit dependencies where they don't match the real dependency. Can
>>>>>>>> we use depends-on as an override when the implicit dependencies aren't
>>>>>>>> correct?
>>>>>>>>
>>>>>>>> 2) This doesn't need to be solved because this is just optimizing
>>>>>>>> probing or saving power ("we should get rid of this auto disabling"):
>>>>>>>>
>>>>>>>> I explained why this patch series is not just about optimizing probe
>>>>>>>> ordering or saving power. And why we can't ignore auto disabling
>>>>>>>> (because it's more than just auto disabling). The kernel is currently
>>>>>>>> broken when trying to use modules in ARM SoCs (probably in other
>>>>>>>> systems/archs too, but I can't speak for those).
>>>>>>>>
>>>>>>>> 3) Concerns about backwards compatibility
>>>>>>>>
>>>>>>>> I pointed out why the current scheme (depends-on being the only source
>>>>>>>> of dependency) doesn't break compatibility. And if we go with
>>>>>>>> "depends-on" as an override what we could do to keep backwards
>>>>>>>> compatibility. Happy to hear more thoughts or discuss options.
>>>>>>>>
>>>>>>>> 4) How the "sync_state" would work for a device that supplies multiple
>>>>>>>> functionalities but a limited driver.
>>>>>>>
>>>>>>> <snip>
>>>>>>> To be clear, all of above are _real_ problems that stops us from efficiently
>>>>>>> load device drivers as modules for Android.
>>>>>>>
>>>>>>> So, if 'depends-on' doesn't seem like the right approach and "going back to
>>>>>>> the drawing board" is the ask, could you please point us in the right
>>>>>>> direction?
>>>>>>
>>>>>> Use the dependencies which are already there in DT. That's clocks,
>>>>>> pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
>>>>>> going to accept duplicating all those dependencies in DT. The downside
>>>>>> for the kernel is you have to address these one by one and can't have
>>>>>> a generic property the driver core code can parse. After that's in
>>>>>> place, then maybe we can consider handling any additional dependencies
>>>>>> not already captured in DT. Once all that is in place, we can probably
>>>>>> sort device and/or driver lists to optimize the probe order (maybe the
>>>>>> driver core already does that now?).
>>>>>>
>>>>>> Get rid of the auto disabling of clocks and regulators in
>>>>>> late_initcall. It's simply not a valid marker that boot is done when
>>>>>> modules are involved. We probably can't get rid of it as lot's of
>>>>>> platforms rely on that, so it will have to be opt out. Make it the
>>>>>> platform's responsibility for ensuring a consistent state.
>>>>>>
>>>>>> Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
>>>>>> userspace in order to make progress if dependencies are missing.
>>>>>
>>>>> People have tried to do this multiple times, and you never really know
>>>>> when "boot is done" due to busses that have discoverable devices and
>>>>> async probing of other busses.
>>>>
>>>> Yes, I know which is why I proposed the second name with more limited
>>>> meaning/function.
>>>
>>> I still don't want to have the kernel have to rely on this.
>>>
>>>>> You do know "something" when you pivot to a new boot disk, and when you
>>>>> try to load init, but given initramfs and the fact that modules are
>>>>> usually included on them, that's not really a good indication that
>>>>> anything is "finished".
>>>>>
>>>>> I don't want userspace to be responsible for telling the kernel, "hey
>>>>> you should be finished now!", as that's an async notification that is
>>>>> going to be ripe for problems.
>>>>
>>>> The usecase I care about here is when the DT has the dependency
>>>> information, but the kernel doesn't have the driver and the dependency
>>>> is never resolved.
>>>
>>> Then we have the same situation as today and nothing different happens,
>>> right?
>>
>> Huh?
>>
>> This works today, but not for modules.
> 
> Replying to this a bit further down.
> 
>>
>>>
>>>> The same problem has to be solved with a
>>>> 'depends-on' property. This easily happens with a new DT with added
>>>> dependencies like pinctrl and an old kernel that doesn't have the
>>>> "new" driver.
> 
> Isn't this the perfect example of an "implicit dependency" in a DT
> node not being a mandatory dependency? The old kernel worked fine with
> older DT without the added pinctrl dependency, so treating it as a
> mandatory dependency seems wrong for that particular device?
> depends-on avoids all this because the older kernel won't parse
> depends-on. And for newer kernels, it'll parse only what depends-on
> says are dependencies and not make wrong assumptions.
> 
>>>> Another example is IOMMUs. We need some way to say stop
>>>> waiting for dependencies. It is really just a debug option (of course,
>>>> how to prevent a debug option from being used in production?). This
>>>> works now for built-in cases with the same late_initcall abuse.
>>>
>>> What is a debug option?  We need something "for real".
>>>
>>>> Using late_initcall_sync as an indicator has all the same problems
>>>> with userspace indicating boot finished. We should get rid of the
>>>> late_initcall_sync abuses and stop trying to work around them.
>>>
>>> I agree, but that's not the issue here.
>>
>> It is because the cover letter mentions it and downstream work around it.
> 
> This patch series is trying to remove the use of late_initcall_sync
> and instead relying on dependency information coming from DT. So, you
> are agreeing with the patchset.
> 
>>>>> I really like the "depends-on" information, as it shows a topology that
>>>>> DT doesn't seem to be able to show today, yet we rely on it in the
>>>>> kernel with the whole deferred probing mess.  To me, there doesn't seem
>>>>> to be any other way to properly "know" this.
>>>>
>>>> As I said, DT *does* have this dependency information already. The
>>>> problem is the kernel probing doesn't use it. Fix that and then we can
>>>> discuss dependencies the DT doesn't provide that the kernel needs.
>>>
>>> Where can the kernel probing be fixed to use it?  What am I missing that
>>> can be done instead of what this patchset does?
>>
>> Somewhere, either in each subsystem or in the DT or core code creating
>> struct devices, you need to iterate thru the dependencies. Take clocks
>> as an example:
>>
>> for each node:
>>   for each 'clocks' phandle
>>     Lookup struct device from clock phandle
>>     Add the clock provider struct device to node's struct device links
>>
>> Now, repeat this for regulators, interrupts, etc.
> 
> I'm more than happy to do this if the maintainers can accept this as a
> solution, but then we need to agree that we need an override property
> if the implicit dependency isn't a mandatory dependency. We also need

Why is a dependency not mandatory?


> to agree on how we do this without breaking backwards compatibility.
> Either as a config option for this feature or have a property in the
> "chosen" node to say it's okay to assume existing bindings imply
> mandatory dependencies (it's just describing the DT more explicitly
> and the kernel will use it to enable this feature).

You lost me here.


> Although regulator binding are a "problem" because the kernel will
> have to parse every property in a node to check if it ends with
> -supply and then treat it as if it's a regulator binding (even though
> it might not be). So regulators will need some kind of "opt out" in
> the kernel (not DT).
> 
>> This series is pretty much doing the same thing, you just have to
>> parse each provider rather than only 'depends-on'.
>>
>> One issue is the struct device for the dependency may not be created
>> yet. I think this series would have the same issue, but haven't dug
>> into how it avoids that or whether it just ignores it and falls back
>> to deferring probe.
> 
> The patch series handles this properly and doesn't fall back to
> deferred probing.
> 
>> I'm also not clear on how you create struct devices and add
>> dependencies before probing gets attempted. If a driver is already
>> registered, probe is going to be attempted before any dependencies are
>> added. I guess the issue is avoided with drivers being modules, but
>> any solution should work for built-in too.
> 
> This is also handled properly in the series.
> 
> I've actually boot tested both these scenarios you call out and the
> patch series handles them properly.
> 
> But you are missing the main point here. The goal isn't to just
> eliminate deferred probing (it's a great side effect even it it just
> stops 99% of them), but also remove the bad assumption that
> late_initcall_sync() means all the devices are probed. The suppliers
> need a better signal (which the patch series provides) to tell when
> they can "unfreeze" the resources left on at boot.
> 
> It's true that device tree overlays can be added after userspace comes
> up, but in those cases whoever is adding the device tree nodes can make
> sure that the resources needed by the "to be added overlay devices" are
> kept at the right level. It's also unlikely that the bootloader is
> leaving resources
> on for these overlay devices because they might never be added.

Just like modules might never be added.

> 
> And even if it doesn't work perfectly for instances with overlays
> (neither does the
> current kernel), it's still better to fix it for the next
> million/billion devices that'll use
> ARM without post boot overlays.

Sorry, you do not get to ignore overlays.

> 
> Thanks,
> Saravana
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 19:29                     ` Saravana Kannan
  2019-06-12 20:23                       ` Frank Rowand
@ 2019-06-12 21:12                       ` Rob Herring
  2019-06-12 22:10                         ` Saravana Kannan
  2019-06-18 20:47                       ` Sandeep Patil
  2 siblings, 1 reply; 35+ messages in thread
From: Rob Herring @ 2019-06-12 21:12 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Greg Kroah-Hartman, Sandeep Patil, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 1:29 PM Saravana Kannan <saravanak@google.com> wrote:
>
> On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Saravana,
> > > > > > > > >
> > > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > > > comments on the original?
> > > > > > > > >
> > > > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > > > >
> > > > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > > >
> > > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > > 1) This is a configuration property and not describing the device.
> > > > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > > > >
> > > > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > > > property. But even if that's not something the maintainers can agree
> > > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > > decide the dependencies, we still need some means to override the
> > > > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > > > correct?
> > > > > > > >
> > > > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > > > >
> > > > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > > > systems/archs too, but I can't speak for those).
> > > > > > > >
> > > > > > > > 3) Concerns about backwards compatibility
> > > > > > > >
> > > > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > > > >
> > > > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > > > functionalities but a limited driver.
> > > > > > >
> > > > > > > <snip>
> > > > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > > > load device drivers as modules for Android.
> > > > > > >
> > > > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > > > the drawing board" is the ask, could you please point us in the right
> > > > > > > direction?
> > > > > >
> > > > > > Use the dependencies which are already there in DT. That's clocks,
> > > > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > > > going to accept duplicating all those dependencies in DT. The downside
> > > > > > for the kernel is you have to address these one by one and can't have
> > > > > > a generic property the driver core code can parse. After that's in
> > > > > > place, then maybe we can consider handling any additional dependencies
> > > > > > not already captured in DT. Once all that is in place, we can probably
> > > > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > > > driver core already does that now?).
> > > > > >
> > > > > > Get rid of the auto disabling of clocks and regulators in
> > > > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > > > modules are involved. We probably can't get rid of it as lot's of
> > > > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > > > platform's responsibility for ensuring a consistent state.
> > > > > >
> > > > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > > > userspace in order to make progress if dependencies are missing.
> > > > >
> > > > > People have tried to do this multiple times, and you never really know
> > > > > when "boot is done" due to busses that have discoverable devices and
> > > > > async probing of other busses.
> > > >
> > > > Yes, I know which is why I proposed the second name with more limited
> > > > meaning/function.
> > >
> > > I still don't want to have the kernel have to rely on this.
> > >
> > > > > You do know "something" when you pivot to a new boot disk, and when you
> > > > > try to load init, but given initramfs and the fact that modules are
> > > > > usually included on them, that's not really a good indication that
> > > > > anything is "finished".
> > > > >
> > > > > I don't want userspace to be responsible for telling the kernel, "hey
> > > > > you should be finished now!", as that's an async notification that is
> > > > > going to be ripe for problems.
> > > >
> > > > The usecase I care about here is when the DT has the dependency
> > > > information, but the kernel doesn't have the driver and the dependency
> > > > is never resolved.
> > >
> > > Then we have the same situation as today and nothing different happens,
> > > right?
> >
> > Huh?
> >
> > This works today, but not for modules.
>
> Replying to this a bit further down.
>
> >
> > >
> > > > The same problem has to be solved with a
> > > > 'depends-on' property. This easily happens with a new DT with added
> > > > dependencies like pinctrl and an old kernel that doesn't have the
> > > > "new" driver.
>
> Isn't this the perfect example of an "implicit dependency" in a DT
> node not being a mandatory dependency? The old kernel worked fine with
> older DT without the added pinctrl dependency, so treating it as a
> mandatory dependency seems wrong for that particular device?
> depends-on avoids all this because the older kernel won't parse
> depends-on. And for newer kernels, it'll parse only what depends-on
> says are dependencies and not make wrong assumptions.

What happens when the older kernel is a kernel that parses
'depends-on'? What's 'older' is a moving target. We just need a
'depends-on-v5.2', 'depends-on-v5.3', etc.

It's not just 'old' kernels. Current kernels which didn't enable some
driver also have the same problem. All of this is mostly just a
development problem on incomplete platforms. But it is a real problem
that SUSE folks have raised.

> > > > Another example is IOMMUs. We need some way to say stop
> > > > waiting for dependencies. It is really just a debug option (of course,
> > > > how to prevent a debug option from being used in production?). This
> > > > works now for built-in cases with the same late_initcall abuse.
> > >
> > > What is a debug option?  We need something "for real".
> > >
> > > > Using late_initcall_sync as an indicator has all the same problems
> > > > with userspace indicating boot finished. We should get rid of the
> > > > late_initcall_sync abuses and stop trying to work around them.
> > >
> > > I agree, but that's not the issue here.
> >
> > It is because the cover letter mentions it and downstream work around it.
>
> This patch series is trying to remove the use of late_initcall_sync
> and instead relying on dependency information coming from DT. So, you
> are agreeing with the patchset.

That aspect, yes. It was Greg that said the late_initcall_sync abuses
were not the issue. 'depends-on' is the only thing I'm objecting to
ATM and I understand the problem (you're not the first to try). I've
not studied the the details of the patchset though beyond that.

> > > > > I really like the "depends-on" information, as it shows a topology that
> > > > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > > > to be any other way to properly "know" this.
> > > >
> > > > As I said, DT *does* have this dependency information already. The
> > > > problem is the kernel probing doesn't use it. Fix that and then we can
> > > > discuss dependencies the DT doesn't provide that the kernel needs.
> > >
> > > Where can the kernel probing be fixed to use it?  What am I missing that
> > > can be done instead of what this patchset does?
> >
> > Somewhere, either in each subsystem or in the DT or core code creating
> > struct devices, you need to iterate thru the dependencies. Take clocks
> > as an example:
> >
> > for each node:
> >   for each 'clocks' phandle
> >     Lookup struct device from clock phandle
> >     Add the clock provider struct device to node's struct device links
> >
> > Now, repeat this for regulators, interrupts, etc.
>
> I'm more than happy to do this if the maintainers can accept this as a
> solution, but then we need to agree that we need an override property
> if the implicit dependency isn't a mandatory dependency. We also need
> to agree on how we do this without breaking backwards compatibility.
> Either as a config option for this feature or have a property in the
> "chosen" node to say it's okay to assume existing bindings imply
> mandatory dependencies (it's just describing the DT more explicitly
> and the kernel will use it to enable this feature).

Thinking of the above example I gave, the problem is mandatory or not
can't be decided in the DT. It's a function of what the kernel
supports or not, so the kernel has to decide. With a 'depends-on'
property, the value you'd want depends on which kernel do you boot. If
I'm booting a kernel with the pinctrl driver, then pinctrl is a
mandatory dependency. If I'm booting a kernel without the pinctrl
driver, then pinctrl is not a 'depends-on' dependency. Ignoring
dependencies like this case only works when things are built-in and
only for specific subsystems where it makes sense. It would be nice to
solve it for modules too, but that can be done later.

There's also the case of dependencies where the driver decides how to
handle them. For example, UARTs will use DMA if the dmaengine driver
is available and that decision is (at least for some drivers I've
looked at) made on every open(). I suppose in that case, the driver
could remove the dependency link since it knows it can work with or
without DMA, though we'd need some sort of mechanism to do that.

> Although regulator binding are a "problem" because the kernel will
> have to parse every property in a node to check if it ends with
> -supply and then treat it as if it's a regulator binding (even though
> it might not be). So regulators will need some kind of "opt out" in
> the kernel (not DT).

Yes, that's unfortunate. GPIO is the same. You can safely assume that
'-supply' is a regulator. This is at least somewhat enforced by the
binding schema now.

We probably need a for_each_of_property_with_suffix() iterator for
these cases and with some pointer math on the property names or a
custom str compare function, it shouldn't be much slower to match.

> > This series is pretty much doing the same thing, you just have to
> > parse each provider rather than only 'depends-on'.
> >
> > One issue is the struct device for the dependency may not be created
> > yet. I think this series would have the same issue, but haven't dug
> > into how it avoids that or whether it just ignores it and falls back
> > to deferring probe.
>
> The patch series handles this properly and doesn't fall back to
> deferred probing.
>
> > I'm also not clear on how you create struct devices and add
> > dependencies before probing gets attempted. If a driver is already
> > registered, probe is going to be attempted before any dependencies are
> > added. I guess the issue is avoided with drivers being modules, but
> > any solution should work for built-in too.
>
> This is also handled properly in the series.
>
> I've actually boot tested both these scenarios you call out and the
> patch series handles them properly.

Okay, that's good.

> But you are missing the main point here. The goal isn't to just
> eliminate deferred probing (it's a great side effect even it it just
> stops 99% of them), but also remove the bad assumption that
> late_initcall_sync() means all the devices are probed. The suppliers
> need a better signal (which the patch series provides) to tell when
> they can "unfreeze" the resources left on at boot.

I understand this. I think I was clear on my opinion about using
late_initcall_sync.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (4 preceding siblings ...)
  2019-06-04  0:32 ` [RESEND PATCH v1 5/5] driver core: Add sync_state driver/bus callback Saravana Kannan
@ 2019-06-12 21:21 ` Frank Rowand
  2019-06-13 13:19   ` Rob Herring
  2019-06-24 22:37 ` Sandeep Patil
  6 siblings, 1 reply; 35+ messages in thread
From: Frank Rowand @ 2019-06-12 21:21 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: David Collins, devicetree, linux-kernel, kernel-team, David Collins

Adding cc: David Collins

Plus my comments below.

On 6/3/19 5:32 PM, Saravana Kannan wrote:
> Add a generic "depends-on" property that allows specifying mandatory
> functional dependencies between devices. Add device-links after the
> devices are created (but before they are probed) by looking at this
> "depends-on" property.
> 
> This property is used instead of existing DT properties that specify
> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> is because not all resources referred to by existing DT properties are
> mandatory functional dependencies. Some devices/drivers might be able
> to operate with reduced functionality when some of the resources
> aren't available. For example, a device could operate in polling mode
> if no IRQ is available, a device could skip doing power management if
> clock or voltage control isn't available and they are left on, etc.
> 
> So, adding mandatory functional dependency links between devices by
> looking at referred phandles in DT properties won't work as it would
> prevent probing devices that could be probed. By having an explicit
> depends-on property, we can handle these cases correctly.
> 
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
>  
> 
> Saravana Kannan (5):
>   of/platform: Speed up of_find_device_by_node()
>   driver core: Add device links support for pending links to suppliers
>   dt-bindings: Add depends-on property
>   of/platform: Add functional dependency link from "depends-on" property
>   driver core: Add sync_state driver/bus callback
> 
>  .../devicetree/bindings/depends-on.txt        |  26 +++++
>  drivers/base/core.c                           | 106 ++++++++++++++++++
>  drivers/of/platform.c                         |  75 ++++++++++++-
>  include/linux/device.h                        |  24 ++++
>  include/linux/of.h                            |   3 +
>  5 files changed, 233 insertions(+), 1 deletion(-)
>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> 


I don't think the above description adequately describes one key problem
that the patch set addresses.

David Collins described the problem in an email late in the thread of
the first submission of this series.  Instead of providing a link to
that email, I am going to fully copy it here:

On 5/31/19 4:27 PM, David Collins wrote:
> Hello Saravana,
> 
> On 5/23/19 6:01 PM, Saravana Kannan wrote:
> ...
>> Having functional dependencies explicitly called out in DT and
>> automatically added before the devices are probed, provides the
>> following benefits:
> ...
>> - Supplier devices like clock providers, regulators providers, etc
>>   need to keep the resources they provide active and at a particular
>>   state(s) during boot up even if their current set of consumers don't
>>   request the resource to be active. This is because the rest of the
>>   consumers might not have probed yet and turning off the resource
>>   before all the consumers have probed could lead to a hang or
>>   undesired user experience.
> This benefit provided by the sync_state() callback function introduced in
> this series gives us a mechanism to solve a specific problem encountered
> on Qualcomm Technologies, Inc. (QTI) boards when booting with drivers
> compiled as modules.  QTI boards have a regulator that powers the PHYs for
> display, camera, USB, UFS, and PCIe.  When these boards boot up, the boot
> loader enables this regulator along with other resources in order to
> display a splash screen image.  The regulator must remain enabled until
> the Linux display driver has probed and made a request with the regulator
> framework to keep the regulator enabled.  If the regulator is disabled
> prematurely, then the screen image is corrupted and the display hardware
> enters a bad state.
> 
> We have observed that when the camera driver probes before the display
> driver, it performs this sequence: regulator_enable(), camera register IO,
> regulator_disable().  Since it is the first consumer of the shared
> regulator, the regulator is physically disabled (even though display
> hardware still requires it to be enabled).  We have solved this problem
> when compiling drivers statically with a downstream regulator
> proxy-consumer driver.  This proxy-consumer is able to make an enable
> request for the shared regulator before any other consumer.  It then
> removes its request at late_initcall_sync.
> 
> Unfortunately, when drivers are compiled as modules instead of compiled
> statically into the kernel image, late_initcall_sync is not a meaningful
> marker of driver probe completion.  This means that our existing proxy
> voting system will not work when drivers are compiled as modules.  The
> sync_state() callback resolves this issue by providing a notification that
> is guaranteed to arrive only after all consumers of the shared regulator
> have probed.
> 
> QTI boards have other cases of shared resources such as bus bandwidth
> which must remain at least at a level set by boot loaders in order to
> properly support hardware blocks that are enabled before the Linux kernel
> starts booting.
> 
> Take care,
> David
> 

To paraphrase, the problem is:

   - bootloader enables a regulator for display
   - during Linux boot camera driver probes:
      * enable the regulator also used for display
      * disable the regulator
         + screen image is corrupted
         + display hardware enters bad state
   - later during Linux boot display driver probes:
      * enable the regulator, but too late

So the problem is an ordering dependency between the camera driver probe
and the display driver probe.

Or alternatively the problem could be seen as: the bootloader has enabled
a regulator for a device that the bootloader is aware of, but has not
communicated to the Linux regulator framework that the device requires
the regulator to remain enabled.

Thinking about the problem this way could lead to an entirely different
solution.

-Frank
      

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 21:12                       ` Rob Herring
@ 2019-06-12 22:10                         ` Saravana Kannan
  0 siblings, 0 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-12 22:10 UTC (permalink / raw)
  To: Rob Herring
  Cc: Greg Kroah-Hartman, Sandeep Patil, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 2:12 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Wed, Jun 12, 2019 at 1:29 PM Saravana Kannan <saravanak@google.com> wrote:
> >
> > On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@kernel.org> wrote:
> > >
> > > On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > > > <gregkh@linuxfoundation.org> wrote:
> > > > > >
> > > > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Saravana,
> > > > > > > > > >
> > > > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > > > > comments on the original?
> > > > > > > > > >
> > > > > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > > > > >
> > > > > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > > > >
> > > > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > > > 1) This is a configuration property and not describing the device.
> > > > > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > > > > >
> > > > > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > > > > property. But even if that's not something the maintainers can agree
> > > > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > > > decide the dependencies, we still need some means to override the
> > > > > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > > > > correct?
> > > > > > > > >
> > > > > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > > > > >
> > > > > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > > > > systems/archs too, but I can't speak for those).
> > > > > > > > >
> > > > > > > > > 3) Concerns about backwards compatibility
> > > > > > > > >
> > > > > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > > > > >
> > > > > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > > > > functionalities but a limited driver.
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > > > > load device drivers as modules for Android.
> > > > > > > >
> > > > > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > > > > the drawing board" is the ask, could you please point us in the right
> > > > > > > > direction?
> > > > > > >
> > > > > > > Use the dependencies which are already there in DT. That's clocks,
> > > > > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > > > > going to accept duplicating all those dependencies in DT. The downside
> > > > > > > for the kernel is you have to address these one by one and can't have
> > > > > > > a generic property the driver core code can parse. After that's in
> > > > > > > place, then maybe we can consider handling any additional dependencies
> > > > > > > not already captured in DT. Once all that is in place, we can probably
> > > > > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > > > > driver core already does that now?).
> > > > > > >
> > > > > > > Get rid of the auto disabling of clocks and regulators in
> > > > > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > > > > modules are involved. We probably can't get rid of it as lot's of
> > > > > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > > > > platform's responsibility for ensuring a consistent state.
> > > > > > >
> > > > > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > > > > userspace in order to make progress if dependencies are missing.
> > > > > >
> > > > > > People have tried to do this multiple times, and you never really know
> > > > > > when "boot is done" due to busses that have discoverable devices and
> > > > > > async probing of other busses.
> > > > >
> > > > > Yes, I know which is why I proposed the second name with more limited
> > > > > meaning/function.
> > > >
> > > > I still don't want to have the kernel have to rely on this.
> > > >
> > > > > > You do know "something" when you pivot to a new boot disk, and when you
> > > > > > try to load init, but given initramfs and the fact that modules are
> > > > > > usually included on them, that's not really a good indication that
> > > > > > anything is "finished".
> > > > > >
> > > > > > I don't want userspace to be responsible for telling the kernel, "hey
> > > > > > you should be finished now!", as that's an async notification that is
> > > > > > going to be ripe for problems.
> > > > >
> > > > > The usecase I care about here is when the DT has the dependency
> > > > > information, but the kernel doesn't have the driver and the dependency
> > > > > is never resolved.
> > > >
> > > > Then we have the same situation as today and nothing different happens,
> > > > right?
> > >
> > > Huh?
> > >
> > > This works today, but not for modules.
> >
> > Replying to this a bit further down.
> >
> > >
> > > >
> > > > > The same problem has to be solved with a
> > > > > 'depends-on' property. This easily happens with a new DT with added
> > > > > dependencies like pinctrl and an old kernel that doesn't have the
> > > > > "new" driver.
> >
> > Isn't this the perfect example of an "implicit dependency" in a DT
> > node not being a mandatory dependency? The old kernel worked fine with
> > older DT without the added pinctrl dependency, so treating it as a
> > mandatory dependency seems wrong for that particular device?
> > depends-on avoids all this because the older kernel won't parse
> > depends-on. And for newer kernels, it'll parse only what depends-on
> > says are dependencies and not make wrong assumptions.
>
> What happens when the older kernel is a kernel that parses
> 'depends-on'? What's 'older' is a moving target. We just need a
> 'depends-on-v5.2', 'depends-on-v5.3', etc.
> It's not just 'old' kernels. Current kernels which didn't enable some
> driver also have the same problem. All of this is mostly just a
> development problem on incomplete platforms. But it is a real problem
> that SUSE folks have raised.
> > > > > Another example is IOMMUs. We need some way to say stop
> > > > > waiting for dependencies. It is really just a debug option (of course,
> > > > > how to prevent a debug option from being used in production?). This
> > > > > works now for built-in cases with the same late_initcall abuse.
> > > >
> > > > What is a debug option?  We need something "for real".
> > > >
> > > > > Using late_initcall_sync as an indicator has all the same problems
> > > > > with userspace indicating boot finished. We should get rid of the
> > > > > late_initcall_sync abuses and stop trying to work around them.
> > > >
> > > > I agree, but that's not the issue here.
> > >
> > > It is because the cover letter mentions it and downstream work around it.
> >
> > This patch series is trying to remove the use of late_initcall_sync
> > and instead relying on dependency information coming from DT. So, you
> > are agreeing with the patchset.
>
> That aspect, yes. It was Greg that said the late_initcall_sync abuses
> were not the issue. 'depends-on' is the only thing I'm objecting to
> ATM and I understand the problem (you're not the first to try). I've
> not studied the the details of the patchset though beyond that.
>
> > > > > > I really like the "depends-on" information, as it shows a topology that
> > > > > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > > > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > > > > to be any other way to properly "know" this.
> > > > >
> > > > > As I said, DT *does* have this dependency information already. The
> > > > > problem is the kernel probing doesn't use it. Fix that and then we can
> > > > > discuss dependencies the DT doesn't provide that the kernel needs.
> > > >
> > > > Where can the kernel probing be fixed to use it?  What am I missing that
> > > > can be done instead of what this patchset does?
> > >
> > > Somewhere, either in each subsystem or in the DT or core code creating
> > > struct devices, you need to iterate thru the dependencies. Take clocks
> > > as an example:
> > >
> > > for each node:
> > >   for each 'clocks' phandle
> > >     Lookup struct device from clock phandle
> > >     Add the clock provider struct device to node's struct device links
> > >
> > > Now, repeat this for regulators, interrupts, etc.
> >
> > I'm more than happy to do this if the maintainers can accept this as a
> > solution, but then we need to agree that we need an override property

Can you please respond to this? For the clock controller example, we
need to explicitly override the implied dependency because... see
subsequent comments below.
Can we at least agree to using "depends-on" as an override when the
existing bindings don't properly define the mandatory dependencies? If
it'll make you happy, we can even make it "does-not-depend-on".

> > if the implicit dependency isn't a mandatory dependency. We also need
> > to agree on how we do this without breaking backwards compatibility.
> > Either as a config option for this feature or have a property in the
> > "chosen" node to say it's okay to assume existing bindings imply
> > mandatory dependencies (it's just describing the DT more explicitly
> > and the kernel will use it to enable this feature).
>
> Thinking of the above example I gave, the problem is mandatory or not
> can't be decided in the DT. It's a function of what the kernel
> supports or not, so the kernel has to decide.

That might be debatable for the example you gave, but it's certainly
not debatable for some clock controllers (CC) that have cyclic
dependencies. Only one of them can function without the other.

> With a 'depends-on'
> property, the value you'd want depends on which kernel do you boot.

In the cases I'm thinking of, this has nothing to do with the kernel.
CC1 supplies 50 clocks, 5 of which need input from CC2. CC2 supplies
10 clocks, all of them are dependent on 4 inputs from CC1. CC2 can
never function without CC1. But in DT, they'll have a cyclic
dependency if you go by clock bindings they list.

> If
> I'm booting a kernel with the pinctrl driver, then pinctrl is a
> mandatory dependency. If I'm booting a kernel without the pinctrl
> driver, then pinctrl is not a 'depends-on' dependency.

The way I define depends-on, pinctrl will never be a dependency for
this specific device (because you are saying it can function without
it). The driver decides if it wants to -EPROBE_DEFER or not depending
on the kernel. But I've beaten this horse to death. So, moving
along...

> Ignoring
> dependencies like this case only works when things are built-in and
> only for specific subsystems where it makes sense. It would be nice to
> solve it for modules too, but that can be done later.
>
> There's also the case of dependencies where the driver decides how to
> handle them. For example, UARTs will use DMA if the dmaengine driver
> is available and that decision is (at least for some drivers I've
> looked at) made on every open(). I suppose in that case, the driver
> could remove the dependency link since it knows it can work with or
> without DMA, though we'd need some sort of mechanism to do that.

Now you are just going back to listing the examples I gave for why we
need "depends-on" and why existing bindings aren't mandatory
dependencies. Don't know if I should laugh or cry :)

-Saravana

> > Although regulator binding are a "problem" because the kernel will
> > have to parse every property in a node to check if it ends with
> > -supply and then treat it as if it's a regulator binding (even though
> > it might not be). So regulators will need some kind of "opt out" in
> > the kernel (not DT).
>
> Yes, that's unfortunate. GPIO is the same. You can safely assume that
> '-supply' is a regulator. This is at least somewhat enforced by the
> binding schema now.
>
> We probably need a for_each_of_property_with_suffix() iterator for
> these cases and with some pointer math on the property names or a
> custom str compare function, it shouldn't be much slower to match.
>
> > > This series is pretty much doing the same thing, you just have to
> > > parse each provider rather than only 'depends-on'.
> > >
> > > One issue is the struct device for the dependency may not be created
> > > yet. I think this series would have the same issue, but haven't dug
> > > into how it avoids that or whether it just ignores it and falls back
> > > to deferring probe.
> >
> > The patch series handles this properly and doesn't fall back to
> > deferred probing.
> >
> > > I'm also not clear on how you create struct devices and add
> > > dependencies before probing gets attempted. If a driver is already
> > > registered, probe is going to be attempted before any dependencies are
> > > added. I guess the issue is avoided with drivers being modules, but
> > > any solution should work for built-in too.
> >
> > This is also handled properly in the series.
> >
> > I've actually boot tested both these scenarios you call out and the
> > patch series handles them properly.
>
> Okay, that's good.
>
> > But you are missing the main point here. The goal isn't to just
> > eliminate deferred probing (it's a great side effect even it it just
> > stops 99% of them), but also remove the bad assumption that
> > late_initcall_sync() means all the devices are probed. The suppliers
> > need a better signal (which the patch series provides) to tell when
> > they can "unfreeze" the resources left on at boot.
>
> I understand this. I think I was clear on my opinion about using
> late_initcall_sync.
>
> Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-12 21:21 ` [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Frank Rowand
@ 2019-06-13 13:19   ` Rob Herring
  0 siblings, 0 replies; 35+ messages in thread
From: Rob Herring @ 2019-06-13 13:19 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Saravana Kannan, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 3:21 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> Adding cc: David Collins
>
> Plus my comments below.
>
> On 6/3/19 5:32 PM, Saravana Kannan wrote:
> > Add a generic "depends-on" property that allows specifying mandatory
> > functional dependencies between devices. Add device-links after the
> > devices are created (but before they are probed) by looking at this
> > "depends-on" property.
> >
> > This property is used instead of existing DT properties that specify
> > phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> > is because not all resources referred to by existing DT properties are
> > mandatory functional dependencies. Some devices/drivers might be able
> > to operate with reduced functionality when some of the resources
> > aren't available. For example, a device could operate in polling mode
> > if no IRQ is available, a device could skip doing power management if
> > clock or voltage control isn't available and they are left on, etc.
> >
> > So, adding mandatory functional dependency links between devices by
> > looking at referred phandles in DT properties won't work as it would
> > prevent probing devices that could be probed. By having an explicit
> > depends-on property, we can handle these cases correctly.
> >
> > Having functional dependencies explicitly called out in DT and
> > automatically added before the devices are probed, provides the
> > following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> >   attempting probes of devices that will not probe successfully
> >   (because their suppliers aren't present or haven't probed yet).
> >
> >   For example, in a commonly available mobile SoC, registering just
> >   one consumer device's driver at an initcall level earlier than the
> >   supplier device's driver causes 11 failed probe attempts before the
> >   consumer device probes successfully. This was with a kernel with all
> >   the drivers statically compiled in. This problem gets a lot worse if
> >   all the drivers are loaded as modules without direct symbol
> >   dependencies.
> >
> > - Supplier devices like clock providers, regulators providers, etc
> >   need to keep the resources they provide active and at a particular
> >   state(s) during boot up even if their current set of consumers don't
> >   request the resource to be active. This is because the rest of the
> >   consumers might not have probed yet and turning off the resource
> >   before all the consumers have probed could lead to a hang or
> >   undesired user experience.
> >
> >   Some frameworks (Eg: regulator) handle this today by turning off
> >   "unused" resources at late_initcall_sync and hoping all the devices
> >   have probed by then. This is not a valid assumption for systems with
> >   loadable modules. Other frameworks (Eg: clock) just don't handle
> >   this due to the lack of a clear signal for when they can turn off
> >   resources. This leads to downstream hacks to handle cases like this
> >   that can easily be solved in the upstream kernel.
> >
> >   By linking devices before they are probed, we give suppliers a clear
> >   count of the number of dependent consumers. Once all of the
> >   consumers are active, the suppliers can turn off the unused
> >   resources without making assumptions about the number of consumers.
> >
> > By default we just add device-links to track "driver presence" (probe
> > succeeded) of the supplier device. If any other functionality provided
> > by device-links are needed, it is left to the consumer/supplier
> > devices to change the link when they probe.
> >
> >
> > Saravana Kannan (5):
> >   of/platform: Speed up of_find_device_by_node()
> >   driver core: Add device links support for pending links to suppliers
> >   dt-bindings: Add depends-on property
> >   of/platform: Add functional dependency link from "depends-on" property
> >   driver core: Add sync_state driver/bus callback
> >
> >  .../devicetree/bindings/depends-on.txt        |  26 +++++
> >  drivers/base/core.c                           | 106 ++++++++++++++++++
> >  drivers/of/platform.c                         |  75 ++++++++++++-
> >  include/linux/device.h                        |  24 ++++
> >  include/linux/of.h                            |   3 +
> >  5 files changed, 233 insertions(+), 1 deletion(-)
> >  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> >
>
>
> I don't think the above description adequately describes one key problem
> that the patch set addresses.
>
> David Collins described the problem in an email late in the thread of
> the first submission of this series.  Instead of providing a link to
> that email, I am going to fully copy it here:
>
> On 5/31/19 4:27 PM, David Collins wrote:
> > Hello Saravana,
> >
> > On 5/23/19 6:01 PM, Saravana Kannan wrote:
> > ...
> >> Having functional dependencies explicitly called out in DT and
> >> automatically added before the devices are probed, provides the
> >> following benefits:
> > ...
> >> - Supplier devices like clock providers, regulators providers, etc
> >>   need to keep the resources they provide active and at a particular
> >>   state(s) during boot up even if their current set of consumers don't
> >>   request the resource to be active. This is because the rest of the
> >>   consumers might not have probed yet and turning off the resource
> >>   before all the consumers have probed could lead to a hang or
> >>   undesired user experience.
> > This benefit provided by the sync_state() callback function introduced in
> > this series gives us a mechanism to solve a specific problem encountered
> > on Qualcomm Technologies, Inc. (QTI) boards when booting with drivers
> > compiled as modules.  QTI boards have a regulator that powers the PHYs for
> > display, camera, USB, UFS, and PCIe.  When these boards boot up, the boot
> > loader enables this regulator along with other resources in order to
> > display a splash screen image.  The regulator must remain enabled until
> > the Linux display driver has probed and made a request with the regulator
> > framework to keep the regulator enabled.  If the regulator is disabled
> > prematurely, then the screen image is corrupted and the display hardware
> > enters a bad state.
> >
> > We have observed that when the camera driver probes before the display
> > driver, it performs this sequence: regulator_enable(), camera register IO,
> > regulator_disable().  Since it is the first consumer of the shared
> > regulator, the regulator is physically disabled (even though display
> > hardware still requires it to be enabled).  We have solved this problem
> > when compiling drivers statically with a downstream regulator
> > proxy-consumer driver.  This proxy-consumer is able to make an enable
> > request for the shared regulator before any other consumer.  It then
> > removes its request at late_initcall_sync.
> >
> > Unfortunately, when drivers are compiled as modules instead of compiled
> > statically into the kernel image, late_initcall_sync is not a meaningful
> > marker of driver probe completion.  This means that our existing proxy
> > voting system will not work when drivers are compiled as modules.  The
> > sync_state() callback resolves this issue by providing a notification that
> > is guaranteed to arrive only after all consumers of the shared regulator
> > have probed.
> >
> > QTI boards have other cases of shared resources such as bus bandwidth
> > which must remain at least at a level set by boot loaders in order to
> > properly support hardware blocks that are enabled before the Linux kernel
> > starts booting.
> >
> > Take care,
> > David
> >
>
> To paraphrase, the problem is:
>
>    - bootloader enables a regulator for display
>    - during Linux boot camera driver probes:
>       * enable the regulator also used for display
>       * disable the regulator
>          + screen image is corrupted
>          + display hardware enters bad state
>    - later during Linux boot display driver probes:
>       * enable the regulator, but too late
>
> So the problem is an ordering dependency between the camera driver probe
> and the display driver probe.
>
> Or alternatively the problem could be seen as: the bootloader has enabled
> a regulator for a device that the bootloader is aware of, but has not
> communicated to the Linux regulator framework that the device requires
> the regulator to remain enabled.

The bootloader should have communicated this information via the
'simple-framebuffer' binding. The problem is I think we don't have any
mechanism to ensure the simple-fb driver probes before other drivers
(other than it's dependencies). Well, there are initcall levels, but
I'd really like to get rid of any reliance on that. Anything that
relies on initcall level ordering is not going to work as module
unless userspace also ensures the ordering.

The other case I've heard of is the serial port's clock getting
disabled and killing the earlycon as we assume the bootloader left
serial port dependencies configured.

> Thinking about the problem this way could lead to an entirely different
> solution.

I think the ordering issue is orthogonal to using modules. We need
some way to prioritize probing of some devices. IMO, just prioritizing
the 'stdout-path' device and 'simple-framebuffer' if present solves
this. I don't think we need a general solution in this case. The may
be a few other cases, but I think these are the exception. The state
from the bootloader (and don't forget about kexec) needs to be well
defined and minimal.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-12 19:29                     ` Saravana Kannan
  2019-06-12 20:23                       ` Frank Rowand
  2019-06-12 21:12                       ` Rob Herring
@ 2019-06-18 20:47                       ` Sandeep Patil
  2019-06-18 21:22                         ` Saravana Kannan
  2 siblings, 1 reply; 35+ messages in thread
From: Sandeep Patil @ 2019-06-18 20:47 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Greg Kroah-Hartman, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Wed, Jun 12, 2019 at 12:29:18PM -0700, 'Saravana Kannan' via kernel-team wrote:
> On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@kernel.org> wrote:
> >
> > On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> > <gregkh@linuxfoundation.org> wrote:
> > >
> > > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > > <gregkh@linuxfoundation.org> wrote:
> > > > >
> > > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > > > > >
> > > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Saravana,
> > > > > > > > >
> > > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > > > comments on the original?
> > > > > > > > >
> > > > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > > > >
> > > > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > > >
> > > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > > 1) This is a configuration property and not describing the device.
> > > > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > > > >
> > > > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > > > property. But even if that's not something the maintainers can agree
> > > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > > decide the dependencies, we still need some means to override the
> > > > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > > > correct?
> > > > > > > >
> > > > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > > > >
> > > > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > > > systems/archs too, but I can't speak for those).
> > > > > > > >
> > > > > > > > 3) Concerns about backwards compatibility
> > > > > > > >
> > > > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > > > >
> > > > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > > > functionalities but a limited driver.
> > > > > > >
> > > > > > > <snip>
> > > > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > > > load device drivers as modules for Android.
> > > > > > >
> > > > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > > > the drawing board" is the ask, could you please point us in the right
> > > > > > > direction?
> > > > > >
> > > > > > Use the dependencies which are already there in DT. That's clocks,
> > > > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > > > going to accept duplicating all those dependencies in DT. The downside
> > > > > > for the kernel is you have to address these one by one and can't have
> > > > > > a generic property the driver core code can parse. After that's in
> > > > > > place, then maybe we can consider handling any additional dependencies
> > > > > > not already captured in DT. Once all that is in place, we can probably
> > > > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > > > driver core already does that now?).
> > > > > >
> > > > > > Get rid of the auto disabling of clocks and regulators in
> > > > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > > > modules are involved. We probably can't get rid of it as lot's of
> > > > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > > > platform's responsibility for ensuring a consistent state.
> > > > > >
> > > > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > > > userspace in order to make progress if dependencies are missing.
> > > > >
> > > > > People have tried to do this multiple times, and you never really know
> > > > > when "boot is done" due to busses that have discoverable devices and
> > > > > async probing of other busses.
> > > >
> > > > Yes, I know which is why I proposed the second name with more limited
> > > > meaning/function.
> > >
> > > I still don't want to have the kernel have to rely on this.
> > >
> > > > > You do know "something" when you pivot to a new boot disk, and when you
> > > > > try to load init, but given initramfs and the fact that modules are
> > > > > usually included on them, that's not really a good indication that
> > > > > anything is "finished".
> > > > >
> > > > > I don't want userspace to be responsible for telling the kernel, "hey
> > > > > you should be finished now!", as that's an async notification that is
> > > > > going to be ripe for problems.
> > > >
> > > > The usecase I care about here is when the DT has the dependency
> > > > information, but the kernel doesn't have the driver and the dependency
> > > > is never resolved.
> > >
> > > Then we have the same situation as today and nothing different happens,
> > > right?
> >
> > Huh?
> >
> > This works today, but not for modules.
> 
> Replying to this a bit further down.
> 
> >
> > >
> > > > The same problem has to be solved with a
> > > > 'depends-on' property. This easily happens with a new DT with added
> > > > dependencies like pinctrl and an old kernel that doesn't have the
> > > > "new" driver.
> 
> Isn't this the perfect example of an "implicit dependency" in a DT
> node not being a mandatory dependency? The old kernel worked fine with
> older DT without the added pinctrl dependency, so treating it as a
> mandatory dependency seems wrong for that particular device?
> depends-on avoids all this because the older kernel won't parse
> depends-on. And for newer kernels, it'll parse only what depends-on
> says are dependencies and not make wrong assumptions.
> 
> > > > Another example is IOMMUs. We need some way to say stop
> > > > waiting for dependencies. It is really just a debug option (of course,
> > > > how to prevent a debug option from being used in production?). This
> > > > works now for built-in cases with the same late_initcall abuse.
> > >
> > > What is a debug option?  We need something "for real".
> > >
> > > > Using late_initcall_sync as an indicator has all the same problems
> > > > with userspace indicating boot finished. We should get rid of the
> > > > late_initcall_sync abuses and stop trying to work around them.
> > >
> > > I agree, but that's not the issue here.
> >
> > It is because the cover letter mentions it and downstream work around it.
> 
> This patch series is trying to remove the use of late_initcall_sync
> and instead relying on dependency information coming from DT. So, you
> are agreeing with the patchset.
> 
> > > > > I really like the "depends-on" information, as it shows a topology that
> > > > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > > > to be any other way to properly "know" this.
> > > >
> > > > As I said, DT *does* have this dependency information already. The
> > > > problem is the kernel probing doesn't use it. Fix that and then we can
> > > > discuss dependencies the DT doesn't provide that the kernel needs.
> > >
> > > Where can the kernel probing be fixed to use it?  What am I missing that
> > > can be done instead of what this patchset does?
> >
> > Somewhere, either in each subsystem or in the DT or core code creating
> > struct devices, you need to iterate thru the dependencies. Take clocks
> > as an example:
> >
> > for each node:
> >   for each 'clocks' phandle
> >     Lookup struct device from clock phandle
> >     Add the clock provider struct device to node's struct device links
> >
> > Now, repeat this for regulators, interrupts, etc.
> 
> I'm more than happy to do this if the maintainers can accept this as a
> solution, but then we need to agree that we need an override property
> if the implicit dependency isn't a mandatory dependency.

I don't quite understand what you mean by "isn't a mandatory dependency"
here. I think IIUC, what Rob said will solve the probe order problem,
correct?

Is there a problem if we split this in two and handle the
late_initcall_sync() + regulators separately and solve the probe ordering
here as suggested above?

I know the original intention of the series is to resolve the
late_initcall_sync() assumption and probe order was a "side-effect". However,
I think probing in the dependency order is still extremely valuable and can
resolve boot time issues ahead of time.

> We also need
> to agree on how we do this without breaking backwards compatibility.
> Either as a config option for this feature or have a property in the
> "chosen" node to say it's okay to assume existing bindings imply
> mandatory dependencies (it's just describing the DT more explicitly
> and the kernel will use it to enable this feature).
> 
> Although regulator binding are a "problem" because the kernel will
> have to parse every property in a node to check if it ends with
> -supply and then treat it as if it's a regulator binding (even though
> it might not be). So regulators will need some kind of "opt out" in
> the kernel (not DT).

Agree and it is going to immediately conflict with 'power-supply' for
example. If we are going this route, then we need to fix and agree on
standard regulator bindings too and make the changes everywhere in the
kernel.

> 
> > This series is pretty much doing the same thing, you just have to
> > parse each provider rather than only 'depends-on'.
> >
> > One issue is the struct device for the dependency may not be created
> > yet. I think this series would have the same issue, but haven't dug
> > into how it avoids that or whether it just ignores it and falls back
> > to deferring probe.
> 
> The patch series handles this properly and doesn't fall back to
> deferred probing.
> 
> > I'm also not clear on how you create struct devices and add
> > dependencies before probing gets attempted. If a driver is already
> > registered, probe is going to be attempted before any dependencies are
> > added. I guess the issue is avoided with drivers being modules, but
> > any solution should work for built-in too.
> 
> This is also handled properly in the series.
> 
> I've actually boot tested both these scenarios you call out and the
> patch series handles them properly.
> 
> But you are missing the main point here. The goal isn't to just
> eliminate deferred probing (it's a great side effect even it it just
> stops 99% of them), but also remove the bad assumption that
> late_initcall_sync() means all the devices are probed. The suppliers
> need a better signal (which the patch series provides) to tell when
> they can "unfreeze" the resources left on at boot.
> 

Is the summary here that we need to figure out a different approach / fix
regulator framework, or something else ? It wasn't clear from all other
emails from this thread, sorry for noise if I missed it.

- ssp

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node()
  2019-06-18 20:47                       ` Sandeep Patil
@ 2019-06-18 21:22                         ` Saravana Kannan
  0 siblings, 0 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-18 21:22 UTC (permalink / raw)
  To: Sandeep Patil
  Cc: Rob Herring, Greg Kroah-Hartman, Frank Rowand, Mark Rutland,
	Rafael J. Wysocki, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Tue, Jun 18, 2019 at 1:47 PM Sandeep Patil <sspatil@android.com> wrote:
>
> On Wed, Jun 12, 2019 at 12:29:18PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > On Wed, Jun 12, 2019 at 11:19 AM Rob Herring <robh+dt@kernel.org> wrote:
> > >
> > > On Wed, Jun 12, 2019 at 11:08 AM Greg Kroah-Hartman
> > > <gregkh@linuxfoundation.org> wrote:
> > > >
> > > > On Wed, Jun 12, 2019 at 10:53:09AM -0600, Rob Herring wrote:
> > > > > On Wed, Jun 12, 2019 at 8:22 AM Greg Kroah-Hartman
> > > > > <gregkh@linuxfoundation.org> wrote:
> > > > > >
> > > > > > On Wed, Jun 12, 2019 at 07:53:39AM -0600, Rob Herring wrote:
> > > > > > > On Tue, Jun 11, 2019 at 3:52 PM Sandeep Patil <sspatil@android.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jun 11, 2019 at 01:56:25PM -0700, 'Saravana Kannan' via kernel-team wrote:
> > > > > > > > > On Tue, Jun 11, 2019 at 8:18 AM Frank Rowand <frowand.list@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Saravana,
> > > > > > > > > >
> > > > > > > > > > On 6/10/19 10:36 AM, Rob Herring wrote:
> > > > > > > > > > > Why are you resending this rather than replying to Frank's last
> > > > > > > > > > > comments on the original?
> > > > > > > > > >
> > > > > > > > > > Adding on a different aspect...  The independent replies from three different
> > > > > > > > > > maintainers (Rob, Mark, myself) pointed out architectural issues with the
> > > > > > > > > > patch series.  There were also some implementation issues brought out.
> > > > > > > > > > (Although I refrained from bringing up most of my implementation issues
> > > > > > > > > > as they are not relevant until architecture issues are resolved.)
> > > > > > > > >
> > > > > > > > > Right, I'm not too worried about the implementation issues before we
> > > > > > > > > settle on the architectural issues. Those are easy to fix.
> > > > > > > > >
> > > > > > > > > Honestly, the main points that the maintainers raised are:
> > > > > > > > > 1) This is a configuration property and not describing the device.
> > > > > > > > > Just use the implicit dependencies coming from existing bindings.
> > > > > > > > >
> > > > > > > > > I gave a bunch of reasons for why I think it isn't an OS configuration
> > > > > > > > > property. But even if that's not something the maintainers can agree
> > > > > > > > > to, I gave a concrete example (cyclic dependencies between clock
> > > > > > > > > provider hardware) where the implicit dependencies would prevent one
> > > > > > > > > of the devices from probing till the end of time. So even if the
> > > > > > > > > maintainers don't agree we should always look at "depends-on" to
> > > > > > > > > decide the dependencies, we still need some means to override the
> > > > > > > > > implicit dependencies where they don't match the real dependency. Can
> > > > > > > > > we use depends-on as an override when the implicit dependencies aren't
> > > > > > > > > correct?
> > > > > > > > >
> > > > > > > > > 2) This doesn't need to be solved because this is just optimizing
> > > > > > > > > probing or saving power ("we should get rid of this auto disabling"):
> > > > > > > > >
> > > > > > > > > I explained why this patch series is not just about optimizing probe
> > > > > > > > > ordering or saving power. And why we can't ignore auto disabling
> > > > > > > > > (because it's more than just auto disabling). The kernel is currently
> > > > > > > > > broken when trying to use modules in ARM SoCs (probably in other
> > > > > > > > > systems/archs too, but I can't speak for those).
> > > > > > > > >
> > > > > > > > > 3) Concerns about backwards compatibility
> > > > > > > > >
> > > > > > > > > I pointed out why the current scheme (depends-on being the only source
> > > > > > > > > of dependency) doesn't break compatibility. And if we go with
> > > > > > > > > "depends-on" as an override what we could do to keep backwards
> > > > > > > > > compatibility. Happy to hear more thoughts or discuss options.
> > > > > > > > >
> > > > > > > > > 4) How the "sync_state" would work for a device that supplies multiple
> > > > > > > > > functionalities but a limited driver.
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > > To be clear, all of above are _real_ problems that stops us from efficiently
> > > > > > > > load device drivers as modules for Android.
> > > > > > > >
> > > > > > > > So, if 'depends-on' doesn't seem like the right approach and "going back to
> > > > > > > > the drawing board" is the ask, could you please point us in the right
> > > > > > > > direction?
> > > > > > >
> > > > > > > Use the dependencies which are already there in DT. That's clocks,
> > > > > > > pinctrl, regulators, interrupts, gpio at a minimum. I'm simply not
> > > > > > > going to accept duplicating all those dependencies in DT. The downside
> > > > > > > for the kernel is you have to address these one by one and can't have
> > > > > > > a generic property the driver core code can parse. After that's in
> > > > > > > place, then maybe we can consider handling any additional dependencies
> > > > > > > not already captured in DT. Once all that is in place, we can probably
> > > > > > > sort device and/or driver lists to optimize the probe order (maybe the
> > > > > > > driver core already does that now?).
> > > > > > >
> > > > > > > Get rid of the auto disabling of clocks and regulators in
> > > > > > > late_initcall. It's simply not a valid marker that boot is done when
> > > > > > > modules are involved. We probably can't get rid of it as lot's of
> > > > > > > platforms rely on that, so it will have to be opt out. Make it the
> > > > > > > platform's responsibility for ensuring a consistent state.
> > > > > > >
> > > > > > > Perhaps we need a 'boot done' or 'stop deferring probe' trigger from
> > > > > > > userspace in order to make progress if dependencies are missing.
> > > > > >
> > > > > > People have tried to do this multiple times, and you never really know
> > > > > > when "boot is done" due to busses that have discoverable devices and
> > > > > > async probing of other busses.
> > > > >
> > > > > Yes, I know which is why I proposed the second name with more limited
> > > > > meaning/function.
> > > >
> > > > I still don't want to have the kernel have to rely on this.
> > > >
> > > > > > You do know "something" when you pivot to a new boot disk, and when you
> > > > > > try to load init, but given initramfs and the fact that modules are
> > > > > > usually included on them, that's not really a good indication that
> > > > > > anything is "finished".
> > > > > >
> > > > > > I don't want userspace to be responsible for telling the kernel, "hey
> > > > > > you should be finished now!", as that's an async notification that is
> > > > > > going to be ripe for problems.
> > > > >
> > > > > The usecase I care about here is when the DT has the dependency
> > > > > information, but the kernel doesn't have the driver and the dependency
> > > > > is never resolved.
> > > >
> > > > Then we have the same situation as today and nothing different happens,
> > > > right?
> > >
> > > Huh?
> > >
> > > This works today, but not for modules.
> >
> > Replying to this a bit further down.
> >
> > >
> > > >
> > > > > The same problem has to be solved with a
> > > > > 'depends-on' property. This easily happens with a new DT with added
> > > > > dependencies like pinctrl and an old kernel that doesn't have the
> > > > > "new" driver.
> >
> > Isn't this the perfect example of an "implicit dependency" in a DT
> > node not being a mandatory dependency? The old kernel worked fine with
> > older DT without the added pinctrl dependency, so treating it as a
> > mandatory dependency seems wrong for that particular device?
> > depends-on avoids all this because the older kernel won't parse
> > depends-on. And for newer kernels, it'll parse only what depends-on
> > says are dependencies and not make wrong assumptions.
> >
> > > > > Another example is IOMMUs. We need some way to say stop
> > > > > waiting for dependencies. It is really just a debug option (of course,
> > > > > how to prevent a debug option from being used in production?). This
> > > > > works now for built-in cases with the same late_initcall abuse.
> > > >
> > > > What is a debug option?  We need something "for real".
> > > >
> > > > > Using late_initcall_sync as an indicator has all the same problems
> > > > > with userspace indicating boot finished. We should get rid of the
> > > > > late_initcall_sync abuses and stop trying to work around them.
> > > >
> > > > I agree, but that's not the issue here.
> > >
> > > It is because the cover letter mentions it and downstream work around it.
> >
> > This patch series is trying to remove the use of late_initcall_sync
> > and instead relying on dependency information coming from DT. So, you
> > are agreeing with the patchset.
> >
> > > > > > I really like the "depends-on" information, as it shows a topology that
> > > > > > DT doesn't seem to be able to show today, yet we rely on it in the
> > > > > > kernel with the whole deferred probing mess.  To me, there doesn't seem
> > > > > > to be any other way to properly "know" this.
> > > > >
> > > > > As I said, DT *does* have this dependency information already. The
> > > > > problem is the kernel probing doesn't use it. Fix that and then we can
> > > > > discuss dependencies the DT doesn't provide that the kernel needs.
> > > >
> > > > Where can the kernel probing be fixed to use it?  What am I missing that
> > > > can be done instead of what this patchset does?
> > >
> > > Somewhere, either in each subsystem or in the DT or core code creating
> > > struct devices, you need to iterate thru the dependencies. Take clocks
> > > as an example:
> > >
> > > for each node:
> > >   for each 'clocks' phandle
> > >     Lookup struct device from clock phandle
> > >     Add the clock provider struct device to node's struct device links
> > >
> > > Now, repeat this for regulators, interrupts, etc.
> >
> > I'm more than happy to do this if the maintainers can accept this as a
> > solution, but then we need to agree that we need an override property
> > if the implicit dependency isn't a mandatory dependency.
>
> I don't quite understand what you mean by "isn't a mandatory dependency"
> here.

A binding to a clock, regulator, etc without which the device could
still operate. Best example is clock providers with cyclic
dependencies. One of them definitely can operate without the other
(but the reverse is not true).

> I think IIUC, what Rob said will solve the probe order problem,
> correct?

No. I'll repeat my statement from earlier:
I'm more than happy to do this if the maintainers can accept this as a
solution, but then we need to agree that we need an override property
if the implicit dependency isn't a mandatory dependency. Clock
providers are just one good example case of devices with cyclic
dependencies. We need to at least have depends-on act as a way to
override implicit dependencies when they don't match the real
dependency.

> Is there a problem if we split this in two and handle the
> late_initcall_sync() + regulators separately and solve the probe ordering
> here as suggested above?

Based on his earlier email, my understanding is that he doesn't care
about the probe ordering being inefficient because he doesn't expect
it to add much to boot time and sees that as an OS issue and not a DT
issue. I believe this specific proposal from his is to explain why we
don't need a "depends-on" property (and I'm saying it's only partly
correct).

> I know the original intention of the series is to resolve the
> late_initcall_sync() assumption and probe order was a "side-effect". However,
> I think probing in the dependency order is still extremely valuable and can
> resolve boot time issues ahead of time.
>
> > We also need
> > to agree on how we do this without breaking backwards compatibility.
> > Either as a config option for this feature or have a property in the
> > "chosen" node to say it's okay to assume existing bindings imply
> > mandatory dependencies (it's just describing the DT more explicitly
> > and the kernel will use it to enable this feature).
> >
> > Although regulator binding are a "problem" because the kernel will
> > have to parse every property in a node to check if it ends with
> > -supply and then treat it as if it's a regulator binding (even though
> > it might not be). So regulators will need some kind of "opt out" in
> > the kernel (not DT).
>
> Agree and it is going to immediately conflict with 'power-supply' for
> example. If we are going this route, then we need to fix and agree on
> standard regulator bindings too and make the changes everywhere in the
> kernel.
>
> >
> > > This series is pretty much doing the same thing, you just have to
> > > parse each provider rather than only 'depends-on'.
> > >
> > > One issue is the struct device for the dependency may not be created
> > > yet. I think this series would have the same issue, but haven't dug
> > > into how it avoids that or whether it just ignores it and falls back
> > > to deferring probe.
> >
> > The patch series handles this properly and doesn't fall back to
> > deferred probing.
> >
> > > I'm also not clear on how you create struct devices and add
> > > dependencies before probing gets attempted. If a driver is already
> > > registered, probe is going to be attempted before any dependencies are
> > > added. I guess the issue is avoided with drivers being modules, but
> > > any solution should work for built-in too.
> >
> > This is also handled properly in the series.
> >
> > I've actually boot tested both these scenarios you call out and the
> > patch series handles them properly.
> >
> > But you are missing the main point here. The goal isn't to just
> > eliminate deferred probing (it's a great side effect even it it just
> > stops 99% of them), but also remove the bad assumption that
> > late_initcall_sync() means all the devices are probed. The suppliers
> > need a better signal (which the patch series provides) to tell when
> > they can "unfreeze" the resources left on at boot.
> >
>
> Is the summary here that we need to figure out a different approach / fix
> regulator framework, or something else ? It wasn't clear from all other
> emails from this thread, sorry for noise if I missed it.

No, the comment about regulators was just about how unfriendly that
binding is for parsing dependencies. That has nothing to do with the
general design though (which is, do we need depends-on?).

-Saravana

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (5 preceding siblings ...)
  2019-06-12 21:21 ` [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Frank Rowand
@ 2019-06-24 22:37 ` Sandeep Patil
  2019-06-25  3:53   ` Greg Kroah-Hartman
  6 siblings, 1 reply; 35+ messages in thread
From: Sandeep Patil @ 2019-06-24 22:37 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	kernel-team

(Responding to the first email in the series to summarize the current
situation and choices we have.)

On Mon, Jun 03, 2019 at 05:32:13PM -0700, 'Saravana Kannan' via kernel-team wrote:
> Add a generic "depends-on" property that allows specifying mandatory
> functional dependencies between devices. Add device-links after the
> devices are created (but before they are probed) by looking at this
> "depends-on" property.
> 
> This property is used instead of existing DT properties that specify
> phandles of other devices (Eg: clocks, pinctrl, regulators, etc). This
> is because not all resources referred to by existing DT properties are
> mandatory functional dependencies. Some devices/drivers might be able
> to operate with reduced functionality when some of the resources
> aren't available. For example, a device could operate in polling mode
> if no IRQ is available, a device could skip doing power management if
> clock or voltage control isn't available and they are left on, etc.
> 
> So, adding mandatory functional dependency links between devices by
> looking at referred phandles in DT properties won't work as it would
> prevent probing devices that could be probed. By having an explicit
> depends-on property, we can handle these cases correctly.
> 
> Having functional dependencies explicitly called out in DT and
> automatically added before the devices are probed, provides the
> following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, regulators providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
>  

We are trying to make sure that all (most) drivers in an Aarch64 system can
be kernel modules for Android, like any other desktop system for
example. There are a number of problems we need to fix before that happens
ofcourse.

That patch series does the following -

1. Resolve the consumer<->supplier relationship in order for subsystems to
turn off resources to save power is #1 on our list. This is because it will
define how driver and DT writers define and write their code for Android
devices.

2. Resolve cyclic dependency between devices nodes as a side-effect.  That
will make sure Android systems do not suffer from deferred probing and we
have a generic way of serialize driver probes. (This can also be mitigated
somewhat by module dependencies outside of the kernel.)

Subsystems like regulator, interconnect can immediately benefit from #1 and
it makes the module/driver loading possible for Android systems without
regressing power horribly.

After thinking about Rob's suggestion to loop though dependencies, I think it
doesn't work because we don't know what to do when there are cycles. Drivers
sometimes need to have access to phandles in order to retrieve resources from
that phandle. We can't assume that is an implied "probe dependency". Saravana
had several such examples already and it is a _real world_ scenario.

I think the current binding are insufficient to describe the runtime hardware
dependencies. IOW, the current bindinds were probably intended to be that, but
drivers use them to get phandles for other nodes too and are not *strict*
dependencies. Either way, for Android systems to realize this goal of having
driver modules load on demand, we have to have something that does what this
patch series does.

I am also sensitive to the fact that adding a new binding in "depends-on"
will mean all Android devices will start *depending on* that DT binding (no
pun intended) without any upstream support. We want to avoid that as much
as possible.

I guess we are happy to hear better ways of doing this that work, but in the
absence of that, we are going to have to carry the series for Android and
that makes me sad :(

Suggestions, thoughts, for what Android should do here...?

Thanks
- ssp



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-24 22:37 ` Sandeep Patil
@ 2019-06-25  3:53   ` Greg Kroah-Hartman
  2019-06-26  4:30     ` Sandeep Patil
  2019-06-26 21:30     ` Rob Herring
  0 siblings, 2 replies; 35+ messages in thread
From: Greg Kroah-Hartman @ 2019-06-25  3:53 UTC (permalink / raw)
  To: Sandeep Patil
  Cc: Saravana Kannan, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	kernel-team

On Mon, Jun 24, 2019 at 03:37:07PM -0700, Sandeep Patil wrote:
> We are trying to make sure that all (most) drivers in an Aarch64 system can
> be kernel modules for Android, like any other desktop system for
> example. There are a number of problems we need to fix before that happens
> ofcourse.

I will argue that this is NOT an android-specific issue.  If the goal of
creating an arm64 kernel that will "just work" for a wide range of
hardware configurations without rebuilding is going to happen, we need
to solve this problem with DT.  This goal was one of the original wishes
of the arm64 development effort, let's not loose sight of it as
obviously, this is not working properly just yet.

It just seems that Android is the first one to actually try and
implement that goal :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-25  3:53   ` Greg Kroah-Hartman
@ 2019-06-26  4:30     ` Sandeep Patil
  2019-06-26  5:49       ` Frank Rowand
  2019-06-26 21:30     ` Rob Herring
  1 sibling, 1 reply; 35+ messages in thread
From: Sandeep Patil @ 2019-06-26  4:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Saravana Kannan, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	kernel-team

On Tue, Jun 25, 2019 at 11:53:13AM +0800, Greg Kroah-Hartman wrote:
> On Mon, Jun 24, 2019 at 03:37:07PM -0700, Sandeep Patil wrote:
> > We are trying to make sure that all (most) drivers in an Aarch64 system can
> > be kernel modules for Android, like any other desktop system for
> > example. There are a number of problems we need to fix before that happens
> > ofcourse.
> 
> I will argue that this is NOT an android-specific issue.  If the goal of
> creating an arm64 kernel that will "just work" for a wide range of
> hardware configurations without rebuilding is going to happen, we need
> to solve this problem with DT.  This goal was one of the original wishes
> of the arm64 development effort, let's not loose sight of it as
> obviously, this is not working properly just yet.

I believe the proposed solution in this patch series is just that. I am not
sure what the alternatives are. The alternative suggested was to reuse
pre-existing dt-bindings for dependency based probe re-ordering and resolution.

However, it seems we had no way to *really* check if these dependencies are
the real. So, a device may or may not actually depend on the other device for
probe / initialization when the dependency is mentioned in it's dt node. From
DT's point of view, there is no way to tell this ..

I don't know how this is handled in x86. With DT, I don't see how we can do
this unless DT dependencies are _really_ tied with runtime dependencies (The
cycles would have been apparent if that was the case.

Honestly, the "depends-on" property suggested here just piles on to the
existing state. So, it is somewhat doubling the exiting bindings. It says,
you must use depends-on property to define probe / initialization dependency.
The existing bindings like 'clock', 'interrupt', '*-supply' do not enforce
that right now, so you will have device nodes that have these bindings right
now but don't necessarily need them for successful probe for example.

> 
> It just seems that Android is the first one to actually try and
> implement that goal :)

I guess :)

- ssp

> 
> thanks,
> 
> greg k-h

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-26  4:30     ` Sandeep Patil
@ 2019-06-26  5:49       ` Frank Rowand
  0 siblings, 0 replies; 35+ messages in thread
From: Frank Rowand @ 2019-06-26  5:49 UTC (permalink / raw)
  To: Sandeep Patil, Greg Kroah-Hartman
  Cc: Saravana Kannan, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	David Collins, devicetree, linux-kernel, kernel-team

On 6/25/19 9:30 PM, Sandeep Patil wrote:
> On Tue, Jun 25, 2019 at 11:53:13AM +0800, Greg Kroah-Hartman wrote:
>> On Mon, Jun 24, 2019 at 03:37:07PM -0700, Sandeep Patil wrote:
>>> We are trying to make sure that all (most) drivers in an Aarch64 system can
>>> be kernel modules for Android, like any other desktop system for
>>> example. There are a number of problems we need to fix before that happens
>>> ofcourse.
>>
>> I will argue that this is NOT an android-specific issue.  If the goal of
>> creating an arm64 kernel that will "just work" for a wide range of
>> hardware configurations without rebuilding is going to happen, we need
>> to solve this problem with DT.  This goal was one of the original wishes
>> of the arm64 development effort, let's not loose sight of it as
>> obviously, this is not working properly just yet.
> 
> I believe the proposed solution in this patch series is just that. I am not
> sure what the alternatives are. The alternative suggested was to reuse

Look at the responses from myself and from Rob to patch 0.  No one responded
to our comments.

-Frank


> pre-existing dt-bindings for dependency based probe re-ordering and resolution.
> 
> However, it seems we had no way to *really* check if these dependencies are
> the real. So, a device may or may not actually depend on the other device for
> probe / initialization when the dependency is mentioned in it's dt node. From
> DT's point of view, there is no way to tell this ..
> 
> I don't know how this is handled in x86. With DT, I don't see how we can do
> this unless DT dependencies are _really_ tied with runtime dependencies (The
> cycles would have been apparent if that was the case.
> 
> Honestly, the "depends-on" property suggested here just piles on to the
> existing state. So, it is somewhat doubling the exiting bindings. It says,
> you must use depends-on property to define probe / initialization dependency.
> The existing bindings like 'clock', 'interrupt', '*-supply' do not enforce
> that right now, so you will have device nodes that have these bindings right
> now but don't necessarily need them for successful probe for example.
> 
>>
>> It just seems that Android is the first one to actually try and
>> implement that goal :)
> 
> I guess :)
> 
> - ssp
> 
>>
>> thanks,
>>
>> greg k-h
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-25  3:53   ` Greg Kroah-Hartman
  2019-06-26  4:30     ` Sandeep Patil
@ 2019-06-26 21:30     ` Rob Herring
  2019-06-28  2:36       ` Saravana Kannan
  1 sibling, 1 reply; 35+ messages in thread
From: Rob Herring @ 2019-06-26 21:30 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Sandeep Patil, Saravana Kannan, Mark Rutland, Rafael J. Wysocki,
	Frank Rowand, David Collins, devicetree, linux-kernel,
	Android Kernel Team

On Mon, Jun 24, 2019 at 9:54 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Mon, Jun 24, 2019 at 03:37:07PM -0700, Sandeep Patil wrote:
> > We are trying to make sure that all (most) drivers in an Aarch64 system can
> > be kernel modules for Android, like any other desktop system for
> > example. There are a number of problems we need to fix before that happens
> > ofcourse.
>
> I will argue that this is NOT an android-specific issue.  If the goal of
> creating an arm64 kernel that will "just work" for a wide range of
> hardware configurations without rebuilding is going to happen, we need
> to solve this problem with DT.  This goal was one of the original wishes
> of the arm64 development effort, let's not loose sight of it as
> obviously, this is not working properly just yet.

I fail to see how the different Linux behavior between drivers
built-in and as modules has anything whatsoever to do with DT. Fix the
problems in Linux and use the dependencies that are already expressed
in DT and *then* we can talk about using DT to provide *hints* for
solving any remaining problems.

Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering
  2019-06-26 21:30     ` Rob Herring
@ 2019-06-28  2:36       ` Saravana Kannan
  0 siblings, 0 replies; 35+ messages in thread
From: Saravana Kannan @ 2019-06-28  2:36 UTC (permalink / raw)
  To: Rob Herring
  Cc: Greg Kroah-Hartman, Sandeep Patil, Mark Rutland,
	Rafael J. Wysocki, Frank Rowand, David Collins, devicetree,
	linux-kernel, Android Kernel Team

On Wed, Jun 26, 2019 at 2:31 PM Rob Herring <robh+dt@kernel.org> wrote:
>
> On Mon, Jun 24, 2019 at 9:54 PM Greg Kroah-Hartman
> <gregkh@linuxfoundation.org> wrote:
> >
> > On Mon, Jun 24, 2019 at 03:37:07PM -0700, Sandeep Patil wrote:
> > > We are trying to make sure that all (most) drivers in an Aarch64 system can
> > > be kernel modules for Android, like any other desktop system for
> > > example. There are a number of problems we need to fix before that happens
> > > ofcourse.
> >
> > I will argue that this is NOT an android-specific issue.  If the goal of
> > creating an arm64 kernel that will "just work" for a wide range of
> > hardware configurations without rebuilding is going to happen, we need
> > to solve this problem with DT.  This goal was one of the original wishes
> > of the arm64 development effort, let's not loose sight of it as
> > obviously, this is not working properly just yet.
>
> I fail to see how the different Linux behavior between drivers
> built-in and as modules has anything whatsoever to do with DT.

You are right, built-in vs module problem is not a DT issue. But this
is not so much a built-in vs module issue. It's just that built-in has
a hack available that works sometimes. But really, both are broken.

> Fix the
> problems in Linux and use the dependencies that are already expressed
> in DT and *then* we can talk about using DT to provide *hints* for
> solving any remaining problems.

Done. Sent v2 patch series that uses existing bindings.

-Saravana

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2019-06-28  2:37 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-04  0:32 [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
2019-06-04  0:32 ` [RESEND PATCH v1 1/5] of/platform: Speed up of_find_device_by_node() Saravana Kannan
2019-06-10 17:36   ` Rob Herring
     [not found]     ` <CAGETcx_Kb3+fFYHhBUdVbCSNTk4v5j1Ket=Et2ZQY0SHUgLLMw@mail.gmail.com>
2019-06-10 21:07       ` Rob Herring
2019-06-11 15:18     ` Frank Rowand
2019-06-11 20:56       ` Saravana Kannan
2019-06-11 21:52         ` Sandeep Patil
2019-06-12 13:53           ` Rob Herring
2019-06-12 14:21             ` Greg Kroah-Hartman
2019-06-12 16:53               ` Rob Herring
2019-06-12 17:08                 ` Greg Kroah-Hartman
2019-06-12 18:19                   ` Rob Herring
2019-06-12 19:29                     ` Saravana Kannan
2019-06-12 20:23                       ` Frank Rowand
2019-06-12 21:12                       ` Rob Herring
2019-06-12 22:10                         ` Saravana Kannan
2019-06-18 20:47                       ` Sandeep Patil
2019-06-18 21:22                         ` Saravana Kannan
2019-06-12 17:03               ` Frank Rowand
2019-06-12 16:07             ` Frank Rowand
2019-06-12 16:47               ` Frank Rowand
2019-06-12 19:03             ` Frank Rowand
2019-06-04  0:32 ` [RESEND PATCH v1 2/5] driver core: Add device links support for pending links to suppliers Saravana Kannan
2019-06-04  0:32 ` [RESEND PATCH v1 3/5] dt-bindings: Add depends-on property Saravana Kannan
2019-06-12 14:45   ` Sudeep Holla
2019-06-04  0:32 ` [RESEND PATCH v1 4/5] of/platform: Add functional dependency link from "depends-on" property Saravana Kannan
2019-06-04  0:32 ` [RESEND PATCH v1 5/5] driver core: Add sync_state driver/bus callback Saravana Kannan
2019-06-12 21:21 ` [RESEND PATCH v1 0/5] Solve postboot supplier cleanup and optimize probe ordering Frank Rowand
2019-06-13 13:19   ` Rob Herring
2019-06-24 22:37 ` Sandeep Patil
2019-06-25  3:53   ` Greg Kroah-Hartman
2019-06-26  4:30     ` Sandeep Patil
2019-06-26  5:49       ` Frank Rowand
2019-06-26 21:30     ` Rob Herring
2019-06-28  2:36       ` Saravana Kannan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).