devicetree.vger.kernel.org archive mirror
* [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering
@ 2019-07-24  0:10 Saravana Kannan
  2019-07-24  0:10 ` [PATCH v7 1/7] driver core: Add support for linking devices during device addition Saravana Kannan
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

Add device-links to track functional dependencies between devices
after they are created (but before they are probed) by looking at
their common DT bindings like clocks, interconnects, etc.

Having functional dependencies automatically added before the devices
are probed provides the following benefits:

- Optimizes device probe order and avoids the useless work of
  attempting probes of devices that will not probe successfully
  (because their suppliers aren't present or haven't probed yet).

  For example, in a commonly available mobile SoC, registering just
  one consumer device's driver at an initcall level earlier than the
  supplier device's driver causes 11 failed probe attempts before the
  consumer device probes successfully. This was with a kernel with all
  the drivers statically compiled in. This problem gets a lot worse if
  all the drivers are loaded as modules without direct symbol
  dependencies.

- Supplier devices like clock providers, interconnect providers, etc
  need to keep the resources they provide active and at a particular
  state(s) during boot up even if their current set of consumers don't
  request the resource to be active. This is because the rest of the
  consumers might not have probed yet and turning off the resource
  before all the consumers have probed could lead to a hang or
  undesired user experience.

  Some frameworks (Eg: regulator) handle this today by turning off
  "unused" resources at late_initcall_sync and hoping all the devices
  have probed by then. This is not a valid assumption for systems with
  loadable modules. Other frameworks (Eg: clock) just don't handle
  this due to the lack of a clear signal for when they can turn off
  resources. This leads to downstream hacks to handle cases like this
  that can easily be solved in the upstream kernel.

  By linking devices before they are probed, we give suppliers a clear
  count of the number of dependent consumers. Once all of the
  consumers are active, the suppliers can turn off the unused
  resources without making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links is needed, it is left to the consumer/supplier
devices to change the link when they probe.

v1 -> v2:
- Drop patch to speed up of_find_device_by_node()
- Drop depends-on property and use existing bindings

v2 -> v3:
- Refactor the code to have driver core initiate the linking of devs
- Have driver core link consumers to supplier before it's probed
- Add support for drivers to edit the device links before probing

v3 -> v4:
- Tested edit_links() on system with cyclic dependency. Works.
- Added some checks to make sure device link isn't attempted from
  parent device node to child device node.
- Added way to pause/resume sync_state callbacks across
  of_platform_populate().
- Recursively parse DT node to create device links from parent to
  suppliers of parent and all child nodes.

v4 -> v5:
- Fixed copy-pasta bugs with linked list handling
- Walk up the phandle reference till I find an actual device (needed
  for regulators to work)
- Added support for linking devices from regulator DT bindings
- Tested the whole series again to make sure cyclic dependencies are
  broken with edit_links() and regulator links are created properly.

v5 -> v6:
- Split, squashed and reordered some of the patches.
- Refactored the device linking code to follow the same code pattern for
  any property.

v6 -> v7:
- No functional changes.
- Renamed i to index
- Added comment to clarify not having to check property name for every
  index
- Added "matched" variable to clarify code. No functional change.
- Added comments to include/linux/device.h for add_links()

I've also not updated this patch series to handle the new patch [1] from
Rafael. Will do that once this patch series is close to being Acked.

[1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/

-Saravana


Saravana Kannan (7):
  driver core: Add support for linking devices during device addition
  driver core: Add edit_links() callback for drivers
  of/platform: Add functional dependency link from DT bindings
  driver core: Add sync_state driver/bus callback
  of/platform: Pause/resume sync state during init and
    of_platform_populate()
  of/platform: Create device links for all child-supplier dependencies
  of/platform: Don't create device links for default busses

 .../admin-guide/kernel-parameters.txt         |   5 +
 drivers/base/core.c                           | 168 ++++++++++++++++
 drivers/base/dd.c                             |  29 +++
 drivers/of/platform.c                         | 189 ++++++++++++++++++
 include/linux/device.h                        |  55 +++++
 5 files changed, 446 insertions(+)

-- 
2.22.0.709.g102302147b-goog


* [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
@ 2019-07-24  0:10 ` Saravana Kannan
  2019-08-08  2:04   ` Frank Rowand
  2019-07-24  0:10 ` [PATCH v7 2/7] driver core: Add edit_links() callback for drivers Saravana Kannan
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

When devices are added, the bus might want to create device links to track
functional dependencies between supplier and consumer devices. This
tracking of supplier-consumer relationship allows optimizing device probe
order and tracking whether all consumers of a supplier are active. The
add_links bus callback is added to support this.

However, when consumer devices are added, they might not have a supplier
device to link to despite needing mandatory resources/functionality from
one or more suppliers. A waiting_for_suppliers list is created to track
such consumers and retry linking them when new devices get added.
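
The retry behaviour described above can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the
model_* names, the fixed-size array and the single-supplier-per-device
simplification are hypothetical and are not part of the patch or of any
kernel API. It mirrors the order used in device_add() below: retry the
waiting consumers first, then try to link the newly added device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MAX_DEVS 8

struct model_dev {
	const char *name;
	const char *supplier;	/* name of the required supplier, or NULL */
	bool linked;		/* link to the supplier has been created */
	bool waiting;		/* models membership in wait_for_suppliers */
};

static struct model_dev *registered[MAX_DEVS];
static size_t nr_registered;

static struct model_dev *model_find(const char *name)
{
	for (size_t i = 0; i < nr_registered; i++)
		if (strcmp(registered[i]->name, name) == 0)
			return registered[i];
	return NULL;
}

/* Stand-in for the add_links bus callback: 0 on success, -1 to retry. */
static int model_add_links(struct model_dev *dev)
{
	if (!dev->supplier || dev->linked)
		return 0;
	if (!model_find(dev->supplier))
		return -1;	/* supplier not added yet */
	dev->linked = true;
	return 0;
}

/* Retry every consumer that is still waiting for a supplier. */
static void model_check_waiting(void)
{
	for (size_t i = 0; i < nr_registered; i++) {
		struct model_dev *dev = registered[i];

		if (dev->waiting && model_add_links(dev) == 0)
			dev->waiting = false;
	}
}

/* Register, let waiting consumers link to the new device, then link it. */
static void model_device_add(struct model_dev *dev)
{
	assert(nr_registered < MAX_DEVS);
	registered[nr_registered++] = dev;
	model_check_waiting();
	if (model_add_links(dev) != 0)
		dev->waiting = true;
}

/* A waiting device would get -EPROBE_DEFER instead of being probed. */
static bool model_can_probe(const struct model_dev *dev)
{
	return !dev->waiting;
}
```

In this model, adding a consumer before its supplier leaves the consumer
marked as waiting, and adding the supplier afterwards clears the mark
without the consumer ever attempting a probe in between.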

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/device.h | 14 +++++++
 2 files changed, 97 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index da84a73f2ba6..1b4eb221968f 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
 #endif
 
 /* Device links support. */
+static LIST_HEAD(wait_for_suppliers);
+static DEFINE_MUTEX(wfs_lock);
 
 #ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
@@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
 }
 EXPORT_SYMBOL_GPL(device_link_add);
 
+/**
+ * device_link_wait_for_supplier - Mark device as waiting for supplier
+ * @consumer: Consumer device
+ *
+ * Marks the consumer device as waiting for suppliers to become available. The
+ * consumer device will never be probed until it's unmarked as waiting for
+ * suppliers. The caller is responsible for adding the link to the supplier
+ * once the supplier device is present.
+ *
+ * This function is NOT meant to be called from the probe function of the
+ * consumer but rather from code that creates/adds the consumer device.
+ */
+static void device_link_wait_for_supplier(struct device *consumer)
+{
+	mutex_lock(&wfs_lock);
+	list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
+/**
+ * device_link_check_waiting_consumers - Try to unmark waiting consumers
+ *
+ * Loops through all consumers waiting on suppliers and tries to add all their
+ * supplier links. If that succeeds, the consumer device is unmarked as waiting
+ * for suppliers. Otherwise, they are left marked as waiting on suppliers.
+ *
+ * The add_links bus callback is expected to return 0 if it has found and added
+ * all the supplier links for the consumer device. It should return an error if
+ * it isn't able to do so.
+ *
+ * The caller of device_link_wait_for_supplier() is expected to call this once
+ * it's aware of potential suppliers becoming available.
+ */
+static void device_link_check_waiting_consumers(void)
+{
+	struct device *dev, *tmp;
+
+	mutex_lock(&wfs_lock);
+	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
+				 links.needs_suppliers)
+		if (!dev->bus->add_links(dev))
+			list_del_init(&dev->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
 static void device_link_free(struct device_link *link)
 {
 	while (refcount_dec_not_one(&link->rpm_active))
@@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
 	struct device_link *link;
 	int ret = 0;
 
+	/*
+	 * If a device is waiting for one or more suppliers (in
+	 * wait_for_suppliers list), it is not ready to probe yet. So just
+	 * return -EPROBE_DEFER without having to check the links with existing
+	 * suppliers.
+	 */
+	mutex_lock(&wfs_lock);
+	if (!list_empty(&dev->links.needs_suppliers)) {
+		mutex_unlock(&wfs_lock);
+		return -EPROBE_DEFER;
+	}
+	mutex_unlock(&wfs_lock);
+
 	device_links_write_lock();
 
 	list_for_each_entry(link, &dev->links.suppliers, c_node) {
@@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
 {
 	struct device_link *link, *ln;
 
+	mutex_lock(&wfs_lock);
+	list_del(&dev->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+
 	/*
 	 * Delete all of the remaining links from this device to any other
 	 * devices (either consumers or suppliers).
@@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
 #endif
 	INIT_LIST_HEAD(&dev->links.consumers);
 	INIT_LIST_HEAD(&dev->links.suppliers);
+	INIT_LIST_HEAD(&dev->links.needs_suppliers);
 	dev->links.status = DL_DEV_NO_DRIVER;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
@@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
 					     BUS_NOTIFY_ADD_DEVICE, dev);
 
 	kobject_uevent(&dev->kobj, KOBJ_ADD);
+
+	/*
+	 * Check if any of the other devices (consumers) have been waiting for
+	 * this device (supplier) to be added so that they can create a device
+	 * link to it.
+	 *
+	 * This needs to happen after device_pm_add() because device_link_add()
+	 * requires the supplier be registered before it's called.
+	 *
+	 * But this also needs to happen before bus_probe_device() to make sure
+	 * waiting consumers can link to it before the driver is bound to the
+	 * device and the driver sync_state callback is called for this device.
+	 */
+	device_link_check_waiting_consumers();
+
+	if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
+		device_link_wait_for_supplier(dev);
+
 	bus_probe_device(dev);
 	if (parent)
 		klist_add_tail(&dev->p->knode_parent,
diff --git a/include/linux/device.h b/include/linux/device.h
index c330b75c6c57..5d70babb7462 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
  *		-EPROBE_DEFER it will queue the device for deferred probing.
  * @uevent:	Called when a device is added, removed, or a few other things
  *		that generate uevents to add the environment variables.
+ * @add_links:	Called, perhaps multiple times per device, after a device is
+ *		added to this bus.  The function is expected to create device
+ *		links to all the suppliers of the input device that are
+ *		available at the time this function is called.  As in, the
+ *		function should NOT stop at the first failed device link if
+ *		other unlinked supplier devices are present in the system.
+ *
+ *		Return 0 if device links have been successfully created to all
+ *		the suppliers of this device.  Return an error if some of the
+ *		suppliers are not yet available and this function needs to be
+ *		reattempted in the future.
  * @probe:	Called when a new device or driver add to this bus, and callback
  *		the specific driver's probe to initial the matched device.
  * @remove:	Called when a device removed from this bus.
@@ -122,6 +133,7 @@ struct bus_type {
 
 	int (*match)(struct device *dev, struct device_driver *drv);
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
+	int (*add_links)(struct device *dev);
 	int (*probe)(struct device *dev);
 	int (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
@@ -893,11 +905,13 @@ enum dl_dev_state {
  * struct dev_links_info - Device data related to device links.
  * @suppliers: List of links to supplier devices.
  * @consumers: List of links to consumer devices.
+ * @needs_suppliers: Hook to global list of devices waiting for suppliers.
  * @status: Driver status information.
  */
 struct dev_links_info {
 	struct list_head suppliers;
 	struct list_head consumers;
+	struct list_head needs_suppliers;
 	enum dl_dev_state status;
 };
 
-- 
2.22.0.709.g102302147b-goog


* [PATCH v7 2/7] driver core: Add edit_links() callback for drivers
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
  2019-07-24  0:10 ` [PATCH v7 1/7] driver core: Add support for linking devices during device addition Saravana Kannan
@ 2019-07-24  0:10 ` Saravana Kannan
  2019-08-08  2:05   ` Frank Rowand
  2019-07-24  0:10 ` [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings Saravana Kannan
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

The driver core/bus adding supplier-consumer dependencies by default
enables functional dependencies to be tracked correctly even when the
consumer devices haven't had their drivers registered or loaded (if they
are modules).

However, when the bus incorrectly adds dependencies that it shouldn't
have added, the devices might never probe.

For example, if device-C is a consumer of device-S and they have
phandles to each other in DT, the following could happen:

1.  Device-S gets added first.
2.  The bus add_links() callback will (incorrectly) try to link it as
    a consumer of device-C.
3.  Since device-C isn't present, device-S will be put in
    "waiting-for-supplier" list.
4.  Device-C gets added next.
5.  All devices in "waiting-for-supplier" list are retried for linking.
6.  Device-S gets linked as consumer to Device-C.
7.  The bus add_links() callback will (correctly) try to link it as
    a consumer of device-S.
8.  This isn't allowed because it would create a cyclic device link.

Neither device will get probed: the supplier is marked as dependent on
the consumer, and the consumer can never probe because it can't get
resources from the supplier.

Without this patch, things stay in this broken state. However, with this
patch, the execution will continue like this:

9.  Device-C's driver is loaded.
10. Device-C's driver removes Device-S as a consumer of Device-C.
11. Device-C's driver adds Device-C as a consumer of Device-S.
12. Device-S probes.
13. Device-C probes.
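
The sequence above can be modelled in user space. The following is a
minimal sketch, not kernel code: the depends_on matrix, the model_*
helpers and the two-device setup are assumptions made for illustration.
It shows a link being rejected because it would close a cycle, and a
hypothetical edit_links() for device-C undoing the wrong link and adding
the right one (steps 10 and 11).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: depends_on[a][b] means "a is a consumer of b". */
enum { DEV_S, DEV_C, NR_DEVS };

static bool depends_on[NR_DEVS][NR_DEVS];

/*
 * True if 'from' reaches 'to' by following consumer -> supplier edges.
 * Terminates because model_link_add() keeps the graph acyclic.
 */
static bool model_reaches(int from, int to)
{
	if (from == to)
		return true;
	for (int next = 0; next < NR_DEVS; next++)
		if (depends_on[from][next] && model_reaches(next, to))
			return true;
	return false;
}

/* Refuse a link that would close a dependency cycle. */
static int model_link_add(int consumer, int supplier)
{
	assert(consumer != supplier);
	if (model_reaches(supplier, consumer))
		return -1;
	depends_on[consumer][supplier] = true;
	return 0;
}

static void model_link_remove(int consumer, int supplier)
{
	depends_on[consumer][supplier] = false;
}

/* What a hypothetical edit_links() for device-C would do. */
static int model_edit_links_c(void)
{
	model_link_remove(DEV_S, DEV_C);	/* step 10 */
	return model_link_add(DEV_C, DEV_S);	/* step 11 */
}
```

Calling model_link_add(DEV_S, DEV_C) and then model_link_add(DEV_C, DEV_S)
reproduces steps 6 to 8: the second call fails. After model_edit_links_c()
only the correct link remains, so the supplier no longer waits on its own
consumer.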

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 24 ++++++++++++++++++++++--
 drivers/base/dd.c      | 29 +++++++++++++++++++++++++++++
 include/linux/device.h | 18 ++++++++++++++++++
 3 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 1b4eb221968f..733d8a9aec76 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -422,6 +422,19 @@ static void device_link_wait_for_supplier(struct device *consumer)
 	mutex_unlock(&wfs_lock);
 }
 
+/**
+ * device_link_remove_from_wfs - Unmark device as waiting for supplier
+ * @consumer: Consumer device
+ *
+ * Unmark the consumer device as waiting for suppliers to become available.
+ */
+void device_link_remove_from_wfs(struct device *consumer)
+{
+	mutex_lock(&wfs_lock);
+	list_del_init(&consumer->links.needs_suppliers);
+	mutex_unlock(&wfs_lock);
+}
+
 /**
  * device_link_check_waiting_consumers - Try to unmark waiting consumers
  *
@@ -439,12 +452,19 @@ static void device_link_wait_for_supplier(struct device *consumer)
 static void device_link_check_waiting_consumers(void)
 {
 	struct device *dev, *tmp;
+	int ret;
 
 	mutex_lock(&wfs_lock);
 	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
-				 links.needs_suppliers)
-		if (!dev->bus->add_links(dev))
+				 links.needs_suppliers) {
+		ret = 0;
+		if (dev->has_edit_links)
+			ret = driver_edit_links(dev);
+		else if (dev->bus->add_links)
+			ret = dev->bus->add_links(dev);
+		if (!ret)
 			list_del_init(&dev->links.needs_suppliers);
+	}
 	mutex_unlock(&wfs_lock);
 }
 
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 994a90747420..5e7041ede0d7 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -698,6 +698,12 @@ int driver_probe_device(struct device_driver *drv, struct device *dev)
 	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
 		 drv->bus->name, __func__, dev_name(dev), drv->name);
 
+	if (drv->edit_links) {
+		if (drv->edit_links(dev))
+			dev->has_edit_links = true;
+		else
+			device_link_remove_from_wfs(dev);
+	}
 	pm_runtime_get_suppliers(dev);
 	if (dev->parent)
 		pm_runtime_get_sync(dev->parent);
@@ -786,6 +792,29 @@ struct device_attach_data {
 	bool have_async;
 };
 
+static int __driver_edit_links(struct device_driver *drv, void *data)
+{
+	struct device *dev = data;
+
+	if (!drv->edit_links)
+		return 0;
+
+	if (driver_match_device(drv, dev) <= 0)
+		return 0;
+
+	return drv->edit_links(dev);
+}
+
+int driver_edit_links(struct device *dev)
+{
+	int ret;
+
+	device_lock(dev);
+	ret = bus_for_each_drv(dev->bus, NULL, dev, __driver_edit_links);
+	device_unlock(dev);
+	return ret;
+}
+
 static int __device_attach_driver(struct device_driver *drv, void *_data)
 {
 	struct device_attach_data *data = _data;
diff --git a/include/linux/device.h b/include/linux/device.h
index 5d70babb7462..35aed50033c4 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -263,6 +263,20 @@ enum probe_type {
  * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
+ * @edit_links:	Called to allow a matched driver to edit the device links the
+ *		bus might have added incorrectly. This will be useful to handle
+ *		cases where the bus incorrectly adds functional dependencies
+ *		that aren't true or tries to create cyclic dependencies. It
+ *		does not, however, handle functional dependencies that the bus
+ *		missed, because the supplier's sync_state might execute before
+ *		the driver for a missing consumer is loaded and gets to edit
+ *		the device links for the consumer.
+ *
+ *		This function might be called multiple times after a new device
+ *		is added.  The function is expected to create all the device
+ *		links for the new device and return 0 if it was completed
+ *		successfully or return an error if it needs to be reattempted
+ *		in the future.
  * @probe:	Called to query the existence of a specific device,
  *		whether this driver can work with it, and bind the driver
  *		to a specific device.
@@ -302,6 +316,7 @@ struct device_driver {
 	const struct of_device_id	*of_match_table;
 	const struct acpi_device_id	*acpi_match_table;
 
+	int (*edit_links)(struct device *dev);
 	int (*probe) (struct device *dev);
 	int (*remove) (struct device *dev);
 	void (*shutdown) (struct device *dev);
@@ -1078,6 +1093,7 @@ struct device {
 	bool			offline_disabled:1;
 	bool			offline:1;
 	bool			of_node_reused:1;
+	bool			has_edit_links:1;
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
@@ -1329,6 +1345,7 @@ extern int  __must_check device_attach(struct device *dev);
 extern int __must_check driver_attach(struct device_driver *drv);
 extern void device_initial_probe(struct device *dev);
 extern int __must_check device_reprobe(struct device *dev);
+extern int driver_edit_links(struct device *dev);
 
 extern bool device_is_bound(struct device *dev);
 
@@ -1419,6 +1436,7 @@ struct device_link *device_link_add(struct device *consumer,
 				    struct device *supplier, u32 flags);
 void device_link_del(struct device_link *link);
 void device_link_remove(void *consumer, struct device *supplier);
+void device_link_remove_from_wfs(struct device *consumer);
 
 #ifndef dev_fmt
 #define dev_fmt(fmt) fmt
-- 
2.22.0.709.g102302147b-goog


* [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
  2019-07-24  0:10 ` [PATCH v7 1/7] driver core: Add support for linking devices during device addition Saravana Kannan
  2019-07-24  0:10 ` [PATCH v7 2/7] driver core: Add edit_links() callback for drivers Saravana Kannan
@ 2019-07-24  0:10 ` Saravana Kannan
  2019-08-08  2:06   ` Frank Rowand
  2019-07-24  0:10 ` [PATCH v7 4/7] driver core: Add sync_state driver/bus callback Saravana Kannan
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand, Jonathan Corbet
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins,
	kernel-team, linux-doc

Add device-links after the devices are created (but before they are
probed) by looking at common DT bindings like clocks and
interconnects.

Automatically adding device-links for functional dependencies at the
framework level provides the following benefits:

- Optimizes device probe order and avoids the useless work of
  attempting probes of devices that will not probe successfully
  (because their suppliers aren't present or haven't probed yet).

  For example, in a commonly available mobile SoC, registering just
  one consumer device's driver at an initcall level earlier than the
  supplier device's driver causes 11 failed probe attempts before the
  consumer device probes successfully. This was with a kernel with all
  the drivers statically compiled in. This problem gets a lot worse if
  all the drivers are loaded as modules without direct symbol
  dependencies.

- Supplier devices like clock providers, interconnect providers, etc
  need to keep the resources they provide active and at a particular
  state(s) during boot up even if their current set of consumers don't
  request the resource to be active. This is because the rest of the
  consumers might not have probed yet and turning off the resource
  before all the consumers have probed could lead to a hang or
  undesired user experience.

  Some frameworks (Eg: regulator) handle this today by turning off
  "unused" resources at late_initcall_sync and hoping all the devices
  have probed by then. This is not a valid assumption for systems with
  loadable modules. Other frameworks (Eg: clock) just don't handle
  this due to the lack of a clear signal for when they can turn off
  resources. This leads to downstream hacks to handle cases like this
  that can easily be solved in the upstream kernel.

  By linking devices before they are probed, we give suppliers a clear
  count of the number of dependent consumers. Once all of the
  consumers are active, the suppliers can turn off the unused
  resources without making assumptions about the number of consumers.

By default we just add device-links to track "driver presence" (probe
succeeded) of the supplier device. If any other functionality provided
by device-links is needed, it is left to the consumer/supplier
devices to change the link when they probe.
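
For reference, the property-name matching used by the patch below can be
tried out in user space. This is a sketch, assuming only the standard C
library: strcmp_suffix() follows the same logic as in the patch, while
enum binding_kind and classify_property() are hypothetical names
introduced here to show which property names each parser would accept.

```c
#include <assert.h>
#include <string.h>

/*
 * Same logic as strcmp_suffix() in the patch: returns 0 only when str
 * ends with suffix and is strictly longer than it.
 */
static int strcmp_suffix(const char *str, const char *suffix)
{
	size_t len = strlen(str);
	size_t suffix_len = strlen(suffix);

	if (len <= suffix_len)
		return -1;
	return strcmp(str + len - suffix_len, suffix);
}

/* Hypothetical classification, for illustration only. */
enum binding_kind {
	BINDING_NONE,
	BINDING_CLOCKS,
	BINDING_INTERCONNECTS,
	BINDING_REGULATOR,
};

static enum binding_kind classify_property(const char *prop)
{
	assert(prop != NULL);
	if (strcmp(prop, "clocks") == 0)
		return BINDING_CLOCKS;
	if (strcmp(prop, "interconnects") == 0)
		return BINDING_INTERCONNECTS;
	if (strcmp_suffix(prop, "-supply") == 0)
		return BINDING_REGULATOR;
	return BINDING_NONE;
}
```

Note that a property named exactly "-supply" is not treated as a
regulator binding, because the length check requires at least one
character before the suffix.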

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 .../admin-guide/kernel-parameters.txt         |   5 +
 drivers/of/platform.c                         | 165 ++++++++++++++++++
 2 files changed, 170 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 46b826fcb5ad..12937349d79d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3170,6 +3170,11 @@
 			This can be set from sysctl after boot.
 			See Documentation/admin-guide/sysctl/vm.rst for details.
 
+	of_devlink	[KNL] Make device links from common DT bindings. Useful
+			for optimizing probe order and making sure resources
+			aren't turned off before the consumer devices have
+			probed.
+
 	ohci1394_dma=early	[HW] enable debugging via the ohci1394 driver.
 			See Documentation/debugging-via-ohci1394.txt for more
 			info.
diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 7801e25e6895..4344419a26fc 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
 }
 EXPORT_SYMBOL_GPL(of_platform_default_populate);
 
+static bool of_link_is_valid(struct device_node *con, struct device_node *sup)
+{
+	of_node_get(sup);
+	/*
+	 * Don't allow linking a device node as a consumer of one of its
+	 * descendant nodes. By definition, a child node can't be a functional
+	 * dependency for the parent node.
+	 */
+	while (sup) {
+		if (sup == con) {
+			of_node_put(sup);
+			return false;
+		}
+		sup = of_get_next_parent(sup);
+	}
+	return true;
+}
+
+static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
+{
+	struct platform_device *sup_dev;
+	u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
+	int ret = 0;
+
+	/*
+	 * Since we are trying to create device links, we need to find
+	 * the actual device node that owns this supplier phandle.
+	 * Often times it's the same node, but sometimes it can be one
+	 * of the parents. So walk up the parent till you find a
+	 * device.
+	 */
+	while (sup_np && !of_find_property(sup_np, "compatible", NULL))
+		sup_np = of_get_next_parent(sup_np);
+	if (!sup_np)
+		return 0;
+
+	if (!of_link_is_valid(dev->of_node, sup_np)) {
+		of_node_put(sup_np);
+		return 0;
+	}
+	sup_dev = of_find_device_by_node(sup_np);
+	of_node_put(sup_np);
+	if (!sup_dev)
+		return -ENODEV;
+	if (!device_link_add(dev, &sup_dev->dev, dl_flags))
+		ret = -ENODEV;
+	put_device(&sup_dev->dev);
+	return ret;
+}
+
+static struct device_node *parse_prop_cells(struct device_node *np,
+					    const char *prop, int index,
+					    const char *binding,
+					    const char *cell)
+{
+	struct of_phandle_args sup_args;
+
+	/* Don't need to check property name for every index. */
+	if (!index && strcmp(prop, binding))
+		return NULL;
+
+	if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
+		return NULL;
+
+	return sup_args.np;
+}
+
+static struct device_node *parse_clocks(struct device_node *np,
+					const char *prop, int index)
+{
+	return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
+}
+
+static struct device_node *parse_interconnects(struct device_node *np,
+					       const char *prop, int index)
+{
+	return parse_prop_cells(np, prop, index, "interconnects",
+				"#interconnect-cells");
+}
+
+static int strcmp_suffix(const char *str, const char *suffix)
+{
+	unsigned int len, suffix_len;
+
+	len = strlen(str);
+	suffix_len = strlen(suffix);
+	if (len <= suffix_len)
+		return -1;
+	return strcmp(str + len - suffix_len, suffix);
+}
+
+static struct device_node *parse_regulators(struct device_node *np,
+					    const char *prop, int index)
+{
+	if (index || strcmp_suffix(prop, "-supply"))
+		return NULL;
+
+	return of_parse_phandle(np, prop, 0);
+}
+
+/**
+ * struct supplier_bindings - Information for parsing supplier DT binding
+ *
+ * @parse_prop:		If the function cannot parse the property, return NULL.
+ *			Otherwise, return the phandle listed in the property
+ *			that corresponds to the index.
+ */
+struct supplier_bindings {
+	struct device_node *(*parse_prop)(struct device_node *np,
+					  const char *name, int index);
+};
+
+static const struct supplier_bindings bindings[] = {
+	{ .parse_prop = parse_clocks, },
+	{ .parse_prop = parse_interconnects, },
+	{ .parse_prop = parse_regulators, },
+	{ },
+};
+
+static int of_link_property(struct device *dev, struct device_node *con_np,
+			    const char *prop)
+{
+	struct device_node *phandle;
+	const struct supplier_bindings *s = bindings;
+	unsigned int i = 0;
+	bool done = true, matched = false;
+
+	while (!matched && s->parse_prop) {
+		while ((phandle = s->parse_prop(con_np, prop, i))) {
+			matched = true;
+			i++;
+			if (of_link_to_phandle(dev, phandle))
+				/*
+				 * Don't stop at the first failure. See
+				 * Documentation for bus_type.add_links for
+				 * more details.
+				 */
+				done = false;
+		}
+		s++;
+	}
+	return done ? 0 : -ENODEV;
+}
+
+static bool of_devlink;
+core_param(of_devlink, of_devlink, bool, 0);
+
+static int of_link_to_suppliers(struct device *dev)
+{
+	struct property *p;
+	bool done = true;
+
+	if (!of_devlink)
+		return 0;
+	if (unlikely(!dev->of_node))
+		return 0;
+
+	for_each_property_of_node(dev->of_node, p)
+		if (of_link_property(dev, dev->of_node, p->name))
+			done = false;
+
+	return done ? 0 : -ENODEV;
+}
+
 #ifndef CONFIG_PPC
 static const struct of_device_id reserved_mem_matches[] = {
 	{ .compatible = "qcom,rmtfs-mem" },
@@ -523,6 +687,7 @@ static int __init of_platform_default_populate_init(void)
 	if (!of_have_populated_dt())
 		return -ENODEV;
 
+	platform_bus_type.add_links = of_link_to_suppliers;
 	/*
 	 * Handle certain compatibles explicitly, since we don't want to create
 	 * platform_devices for every node in /reserved-memory with a
-- 
2.22.0.709.g102302147b-goog


* [PATCH v7 4/7] driver core: Add sync_state driver/bus callback
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (2 preceding siblings ...)
  2019-07-24  0:10 ` [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings Saravana Kannan
@ 2019-07-24  0:10 ` Saravana Kannan
  2019-07-24  0:10 ` [PATCH v7 5/7] of/platform: Pause/resume sync state during init and of_platform_populate() Saravana Kannan
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

This sync_state driver/bus callback is called once all the consumers
of a supplier have probed successfully.

This allows the supplier device's driver/bus to sync the supplier
device's state to the software state with the guarantee that all the
consumers are actively managing the resources provided by the supplier
device.

To maintain backwards compatibility and ease the transition from existing
frameworks and resource cleanup schemes, late_initcall_sync is the
earliest point at which the sync_state callback might be called.

There is no upper bound on the time by which the sync_state callback
has to be called. This is because if a consumer device never probes,
the supplier has to maintain its resources in the state left by the
bootloader. For example, if the bootloader leaves the display
backlight at a fixed voltage and the backlight driver is never probed,
you don't want the backlight to ever be turned off after boot up.

Also, when multiple devices are added after kernel init, some
suppliers could be added before their consumer devices get added. In
these instances, the supplier devices could get their sync_state
callback called right after they probe because the consumer devices
haven't had a chance to create device links to the suppliers.

To handle this correctly, this change also provides APIs to
pause/resume sync state callbacks so that when multiple devices are
added, their sync_state callback evaluation can be postponed to happen
after all of them are added.
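
To make the trigger condition concrete, here is a minimal userspace
sketch of the rule described above. It is not the kernel code: the
model_* names and types are made up for illustration, and the
"stateless" field only stands in for DL_FLAG_STATELESS. The callback
runs at most once, and only when every state-tracking consumer link is
active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; these are NOT the kernel's types. */
enum model_link_state { MODEL_DORMANT, MODEL_CONSUMER_PROBE, MODEL_ACTIVE };

struct model_link {
	enum model_link_state status;
	bool stateless;		/* stands in for DL_FLAG_STATELESS */
};

struct model_supplier {
	struct model_link *links;	/* links to this supplier's consumers */
	size_t nr_links;
	bool state_synced;
	int sync_calls;			/* how many times the callback ran */
};

/*
 * Run the sync_state step at most once, and only when every
 * state-tracking consumer link is active.
 */
void model_try_sync_state(struct model_supplier *sup)
{
	size_t i;

	if (sup->state_synced)
		return;

	for (i = 0; i < sup->nr_links; i++) {
		if (sup->links[i].stateless)
			continue;
		if (sup->links[i].status != MODEL_ACTIVE)
			return;
	}

	sup->sync_calls++;	/* a real driver would release unused resources here */
	sup->state_synced = true;
}

/* Two consumers probe one after the other; returns the callback count. */
int model_sync_demo(void)
{
	struct model_link links[2] = {
		{ MODEL_CONSUMER_PROBE, false },
		{ MODEL_DORMANT, false },
	};
	struct model_supplier sup = { links, 2, false, 0 };

	links[0].status = MODEL_ACTIVE;
	model_try_sync_state(&sup);
	assert(sup.sync_calls == 0);	/* second consumer has not probed yet */

	links[1].status = MODEL_ACTIVE;
	model_try_sync_state(&sup);
	model_try_sync_state(&sup);	/* repeated evaluation is a no-op */

	return sup.sync_calls;
}
```

If the second consumer never becomes active, the callback never runs,
which matches the "no upper bound" behaviour described above.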

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/base/core.c    | 65 ++++++++++++++++++++++++++++++++++++++++++
 include/linux/device.h | 23 +++++++++++++++
 2 files changed, 88 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 733d8a9aec76..aec725d9e17e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -46,6 +46,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
 /* Device links support. */
 static LIST_HEAD(wait_for_suppliers);
 static DEFINE_MUTEX(wfs_lock);
+static LIST_HEAD(deferred_sync);
+static unsigned int supplier_sync_state_disabled;
 
 #ifdef CONFIG_SRCU
 static DEFINE_MUTEX(device_links_lock);
@@ -634,6 +636,62 @@ int device_links_check_suppliers(struct device *dev)
 	return ret;
 }
 
+static void __device_links_supplier_sync_state(struct device *dev)
+{
+	struct device_link *link;
+
+	if (dev->state_synced)
+		return;
+
+	list_for_each_entry(link, &dev->links.consumers, s_node) {
+		if (link->flags & DL_FLAG_STATELESS)
+			continue;
+		if (link->status != DL_STATE_ACTIVE)
+			return;
+	}
+
+	if (dev->bus->sync_state)
+		dev->bus->sync_state(dev);
+	else if (dev->driver && dev->driver->sync_state)
+		dev->driver->sync_state(dev);
+
+	dev->state_synced = true;
+}
+
+void device_links_supplier_sync_state_pause(void)
+{
+	device_links_write_lock();
+	supplier_sync_state_disabled++;
+	device_links_write_unlock();
+}
+
+void device_links_supplier_sync_state_resume(void)
+{
+	struct device *dev, *tmp;
+
+	device_links_write_lock();
+	if (!supplier_sync_state_disabled) {
+		WARN(true, "Unmatched sync_state pause/resume!");
+		goto out;
+	}
+	supplier_sync_state_disabled--;
+	if (supplier_sync_state_disabled)
+		goto out;
+
+	list_for_each_entry_safe(dev, tmp, &deferred_sync, links.defer_sync) {
+		__device_links_supplier_sync_state(dev);
+		list_del_init(&dev->links.defer_sync);
+	}
+out:
+	device_links_write_unlock();
+}
+
+static void __device_links_supplier_defer_sync(struct device *sup)
+{
+	if (list_empty(&sup->links.defer_sync))
+		list_add_tail(&sup->links.defer_sync, &deferred_sync);
+}
+
 /**
  * device_links_driver_bound - Update device links after probing its driver.
  * @dev: Device to update the links for.
@@ -678,6 +736,11 @@ void device_links_driver_bound(struct device *dev)
 
 		WARN_ON(link->status != DL_STATE_CONSUMER_PROBE);
 		WRITE_ONCE(link->status, DL_STATE_ACTIVE);
+
+		if (supplier_sync_state_disabled)
+			__device_links_supplier_defer_sync(link->supplier);
+		else
+			__device_links_supplier_sync_state(link->supplier);
 	}
 
 	dev->links.status = DL_DEV_DRIVER_BOUND;
@@ -787,6 +850,7 @@ void device_links_driver_cleanup(struct device *dev)
 		WRITE_ONCE(link->status, DL_STATE_DORMANT);
 	}
 
+	list_del_init(&dev->links.defer_sync);
 	__device_links_no_driver(dev);
 
 	device_links_write_unlock();
@@ -1758,6 +1822,7 @@ void device_initialize(struct device *dev)
 	INIT_LIST_HEAD(&dev->links.consumers);
 	INIT_LIST_HEAD(&dev->links.suppliers);
 	INIT_LIST_HEAD(&dev->links.needs_suppliers);
+	INIT_LIST_HEAD(&dev->links.defer_sync);
 	dev->links.status = DL_DEV_NO_DRIVER;
 }
 EXPORT_SYMBOL_GPL(device_initialize);
diff --git a/include/linux/device.h b/include/linux/device.h
index 35aed50033c4..4e74ed9137a0 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -84,6 +84,8 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
  *		available at the time this function is called.  As in, the
  *		function should NOT stop at the first failed device link if
  *		other unlinked supplier devices are present in the system.
+ *		This is necessary for the sync_state() callback to work
+ *		correctly.
  *
  *		Return 0 if device links have been successfully created to all
  *		the suppliers of this device.  Return an error if some of the
@@ -91,6 +93,13 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
  *		reattempted in the future.
  * @probe:	Called when a new device or driver add to this bus, and callback
  *		the specific driver's probe to initial the matched device.
+ * @sync_state:	Called to sync device state to software state after all the
+ *		state tracking consumers linked to this device (present at
+ *		the time of late_initcall) have successfully bound to a
+ *		driver. If the device has no consumers, this function will
+ *		be called at late_initcall_sync level. If the device has
+ *		consumers that are never bound to a driver, this function
+ *		will never get called until they do.
  * @remove:	Called when a device removed from this bus.
  * @shutdown:	Called at shut-down time to quiesce the device.
  *
@@ -135,6 +144,7 @@ struct bus_type {
 	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
 	int (*add_links)(struct device *dev);
 	int (*probe)(struct device *dev);
+	void (*sync_state)(struct device *dev);
 	int (*remove)(struct device *dev);
 	void (*shutdown)(struct device *dev);
 
@@ -280,6 +290,13 @@ enum probe_type {
  * @probe:	Called to query the existence of a specific device,
  *		whether this driver can work with it, and bind the driver
  *		to a specific device.
+ * @sync_state:	Called to sync device state to software state after all the
+ *		state tracking consumers linked to this device (present at
+ *		the time of late_initcall) have successfully bound to a
+ *		driver. If the device has no consumers, this function will
+ *		be called at late_initcall_sync level. If the device has
+ *		consumers that are never bound to a driver, this function
+ *		will never get called until they do.
  * @remove:	Called when the device is removed from the system to
  *		unbind a device from this driver.
  * @shutdown:	Called at shut-down time to quiesce the device.
@@ -318,6 +335,7 @@ struct device_driver {
 
 	int (*edit_links)(struct device *dev);
 	int (*probe) (struct device *dev);
+	void (*sync_state)(struct device *dev);
 	int (*remove) (struct device *dev);
 	void (*shutdown) (struct device *dev);
 	int (*suspend) (struct device *dev, pm_message_t state);
@@ -921,12 +939,14 @@ enum dl_dev_state {
  * @suppliers: List of links to supplier devices.
  * @consumers: List of links to consumer devices.
  * @needs_suppliers: Hook to global list of devices waiting for suppliers.
+ * @defer_sync: Hook to global list of devices that have deferred sync_state.
  * @status: Driver status information.
  */
 struct dev_links_info {
 	struct list_head suppliers;
 	struct list_head consumers;
 	struct list_head needs_suppliers;
+	struct list_head defer_sync;
 	enum dl_dev_state status;
 };
 
@@ -1094,6 +1114,7 @@ struct device {
 	bool			offline:1;
 	bool			of_node_reused:1;
 	bool			has_edit_links:1;
+	bool			state_synced:1;
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
     defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
@@ -1437,6 +1458,8 @@ struct device_link *device_link_add(struct device *consumer,
 void device_link_del(struct device_link *link);
 void device_link_remove(void *consumer, struct device *supplier);
 void device_link_remove_from_wfs(struct device *consumer);
+void device_links_supplier_sync_state_pause(void);
+void device_links_supplier_sync_state_resume(void);
 
 #ifndef dev_fmt
 #define dev_fmt(fmt) fmt
-- 
2.22.0.709.g102302147b-goog

* [PATCH v7 5/7] of/platform: Pause/resume sync state during init and of_platform_populate()
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (3 preceding siblings ...)
  2019-07-24  0:10 ` [PATCH v7 4/7] driver core: Add sync_state driver/bus callback Saravana Kannan
@ 2019-07-24  0:10 ` Saravana Kannan
  2019-07-24  0:10 ` [PATCH v7 6/7] of/platform: Create device links for all child-supplier depencencies Saravana Kannan
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

When all the top level devices are populated from DT during kernel
init, the supplier devices could be added and probed before the
consumer devices are added and linked to the suppliers. To prevent the
sync_state() callback from being called prematurely, pause the
sync_state() callbacks before populating the devices and resume them
at late_initcall_sync().

Similarly, when child devices are populated after kernel init using
of_platform_populate(), there could be supplier-consumer dependencies
between the child devices that are populated. To avoid the same
problem with sync_state() being called prematurely, pause and resume
sync_state() callbacks across of_platform_populate().
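
As a rough illustration of why the pause/resume bracket matters, here
is a small userspace model. Everything in it (the model_* names, the
fixed-size deferred array, the pending_consumers counter) is an
assumption made for the sketch and not the kernel implementation. It
shows a supplier whose first consumer probes before the second consumer
has even been added.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DEFERRED 8

struct model_sup {
	int pending_consumers;	/* consumers linked but not yet probed */
	bool state_synced;
	bool deferred;		/* already queued for a later evaluation */
};

static unsigned int model_paused;	/* nesting depth of pause calls */
static struct model_sup *model_deferred[MODEL_MAX_DEFERRED];
static int model_nr_deferred;

static void model_try_sync(struct model_sup *sup)
{
	if (!sup->state_synced && sup->pending_consumers == 0)
		sup->state_synced = true;
}

void model_pause(void)
{
	model_paused++;
}

/* Called whenever a consumer of @sup finishes probing. */
void model_request_sync(struct model_sup *sup)
{
	if (!model_paused) {
		model_try_sync(sup);
		return;
	}
	if (!sup->deferred && model_nr_deferred < MODEL_MAX_DEFERRED) {
		sup->deferred = true;
		model_deferred[model_nr_deferred++] = sup;
	}
}

void model_resume(void)
{
	int i;

	assert(model_paused > 0);	/* unmatched resume is a caller bug */
	if (--model_paused)
		return;			/* an outer pause is still in effect */

	for (i = 0; i < model_nr_deferred; i++) {
		model_deferred[i]->deferred = false;
		model_try_sync(model_deferred[i]);
	}
	model_nr_deferred = 0;
}

/* Returns true if the supplier was synced before consumer 2 probed. */
bool model_premature_sync(bool use_pause)
{
	struct model_sup sup = { 0, false, false };
	bool early;

	model_paused = 0;
	model_nr_deferred = 0;

	if (use_pause)
		model_pause();

	sup.pending_consumers++;	/* consumer 1 is added and linked */
	sup.pending_consumers--;	/* consumer 1 probes */
	model_request_sync(&sup);

	sup.pending_consumers++;	/* consumer 2 is added and linked */

	if (use_pause)
		model_resume();

	early = sup.state_synced;

	sup.pending_consumers--;	/* consumer 2 probes */
	model_request_sync(&sup);
	assert(sup.state_synced);

	return early;
}
```

Without the bracket the model syncs the supplier as soon as consumer 1
probes; with it, the evaluation is postponed until all devices in the
batch have been added and linked.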

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 4344419a26fc..71d6138698ec 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -485,6 +485,7 @@ int of_platform_populate(struct device_node *root,
 	pr_debug("%s()\n", __func__);
 	pr_debug(" starting at: %pOF\n", root);
 
+	device_links_supplier_sync_state_pause();
 	for_each_child_of_node(root, child) {
 		rc = of_platform_bus_create(child, matches, lookup, parent, true);
 		if (rc) {
@@ -492,6 +493,8 @@ int of_platform_populate(struct device_node *root,
 			break;
 		}
 	}
+	device_links_supplier_sync_state_resume();
+
 	of_node_set_flag(root, OF_POPULATED_BUS);
 
 	of_node_put(root);
@@ -688,6 +691,7 @@ static int __init of_platform_default_populate_init(void)
 		return -ENODEV;
 
 	platform_bus_type.add_links = of_link_to_suppliers;
+	device_links_supplier_sync_state_pause();
 	/*
 	 * Handle certain compatibles explicitly, since we don't want to create
 	 * platform_devices for every node in /reserved-memory with a
@@ -708,6 +712,13 @@ static int __init of_platform_default_populate_init(void)
 	return 0;
 }
 arch_initcall_sync(of_platform_default_populate_init);
+
+static int __init of_platform_sync_state_init(void)
+{
+	device_links_supplier_sync_state_resume();
+	return 0;
+}
+late_initcall_sync(of_platform_sync_state_init);
 #endif
 
 int of_platform_device_destroy(struct device *dev, void *data)
-- 
2.22.0.709.g102302147b-goog

* [PATCH v7 6/7] of/platform: Create device links for all child-supplier depencencies
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (4 preceding siblings ...)
  2019-07-24  0:10 ` [PATCH v7 5/7] of/platform: Pause/resume sync state during init and of_platform_populate() Saravana Kannan
@ 2019-07-24  0:10 ` Saravana Kannan
  2019-07-24  0:11 ` [PATCH v7 7/7] of/platform: Don't create device links for default busses Saravana Kannan
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:10 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

A parent device can have child devices that it adds when it probes. But
this probing of the parent device can happen way after kernel init is done
-- for example, when the parent device's driver is loaded as a module.

In such cases, if the child devices depend on a supplier in the system, we
need to make sure the supplier gets the sync_state() callback only after
these child devices are added and probed.

To achieve this, when creating device links for a device by looking at its
DT node, don't just look at DT references at the top node level. Look at DT
references in all the descendant nodes too and create device links from the
ancestor device to all these supplier devices.

This way, when the parent device probes and adds child devices, the child
devices can then create their own device links to the suppliers and further
delay the supplier's sync_state() callback to after the child devices are
probed.

Example:
In this illustration, -> denotes DT references and indentation
represents child status.

Device node A
	Device node B -> D
	Device node C -> B, D

Device node D

Assume all these devices have their drivers loaded as modules.

Without this patch, this is the sequence of events:
1. D is added.
2. A is added.
3. Device D probes.
4. Device D gets its sync_state() callback.
5. Devices B and C might malfunction because their resources got
   altered/turned off before they could make active requests for them.

With this patch, this is the sequence of events:
1. D is added.
2. A is added and creates device links to D.
3. Device link from A to B is not added because A is a parent of B.
4. Device D probes.
5. Device D does not get its sync_state() callback because consumer A
   hasn't probed yet.
6. Device A probes.
   a. Devices B and C are added.
   b. Device links from B and C to D are added.
   c. Device A's probe completes.
7. Device D does not get its sync_state() callback because consumer A
   has probed but consumers B and C haven't probed yet.
8. Devices B and C probe.
9. Device D gets its sync_state() callback because all its consumers
   have probed.
10. None of the devices malfunction.
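
The recursive walk in the example above can be sketched in userspace C
as follows. This is only a model under stated assumptions: struct
model_node, the fixed-size refs/children arrays and the helper names
are hypothetical and are not the of/platform code. It collects, for
ancestor A, every node referenced by A or any of its descendants, while
skipping references that point back into A's own subtree (such as
C -> B).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_REFS		4
#define MODEL_MAX_CHILDREN	4

/* Userspace stand-in for a DT node; not struct device_node. */
struct model_node {
	const char *name;
	struct model_node *parent;
	struct model_node *children[MODEL_MAX_CHILDREN];
	struct model_node *refs[MODEL_MAX_REFS];	/* phandle-style references */
};

/* True if @n is @anc itself or lies anywhere below it. */
static bool model_in_subtree(const struct model_node *n,
			     const struct model_node *anc)
{
	for (; n; n = n->parent)
		if (n == anc)
			return true;
	return false;
}

/*
 * Append to @out every supplier referenced by @node or its descendants,
 * on behalf of the ancestor device @dev. Returns the new entry count.
 */
size_t model_collect_suppliers(const struct model_node *dev,
			       const struct model_node *node,
			       const struct model_node **out,
			       size_t n, size_t max)
{
	size_t i, j;

	assert(n <= max);

	for (i = 0; i < MODEL_MAX_REFS && node->refs[i]; i++) {
		const struct model_node *sup = node->refs[i];

		if (model_in_subtree(sup, dev))
			continue;	/* never link an ancestor to its own subtree */

		for (j = 0; j < n; j++)
			if (out[j] == sup)
				break;
		if (j == n && n < max)
			out[n++] = sup;	/* first time this supplier is seen */
	}

	for (i = 0; i < MODEL_MAX_CHILDREN && node->children[i]; i++)
		n = model_collect_suppliers(dev, node->children[i], out, n, max);

	return n;
}

/* Builds the A/B/C/D example and checks that A links to D and nothing else. */
bool model_a_links_only_to_d(void)
{
	struct model_node a = { .name = "A" }, b = { .name = "B" };
	struct model_node c = { .name = "C" }, d = { .name = "D" };
	const struct model_node *out[4];
	size_t n;

	b.parent = &a;
	c.parent = &a;
	a.children[0] = &b;
	a.children[1] = &c;
	b.refs[0] = &d;		/* B -> D */
	c.refs[0] = &b;		/* C -> B, inside A's subtree */
	c.refs[1] = &d;		/* C -> D */

	n = model_collect_suppliers(&a, &a, out, 0, 4);

	return n == 1 && out[0] == &d;
}
```

In the model, D is recorded once even though both B and C reference it,
and the C -> B reference produces no link from A.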

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 71d6138698ec..41499ddc8d95 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -655,24 +655,35 @@ static bool of_link_property(struct device *dev, struct device_node *con_np,
 	return done ? 0 : -ENODEV;
 }
 
+static int __of_link_to_suppliers(struct device *dev,
+				  struct device_node *con_np)
+{
+	struct device_node *child;
+	struct property *p;
+	bool done = true;
+
+	for_each_property_of_node(con_np, p)
+		if (of_link_property(dev, con_np, p->name))
+			done = false;
+
+	for_each_child_of_node(con_np, child)
+		if (__of_link_to_suppliers(dev, child))
+			done = false;
+
+	return done ? 0 : -ENODEV;
+}
+
 static bool of_devlink;
 core_param(of_devlink, of_devlink, bool, 0);
 
 static int of_link_to_suppliers(struct device *dev)
 {
-	struct property *p;
-	bool done = true;
-
 	if (!of_devlink)
 		return 0;
 	if (unlikely(!dev->of_node))
 		return 0;
 
-	for_each_property_of_node(dev->of_node, p)
-		if (of_link_property(dev, dev->of_node, p->name))
-			done = false;
-
-	return done ? 0 : -ENODEV;
+	return __of_link_to_suppliers(dev, dev->of_node);
 }
 
 #ifndef CONFIG_PPC
-- 
2.22.0.709.g102302147b-goog

* [PATCH v7 7/7] of/platform: Don't create device links for default busses
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (5 preceding siblings ...)
  2019-07-24  0:10 ` [PATCH v7 6/7] of/platform: Create device links for all child-supplier depencencies Saravana Kannan
@ 2019-07-24  0:11 ` Saravana Kannan
  2019-07-25 13:42 ` [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Greg Kroah-Hartman
  2019-08-08  2:02 ` Frank Rowand
  8 siblings, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-07-24  0:11 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Frank Rowand
  Cc: Saravana Kannan, devicetree, linux-kernel, David Collins, kernel-team

Default busses also have devices created for them, but there's no point
in creating device links for them. Doing so is especially wasteful: it
traverses the entire device tree and spends a lot of time working out
that creating those links isn't allowed. So check for default busses
and skip trying to create device links for them.
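
A minimal userspace sketch of the early-out, for illustration only. The
table below is a hand-written stand-in for of_default_bus_match_table,
"vendor,uart" is a made-up compatible string, and the model_* helpers
are hypothetical; the real check is the of_match_node() call in the
diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for of_default_bus_match_table. */
static const char *const model_default_buses[] = {
	"simple-bus",
	"simple-mfd",
	"isa",
};

bool model_is_default_bus(const char *compatible)
{
	size_t i;

	assert(compatible != NULL);

	for (i = 0; i < sizeof(model_default_buses) /
			sizeof(model_default_buses[0]); i++)
		if (strcmp(compatible, model_default_buses[i]) == 0)
			return true;
	return false;
}

/*
 * Counts how many nodes would still need a full property walk once
 * default busses are skipped.
 */
int model_count_walks(const char *const *compatibles, size_t n)
{
	int walks = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (!model_is_default_bus(compatibles[i]))
			walks++;
	return walks;
}

/* One bus, one ordinary (made-up) device, one more bus. */
int model_walk_demo(void)
{
	static const char *const nodes[] = {
		"simple-bus", "vendor,uart", "isa",
	};

	return model_count_walks(nodes, 3);
}
```

Only the non-bus node reaches the (expensive) walk in this model; the
two bus nodes return immediately.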

Signed-off-by: Saravana Kannan <saravanak@google.com>
---
 drivers/of/platform.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 41499ddc8d95..676b2f730d1b 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -682,6 +682,8 @@ static int of_link_to_suppliers(struct device *dev)
 		return 0;
 	if (unlikely(!dev->of_node))
 		return 0;
+	if (of_match_node(of_default_bus_match_table, dev->of_node))
+		return 0;
 
 	return __of_link_to_suppliers(dev, dev->of_node);
 }
-- 
2.22.0.709.g102302147b-goog

* Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (6 preceding siblings ...)
  2019-07-24  0:11 ` [PATCH v7 7/7] of/platform: Don't create device links for default busses Saravana Kannan
@ 2019-07-25 13:42 ` Greg Kroah-Hartman
  2019-07-25 21:04   ` Frank Rowand
  2019-08-08  2:02 ` Frank Rowand
  8 siblings, 1 reply; 37+ messages in thread
From: Greg Kroah-Hartman @ 2019-07-25 13:42 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Rafael J. Wysocki, Frank Rowand,
	devicetree, linux-kernel, David Collins, kernel-team

On Tue, Jul 23, 2019 at 05:10:53PM -0700, Saravana Kannan wrote:
> Add device-links to track functional dependencies between devices
> after they are created (but before they are probed) by looking at
> their common DT bindings like clocks, interconnects, etc.
> 
> Having functional dependencies automatically added before the devices
> are probed, provides the following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, interconnect providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
> 
> v1 -> v2:
> - Drop patch to speed up of_find_device_by_node()
> - Drop depends-on property and use existing bindings
> 
> v2 -> v3:
> - Refactor the code to have driver core initiate the linking of devs
> - Have driver core link consumers to supplier before it's probed
> - Add support for drivers to edit the device links before probing
> 
> v3 -> v4:
> - Tested edit_links() on system with cyclic dependency. Works.
> - Added some checks to make sure device link isn't attempted from
>   parent device node to child device node.
> - Added way to pause/resume sync_state callbacks across
>   of_platform_populate().
> - Recursively parse DT node to create device links from parent to
>   suppliers of parent and all child nodes.
> 
> v4 -> v5:
> - Fixed copy-pasta bugs with linked list handling
> - Walk up the phandle reference till I find an actual device (needed
>   for regulators to work)
> - Added support for linking devices from regulator DT bindings
> - Tested the whole series again to make sure cyclic dependencies are
>   broken with edit_links() and regulator links are created properly.
> 
> v5 -> v6:
> - Split, squashed and reordered some of the patches.
> - Refactored the device linking code to follow the same code pattern for
>   any property.
> 
> v6 -> v7:
> - No functional changes.
> - Renamed i to index
> - Added comment to clarify not having to check property name for every
>   index
> - Added "matched" variable to clarify code. No functional change.
> - Added comments to include/linux/device.h for add_links()
> 
> I've also not updated this patch series to handle the new patch [1] from
> Rafael. Will do that once this patch series is close to being Acked.
> 
> [1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/


This looks sane to me.  Anyone have any objections for me queueing this
up for my tree to get into linux-next now?

thanks,

greg k-h

* Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering
  2019-07-25 13:42 ` [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Greg Kroah-Hartman
@ 2019-07-25 21:04   ` Frank Rowand
  2019-07-26 14:32     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-07-25 21:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Rafael J. Wysocki, devicetree,
	linux-kernel, David Collins, kernel-team

On 7/25/19 6:42 AM, Greg Kroah-Hartman wrote:
> 
> This looks sane to me.  Anyone have any objections for me queueing this
> up for my tree to get into linux-next now?

I would like for the series to get into linux-next sooner rather than
later, and spend some time there.

I am _slightly_ more optimistic than Rob that sitting in linux-next for
an extended period might reveal any latent issues, so I would like for
the series to be in linux-next for an extended period of time.  (Yes,
my understanding is that Linus does not like patches to be in linux-next
if they are not targeted for the next merge window, but I prefer that
this patch series spend as much time in linux-next as possible).

I have been waiting for the changes to settle down before bringing up
the issue of devicetree overlays.  Now that the code seems to be
settling down, I need to look at how these changes impact overlays.
So I do not think the patches will be ready for a Linus pull request
until overlays are considered.

-Frank

> 
> thanks,
> 
> greg k-h
> 

* Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering
  2019-07-25 21:04   ` Frank Rowand
@ 2019-07-26 14:32     ` Greg Kroah-Hartman
  2019-07-31  2:22       ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Greg Kroah-Hartman @ 2019-07-26 14:32 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Saravana Kannan, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	devicetree, linux-kernel, David Collins, kernel-team

On Thu, Jul 25, 2019 at 02:04:23PM -0700, Frank Rowand wrote:
> On 7/25/19 6:42 AM, Greg Kroah-Hartman wrote:
> > 
> > 
> > This looks sane to me.  Anyone have any objections for me queueing this
> > up for my tree to get into linux-next now?
> 
> I would like for the series to get into linux-next sooner than later,
> and spend some time there.  

Ok, care to give me an ack for it?  :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering
  2019-07-26 14:32     ` Greg Kroah-Hartman
@ 2019-07-31  2:22       ` Frank Rowand
  0 siblings, 0 replies; 37+ messages in thread
From: Frank Rowand @ 2019-07-31  2:22 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Saravana Kannan, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	devicetree, linux-kernel, David Collins, kernel-team

Hi Greg, Rob,

On 7/26/19 7:32 AM, Greg Kroah-Hartman wrote:
> On Thu, Jul 25, 2019 at 02:04:23PM -0700, Frank Rowand wrote:
>> On 7/25/19 6:42 AM, Greg Kroah-Hartman wrote:
>>> On Tue, Jul 23, 2019 at 05:10:53PM -0700, Saravana Kannan wrote:
>>>> Add device-links to track functional dependencies between devices
>>>> after they are created (but before they are probed) by looking at
>>>> their common DT bindings like clocks, interconnects, etc.
>>>>
>>>> Having functional dependencies automatically added before the devices
>>>> are probed, provides the following benefits:
>>>>
>>>> - Optimizes device probe order and avoids the useless work of
>>>>   attempting probes of devices that will not probe successfully
>>>>   (because their suppliers aren't present or haven't probed yet).
>>>>
>>>>   For example, in a commonly available mobile SoC, registering just
>>>>   one consumer device's driver at an initcall level earlier than the
>>>>   supplier device's driver causes 11 failed probe attempts before the
>>>>   consumer device probes successfully. This was with a kernel with all
>>>>   the drivers statically compiled in. This problem gets a lot worse if
>>>>   all the drivers are loaded as modules without direct symbol
>>>>   dependencies.
>>>>
>>>> - Supplier devices like clock providers, interconnect providers, etc
>>>>   need to keep the resources they provide active and at a particular
>>>>   state(s) during boot up even if their current set of consumers don't
>>>>   request the resource to be active. This is because the rest of the
>>>>   consumers might not have probed yet and turning off the resource
>>>>   before all the consumers have probed could lead to a hang or
>>>>   undesired user experience.
>>>>
>>>>   Some frameworks (Eg: regulator) handle this today by turning off
>>>>   "unused" resources at late_initcall_sync and hoping all the devices
>>>>   have probed by then. This is not a valid assumption for systems with
>>>>   loadable modules. Other frameworks (Eg: clock) just don't handle
>>>>   this due to the lack of a clear signal for when they can turn off
>>>>   resources. This leads to downstream hacks to handle cases like this
>>>>   that can easily be solved in the upstream kernel.
>>>>
>>>>   By linking devices before they are probed, we give suppliers a clear
>>>>   count of the number of dependent consumers. Once all of the
>>>>   consumers are active, the suppliers can turn off the unused
>>>>   resources without making assumptions about the number of consumers.
>>>>
>>>> By default we just add device-links to track "driver presence" (probe
>>>> succeeded) of the supplier device. If any other functionality provided
>>>> by device-links are needed, it is left to the consumer/supplier
>>>> devices to change the link when they probe.
>>>>
>>>> v1 -> v2:
>>>> - Drop patch to speed up of_find_device_by_node()
>>>> - Drop depends-on property and use existing bindings
>>>>
>>>> v2 -> v3:
>>>> - Refactor the code to have driver core initiate the linking of devs
>>>> - Have driver core link consumers to supplier before it's probed
>>>> - Add support for drivers to edit the device links before probing
>>>>
>>>> v3 -> v4:
>>>> - Tested edit_links() on system with cyclic dependency. Works.
>>>> - Added some checks to make sure device link isn't attempted from
>>>>   parent device node to child device node.
>>>> - Added way to pause/resume sync_state callbacks across
>>>>   of_platform_populate().
>>>> - Recursively parse DT node to create device links from parent to
>>>>   suppliers of parent and all child nodes.
>>>>
>>>> v4 -> v5:
>>>> - Fixed copy-pasta bugs with linked list handling
>>>> - Walk up the phandle reference till I find an actual device (needed
>>>>   for regulators to work)
>>>> - Added support for linking devices from regulator DT bindings
>>>> - Tested the whole series again to make sure cyclic dependencies are
>>>>   broken with edit_links() and regulator links are created properly.
>>>>
>>>> v5 -> v6:
>>>> - Split, squashed and reordered some of the patches.
>>>> - Refactored the device linking code to follow the same code pattern for
>>>>   any property.
>>>>
>>>> v6 -> v7:
>>>> - No functional changes.
>>>> - Renamed i to index
>>>> - Added comment to clarify not having to check property name for every
>>>>   index
>>>> - Added "matched" variable to clarify code. No functional change.
>>>> - Added comments to include/linux/device.h for add_links()
>>>>
>>>> I've also not updated this patch series to handle the new patch [1] from
>>>> Rafael. Will do that once this patch series is close to being Acked.
>>>>
>>>> [1] - https://lore.kernel.org/lkml/3121545.4lOhFoIcdQ@kreacher/
>>>
>>>
>>> This looks sane to me.  Anyone have any objections for me queueing this
>>> up for my tree to get into linux-next now?
>>
>> I would like for the series to get into linux-next sooner than later,
>> and spend some time there.  
> 
> Ok, care to give me an ack for it?  :)

Rob opined to me that if you apply the series, it will go into 5.4 unless
reverted.  That is also what I would expect.

I'm going through the series carefully now.  This is currently my highest
priority task.  I don't yet know if my comments will be minor, or whether
I will have larger changes to request.

So I am not ready to ack the series yet.

-Frank

> 
> thanks,
> 
> greg k-h
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering
  2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
                   ` (7 preceding siblings ...)
  2019-07-25 13:42 ` [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Greg Kroah-Hartman
@ 2019-08-08  2:02 ` Frank Rowand
  8 siblings, 0 replies; 37+ messages in thread
From: Frank Rowand @ 2019-08-08  2:02 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: devicetree, linux-kernel, David Collins, kernel-team

Hi Saravana,

On 7/23/19 5:10 PM, Saravana Kannan wrote:
> Add device-links to track functional dependencies between devices
> after they are created (but before they are probed) by looking at
> their common DT bindings like clocks, interconnects, etc.
> 

< snip >

I know that this series has moved on to versions 8 and 9, and that some
additional patches have been submitted.

Version 8 was a rebase to handle device_link changes.  The version 8
patch 0/7 description of the changes did not note any functional
changes, so I am assuming that my comments on version 7 are still
applicable.

The version 9 changes do not impact my comments.

I am sending review comments on patches 1, 2, and 3.  I will continue
review of patches later in the series once the fallout from these
review comments results in a new series.

-Frank

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-07-24  0:10 ` [PATCH v7 1/7] driver core: Add support for linking devices during device addition Saravana Kannan
@ 2019-08-08  2:04   ` Frank Rowand
  2019-08-16  1:50     ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-08  2:04 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: devicetree, linux-kernel, David Collins, kernel-team

> Date: Tue, 23 Jul 2019 17:10:54 -0700
> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>  device addition
> From: Saravana Kannan <saravanak@google.com>
> 
> When devices are added, the bus might want to create device links to track
> functional dependencies between supplier and consumer devices. This
> tracking of supplier-consumer relationship allows optimizing device probe
> order and tracking whether all consumers of a supplier are active. The
> add_links bus callback is added to support this.

Change above to:

When devices are added, the bus may create device links to track which
suppliers a consumer device depends upon.  This tracking of
supplier-consumer relationships may be used to defer probing the driver
of a consumer device until the driver(s) for its supplier device(s) have
probed.  It may also be used by a supplier driver to determine if all of
its consumers have been successfully probed.  The add_links bus callback
is added to create the supplier device links.

> 
> However, when consumer devices are added, they might not have a supplier
> device to link to despite needing mandatory resources/functionality from
> one or more suppliers. A waiting_for_suppliers list is created to track
> such consumers and retry linking them when new devices get added.

Change above to:

If a supplier device has not yet been created when the consumer device attempts
to link to it, the consumer device is added to the wait_for_suppliers list.
When supplier devices are created, the supplier device link will be added to
the relevant consumer devices on the wait_for_suppliers list.

> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>  include/linux/device.h | 14 +++++++
>  2 files changed, 97 insertions(+)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index da84a73f2ba6..1b4eb221968f 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>  #endif
>  
>  /* Device links support. */
> +static LIST_HEAD(wait_for_suppliers);
> +static DEFINE_MUTEX(wfs_lock);
>  
>  #ifdef CONFIG_SRCU
>  static DEFINE_MUTEX(device_links_lock);
> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>  }
>  EXPORT_SYMBOL_GPL(device_link_add);
>  
> +/**

> + * device_link_wait_for_supplier - Mark device as waiting for supplier

    * device_link_wait_for_supplier - Add device to wait_for_suppliers list


> + * @consumer: Consumer device
> + *
> + * Marks the consumer device as waiting for suppliers to become available. The
> + * consumer device will never be probed until it's unmarked as waiting for
> + * suppliers. The caller is responsible for adding the link to the supplier
> + * once the supplier device is present.
> + *
> + * This function is NOT meant to be called from the probe function of the
> + * consumer but rather from code that creates/adds the consumer device.
> + */
> +static void device_link_wait_for_supplier(struct device *consumer)
> +{
> +	mutex_lock(&wfs_lock);
> +	list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> +	mutex_unlock(&wfs_lock);
> +}
> +
> +/**


> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> + *
> + * Loops through all consumers waiting on suppliers and tries to add all their
> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> + *
> + * The add_links bus callback is expected to return 0 if it has found and added
> + * all the supplier links for the consumer device. It should return an error if
> + * it isn't able to do so.
> + *
> + * The caller of device_link_wait_for_supplier() is expected to call this once
> + * it's aware of potential suppliers becoming available.

Change above comment to:

    * device_link_add_supplier_links - add links from consumer devices to
    *                                  supplier devices, leaving any consumer
    *                                  with inactive suppliers on the
    *                                  wait_for_suppliers list

    * Scan all consumer devices in the devicetree.  For any supplier device that
    * is not already linked to the consumer device, add the supplier to the
    * consumer device's device links.
    *
    * If all of a consumer device's suppliers are available then the consumer
    * is removed from the wait_for_suppliers list (if previously on the list).
    * Otherwise the consumer is added to the wait_for_suppliers list (if not
    * already on the list).


    * The add_links bus callback must return 0 if it has found and added all
    * the supplier links for the consumer device. It must return an error if
    * it is not able to do so.
    *
    * The caller of device_link_wait_for_supplier() is expected to call this once
    * it is aware of potential suppliers becoming available.



> + */
> +static void device_link_check_waiting_consumers(void)

Function name is misleading and hides side effects.

I have not come up with a name that does not hide side effects, but a better
name would be:

   device_link_add_supplier_links()


> +{
> +	struct device *dev, *tmp;
> +
> +	mutex_lock(&wfs_lock);
> +	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> +				 links.needs_suppliers)
> +		if (!dev->bus->add_links(dev))
> +			list_del_init(&dev->links.needs_suppliers);

Empties dev->links.needs_suppliers, but does not remove dev from
wait_for_suppliers list.  Where does that happen?
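
For reference on the question above: in the kernel's list.h, list_del_init()
unlinks the entry it is given from whatever list that entry is on and then
reinitializes the entry as an empty list.  Below is a minimal userspace sketch
of that behaviour.  It is not kernel code; struct mini_dev and the helper
names are made up for illustration and only imitate the list.h helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_to_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Same effect as list_del_init(): unlink the node, then make it self-pointing. */
static void list_unlink_and_reinit(struct list_head *node)
{
	assert(node->next && node->prev);
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}

/* Hypothetical miniature of struct device with only the hook of interest. */
struct mini_dev {
	struct list_head needs_suppliers;
};

/* True while the device's hook is still queued on some wait list. */
static bool mini_dev_is_waiting(const struct mini_dev *dev)
{
	return !list_is_empty(&dev->needs_suppliers);
}
```

In this sketch, unlinking a device's needs_suppliers hook both takes the
device off the global list and makes the list_empty() test on the hook report
that the device is no longer waiting.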

> +	mutex_unlock(&wfs_lock);
> +}
> +
>  static void device_link_free(struct device_link *link)
>  {
>  	while (refcount_dec_not_one(&link->rpm_active))
> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>  	struct device_link *link;
>  	int ret = 0;
>  
> +	/*
> +	 * If a device is waiting for one or more suppliers (in
> +	 * wait_for_suppliers list), it is not ready to probe yet. So just
> +	 * return -EPROBE_DEFER without having to check the links with existing
> +	 * suppliers.
> +	 */

Change comment to:

	/*
	 * Device waiting for supplier to become available is not allowed
	 * to probe
	 */

> +	mutex_lock(&wfs_lock);
> +	if (!list_empty(&dev->links.needs_suppliers)) {
> +		mutex_unlock(&wfs_lock);
> +		return -EPROBE_DEFER;
> +	}
> +	mutex_unlock(&wfs_lock);
> +
>  	device_links_write_lock();

Update Documentation/driver-api/device_link.rst to reflect the
check of &dev->links.needs_suppliers in device_links_check_suppliers().

>  
>  	list_for_each_entry(link, &dev->links.suppliers, c_node) {
> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>  {
>  	struct device_link *link, *ln;
>  
> +	mutex_lock(&wfs_lock);
> +	list_del(&dev->links.needs_suppliers);
> +	mutex_unlock(&wfs_lock);
> +
>  	/*
>  	 * Delete all of the remaining links from this device to any other
>  	 * devices (either consumers or suppliers).
> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>  #endif
>  	INIT_LIST_HEAD(&dev->links.consumers);
>  	INIT_LIST_HEAD(&dev->links.suppliers);
> +	INIT_LIST_HEAD(&dev->links.needs_suppliers);
>  	dev->links.status = DL_DEV_NO_DRIVER;
>  }
>  EXPORT_SYMBOL_GPL(device_initialize);
> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>  					     BUS_NOTIFY_ADD_DEVICE, dev);
>  
>  	kobject_uevent(&dev->kobj, KOBJ_ADD);

> +
> +	/*
> +	 * Check if any of the other devices (consumers) have been waiting for
> +	 * this device (supplier) to be added so that they can create a device
> +	 * link to it.
> +	 *
> +	 * This needs to happen after device_pm_add() because device_link_add()
> +	 * requires the supplier be registered before it's called.
> +	 *
> +	 * But this also needs to happe before bus_probe_device() to make sure
> +	 * waiting consumers can link to it before the driver is bound to the
> +	 * device and the driver sync_state callback is called for this device.
> +	 */

	/*
	 * Add links to dev from any dependent consumer that has dev on its
	 * list of needed suppliers (links.needs_suppliers).  device_pm_add()
	 * must have previously registered dev to allow the links to be added.
	 *
	 * The consumer links must be created before dev is probed because the
	 * sync_state callback for dev will use the consumer links.
	 */

> +	device_link_check_waiting_consumers();
> +
> +	if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> +		device_link_wait_for_supplier(dev);
> +
>  	bus_probe_device(dev);
>  	if (parent)
>  		klist_add_tail(&dev->p->knode_parent,
> diff --git a/include/linux/device.h b/include/linux/device.h
> index c330b75c6c57..5d70babb7462 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>   *		-EPROBE_DEFER it will queue the device for deferred probing.
>   * @uevent:	Called when a device is added, removed, or a few other things
>   *		that generate uevents to add the environment variables.

> + * @add_links:	Called, perhaps multiple times per device, after a device is
> + *		added to this bus.  The function is expected to create device
> + *		links to all the suppliers of the input device that are
> + *		available at the time this function is called.  As in, the
> + *		function should NOT stop at the first failed device link if
> + *		other unlinked supplier devices are present in the system.

* @add_links:	Called after a device is added to this bus.  The function is
*		expected to create device links to all the suppliers of the
*		device that are available at the time this function is called.
*		The function must NOT stop at the first failed device link if
*		other unlinked supplier devices are present in the system.
*		If some suppliers are not yet available, this function will be
*		called again when the suppliers become available.

However, add_links() is not needed, so this comment should move to
of_link_to_suppliers().


> + *
> + *		Return 0 if device links have been successfully created to all
> + *		the suppliers of this device.  Return an error if some of the
> + *		suppliers are not yet available and this function needs to be
> + *		reattempted in the future.

*
*		Return 0 if device links have been successfully created to all
*		the suppliers of this device.  Return an error if some of the
*		suppliers are not yet available.
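
The contract described above (link every supplier that is available, do not
stop at the first failure, return 0 only when all suppliers are linked) can be
sketched in userspace as follows.  This is an illustration under assumptions:
struct mock_supplier and mock_add_links() are hypothetical, and -ENODEV is an
arbitrary choice of error value, not something the patch specifies.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical supplier record: present once its device has been added. */
struct mock_supplier {
	bool present;
	bool linked;
};

/*
 * Try to link every supplier, continuing past failures.
 * Returns 0 when all suppliers are linked, -ENODEV otherwise.
 */
static int mock_add_links(struct mock_supplier *sup, size_t n)
{
	int ret = 0;
	size_t i;

	assert(sup != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (sup[i].linked)
			continue;
		if (!sup[i].present) {
			ret = -ENODEV;	/* remember the failure, keep going */
			continue;
		}
		sup[i].linked = true;
	}
	return ret;
}
```

Calling the function again after the missing supplier shows up links only the
remaining supplier and then returns 0, which is the retry behaviour the
comment describes.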


>   * @probe:	Called when a new device or driver add to this bus, and callback
>   *		the specific driver's probe to initial the matched device.
>   * @remove:	Called when a device removed from this bus.
> @@ -122,6 +133,7 @@ struct bus_type {
>  
>  	int (*match)(struct device *dev, struct device_driver *drv);
>  	int (*uevent)(struct device *dev, struct kobj_uevent_env *env);


> +	int (*add_links)(struct device *dev);

              ^^^^^^^^^  add_supplier              ???
              ^^^^^^^^^  add_suppliers             ???

              ^^^^^^^^^  link_suppliers            ???

              ^^^^^^^^^  add_supplier_dependency   ???
              ^^^^^^^^^  add_supplier_dependencies ???
add_links() not needed


>  	int (*probe)(struct device *dev);
>  	int (*remove)(struct device *dev);
>  	void (*shutdown)(struct device *dev);




> @@ -893,11 +905,13 @@ enum dl_dev_state {
>   * struct dev_links_info - Device data related to device links.
>   * @suppliers: List of links to supplier devices.
>   * @consumers: List of links to consumer devices.

> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.

    * @needs_suppliers: List of devices deferring probe until supplier drivers
    *                   are successfully probed.

>   * @status: Driver status information.
>   */
>  struct dev_links_info {
>  	struct list_head suppliers;
>  	struct list_head consumers;
> +	struct list_head needs_suppliers;
>  	enum dl_dev_state status;
>  };
>  
> -- 
> 2.22.0.709.g102302147b-goog
> 
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 2/7] driver core: Add edit_links() callback for drivers
  2019-07-24  0:10 ` [PATCH v7 2/7] driver core: Add edit_links() callback for drivers Saravana Kannan
@ 2019-08-08  2:05   ` Frank Rowand
  2019-08-16  1:50     ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-08  2:05 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki
  Cc: devicetree, linux-kernel, David Collins, kernel-team

> Date: Tue, 23 Jul 2019 17:10:55 -0700
> Subject: [PATCH v7 2/7] driver core: Add edit_links() callback for drivers
> From: Saravana Kannan <saravanak@google.com>
> 
> The driver core/bus adding supplier-consumer dependencies by default

> enables functional dependencies to be tracked correctly even when the
> consumer devices haven't had their drivers registered or loaded (if they
> are modules).

  enables functional dependencies to be tracked correctly before the
  consumer device drivers are registered or loaded (if they are modules).

> 
> However, when the bus incorrectly adds dependencies that it shouldn't

                    ^^^ driver core/bus

> have added, the devices might never probe.

Explain what causes a dependency to be incorrectly added.

Is this a bug in the dependency detection code?

Are there cases where the dependency detection code can not reliably determine
whether there truly is a dependency?

> 
> For example, if device-C is a consumer of device-S and they have
> phandles to each other in DT, the following could happen:
> 
> 1.  Device-S get added first.
> 2.  The bus add_links() callback will (incorrectly) try to link it as
>     a consumer of device-C.
> 3.  Since device-C isn't present, device-S will be put in
>     "waiting-for-supplier" list.
> 4.  Device-C gets added next.
> 5.  All devices in "waiting-for-supplier" list are retried for linking.
> 6.  Device-S gets linked as consumer to Device-C.
> 7.  The bus add_links() callback will (correctly) try to link it as
>     a consumer of device-S.
> 8.  This isn't allowed because it would create a cyclic device links.
> 
> Neither devices will get probed since the supplier is marked as
> dependent on the consumer. And the consumer will never probe because the
> consumer can't get resources from the supplier.
> 
> Without this patch, things stay in this broken state. However, with this
> patch, the execution will continue like this:
> 
> 9.  Device-C's driver is loaded.

Change comment to:
  
  For example, if device-C is a consumer of device-S and they have phandles
  referencing each other in the devicetree, the following could happen:

  1.  Device-S is added first.
        - The bus add_links() callback will (incorrectly) link device-S
          as a consumer of device-C, and device-S will be put in the
          "wait_for_suppliers" list.

  2.  Device-C is added next.
        - All devices in the "wait_for_suppliers" list are retried for linking.
        - Device-S remains linked as a consumer to device-C.
        - The bus add_links() callback will (correctly) try to link device-C as
          a consumer of device-S.
        - The link attempt will fail because it would create a cyclic device
          link, and device-C will be put in the "wait_for_suppliers" list.

  Device-S will not be probed because it is in the "wait_for_suppliers" list.
  Device-C will not be probed because it is in the "wait_for_suppliers" list.
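
  The sequence can be replayed with a small userspace mock.  This is a sketch
  under simplifying assumptions, not driver core code: struct sim and the
  sim_* helpers are invented for illustration, a link is refused only when
  the reverse link already exists, and a device can probe only when it is off
  the wait list and all of its linked suppliers have probed.  In this mock
  device-S leaves the wait list once it is linked to device-C, but it still
  cannot probe because device-C, now its supplier, is stuck on the list.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEV_S, DEV_C, NDEV };

struct sim {
	bool added[NDEV];		/* device has been added */
	bool refs[NDEV][NDEV];		/* refs[a][b]: a has a phandle to b */
	bool link[NDEV][NDEV];		/* link[a][b]: a linked as consumer of b */
	bool waiting[NDEV];		/* on the wait_for_suppliers list */
	bool probed[NDEV];		/* driver has probed successfully */
};

/* Mock of the bus add_links(): 0 if every referenced device got linked. */
static int sim_add_links(struct sim *s, int dev)
{
	int ret = 0;

	for (int sup = 0; sup < NDEV; sup++) {
		if (!s->refs[dev][sup] || s->link[dev][sup])
			continue;
		/* Supplier missing, or the link would close a cycle. */
		if (!s->added[sup] || s->link[sup][dev]) {
			ret = -1;
			continue;
		}
		s->link[dev][sup] = true;
	}
	return ret;
}

/* Mock of device_add(): retry waiters first, then link the new device. */
static void sim_device_add(struct sim *s, int dev)
{
	assert(dev >= 0 && dev < NDEV);
	s->added[dev] = true;
	for (int d = 0; d < NDEV; d++)
		if (s->waiting[d] && sim_add_links(s, d) == 0)
			s->waiting[d] = false;
	if (sim_add_links(s, dev) != 0)
		s->waiting[dev] = true;
}

/* A device may probe once it is off the wait list and its suppliers probed. */
static bool sim_can_probe(const struct sim *s, int dev)
{
	if (s->waiting[dev])
		return false;
	for (int sup = 0; sup < NDEV; sup++)
		if (s->link[dev][sup] && !s->probed[sup])
			return false;
	return true;
}
```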
  
> 
> Without this patch, things stay in this broken state. However, with this
> patch, the execution will continue like this:
> 
> 9.  Device-C's driver is loaded.

What is "loaded"?  Does that mean the device-C probe succeeds?

What causes device-C to be probed?  The normal processing of -EPROBE_DEFER
devices?


> 10. Device-C's driver removes Device-S as a consumer of Device-C.
> 11. Device-C's driver adds Device-C as a consumer of Device-S.
> 12. Device-S probes.
> 14. Device-C probes.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  drivers/base/core.c    | 24 ++++++++++++++++++++++--
>  drivers/base/dd.c      | 29 +++++++++++++++++++++++++++++
>  include/linux/device.h | 18 ++++++++++++++++++
>  3 files changed, 69 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 1b4eb221968f..733d8a9aec76 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -422,6 +422,19 @@ static void device_link_wait_for_supplier(struct device *consumer)
>  	mutex_unlock(&wfs_lock);
>  }
>  
> +/**
> + * device_link_remove_from_wfs - Unmark device as waiting for supplier
> + * @consumer: Consumer device
> + *
> + * Unmark the consumer device as waiting for suppliers to become available.
> + */
> +void device_link_remove_from_wfs(struct device *consumer)

Misleading function name.
Incorrect description.

Does not remove consumer from list wait_for_suppliers.

At best, consumer might eventually get removed from list wait_for_suppliers
if device_link_check_waiting_consumers() is called again.

> +{
> +	mutex_lock(&wfs_lock);
> +	list_del_init(&consumer->links.needs_suppliers);
> +	mutex_unlock(&wfs_lock);
> +}
> +
>  /**
>   * device_link_check_waiting_consumers - Try to unmark waiting consumers
>   *
> @@ -439,12 +452,19 @@ static void device_link_wait_for_supplier(struct device *consumer)
>  static void device_link_check_waiting_consumers(void)
>  {
>  	struct device *dev, *tmp;
> +	int ret;
>  
>  	mutex_lock(&wfs_lock);
>  	list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> -				 links.needs_suppliers)
> -		if (!dev->bus->add_links(dev))
> +				 links.needs_suppliers) {
> +		ret = 0;
> +		if (dev->has_edit_links)
> +			ret = driver_edit_links(dev);
> +		else if (dev->bus->add_links)
> +			ret = dev->bus->add_links(dev);
> +		if (!ret)
>  			list_del_init(&dev->links.needs_suppliers);
> +	}
>  	mutex_unlock(&wfs_lock);
>  }
>  
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 994a90747420..5e7041ede0d7 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -698,6 +698,12 @@ int driver_probe_device(struct device_driver *drv, struct device *dev)
>  	pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
>  		 drv->bus->name, __func__, dev_name(dev), drv->name);
>  
> +	if (drv->edit_links) {
> +		if (drv->edit_links(dev))
> +			dev->has_edit_links = true;
> +		else
> +			device_link_remove_from_wfs(dev);
> +	}

For the purposes of the following paragraphs, I refer to dev as "dev_1" to
distinguish it from a new dev that will be encountered later.  The following
paragraphs assume dev_1 has a supplier dependency for a supplier that has
not probed yet.

Q. Why the extra level of indirection?

A. really_probe() does not set dev->driver before returning if
   device_links_check_suppliers() returned -EPROBE_DEFER.  Thus
   device_link_check_waiting_consumers() can not directly check
   "if (dev_1->driver->edit_links)".

   The added driver_probe_device() code is setting dev_1->has_edit_links in the
   probe path, then device_link_check_waiting_consumers() will use the value
   of dev_1->has_edit_links instead of directly checking
   "if (dev_1->driver->edit_links)".

   If really_probe() was modified to set dev->driver in this
   case then the need for dev->has_edit_links is removed and
   driver_edit_links() is not needed, since dev->driver would
   be available.  Removing driver_edit_links() simplifies the
   code.

device_add() calls dev_1->bus->add_links(dev_1), thus dev_1 will have the
supplier links set (for any suppliers not currently available) and be on
list wait_for_suppliers.

Then device_add() calls bus_probe_device(), leading to calling
driver_probe_device().  The above code fragment either sets
dev_1->has_edit_links or removes the needs_suppliers links from dev_1.
dev_1 remains on list wait_for_suppliers.

If drv->edit_links(dev_1) returns 0 then device_link_remove_from_wfs()
removes the supplier links.  Shouldn't device_link_remove_from_wfs() also
remove the device from the list wait_for_suppliers?

The next time a device is added, device_link_check_waiting_consumers() will
be called and dev_1 will be on list wait_for_suppliers, thus
device_link_check_waiting_consumers() will find dev_1->has_edit_links true
and thus call driver_edit_links() instead of calling dev->bus->add_links().

The comment in device.h, later in this patch, says that drv->edit_links() is
responsible for editing the device links for dev.  The comment provides no
guidance on how drv->edit_links() is supposed to determine what edits to
perform.  No example drv->edit_links() function is provided in this patch
series.  dev_1->bus->add_links(dev_1) may have added one or more suppliers
to its needs_suppliers link.  drv->edit_links() needs to be able to handle
all possible variants of what suppliers are on the needs_suppliers link.
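
To make the request for an example concrete, here is a userspace mock of what
such a callback might do for the device-S/device-C case in the commit message
(steps 10 and 11 there).  It is not a real driver and does not use the device
link API; struct links, mock_edit_links_c() and the other mock_* helpers are
hypothetical, and the probe rule (off the wait list and all linked suppliers
probed) is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum { DEV_S, DEV_C, NDEV };

struct links {
	bool link[NDEV][NDEV];	/* link[a][b]: a is a consumer of b */
	bool waiting[NDEV];	/* on the wait_for_suppliers list */
	bool probed[NDEV];	/* driver has probed successfully */
};

/* Hypothetical edit_links() for device-C's driver (steps 10 and 11). */
static int mock_edit_links_c(struct links *l)
{
	l->link[DEV_S][DEV_C] = false;	/* drop the bogus S -> C dependency */
	l->link[DEV_C][DEV_S] = true;	/* add the real C -> S dependency */
	return 0;			/* 0: all links for C are in place */
}

/* Mirrors the driver_probe_device() hunk: success takes C off the wait list. */
static void mock_run_edit_links(struct links *l)
{
	if (mock_edit_links_c(l) == 0)
		l->waiting[DEV_C] = false;
}

/* Probe succeeds only when not waiting and every linked supplier has probed. */
static bool mock_try_probe(struct links *l, int dev)
{
	assert(dev >= 0 && dev < NDEV);
	if (l->waiting[dev] || l->probed[dev])
		return false;
	for (int sup = 0; sup < NDEV; sup++)
		if (l->link[dev][sup] && !l->probed[sup])
			return false;
	l->probed[dev] = true;
	return true;
}
```

Starting from the stuck state (device-S linked as a consumer of device-C,
device-C on the wait list), running the mock callback lets device-S probe
first and device-C probe after it.  A real edit_links() would still need to
cope with whatever set of links the bus happened to create, which is the
open question raised above.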


>  	pm_runtime_get_suppliers(dev);
>  	if (dev->parent)
>  		pm_runtime_get_sync(dev->parent);
> @@ -786,6 +792,29 @@ struct device_attach_data {
>  	bool have_async;
>  };
>  
> +static int __driver_edit_links(struct device_driver *drv, void *data)
> +{
> +	struct device *dev = data;
> +
> +	if (!drv->edit_links)
> +		return 0;
> +
> +	if (driver_match_device(drv, dev) <= 0)
> +		return 0;
> +
> +	return drv->edit_links(dev);
> +}
> +
> +int driver_edit_links(struct device *dev)
> +{
> +	int ret;
> +
> +	device_lock(dev);
> +	ret = bus_for_each_drv(dev->bus, NULL, dev, __driver_edit_links);
> +	device_unlock(dev);
> +	return ret;
> +}
> +
>  static int __device_attach_driver(struct device_driver *drv, void *_data)
>  {
>  	struct device_attach_data *data = _data;
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 5d70babb7462..35aed50033c4 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -263,6 +263,20 @@ enum probe_type {
>   * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
>   * @of_match_table: The open firmware table.
>   * @acpi_match_table: The ACPI match table.
> + * @edit_links:	Called to allow a matched driver to edit the device links the

Where is the value of field edit_links set?

Is it only in an out of tree driver?  If so, I would like to see an
example implementation of the edit_links() function.


> + *		bus might have added incorrectly. This will be useful to handle
> + *		cases where the bus incorrectly adds functional dependencies
> + *		that aren't true or tries to create cyclic dependencies. But
> + *		doesn't correctly handle functional dependencies that are
> + *		missed by the bus as the supplier's sync_state might get to
> + *		execute before the driver for a missing consumer is loaded and
> + *		gets to edit the device links for the consumer.
> + *
> + *		This function might be called multiple times after a new device
> + *		is added.  The function is expected to create all the device
> + *		links for the new device and return 0 if it was completed
> + *		successfully or return an error if it needs to be reattempted
> + *		in the future.
>   * @probe:	Called to query the existence of a specific device,
>   *		whether this driver can work with it, and bind the driver
>   *		to a specific device.
> @@ -302,6 +316,7 @@ struct device_driver {
>  	const struct of_device_id	*of_match_table;
>  	const struct acpi_device_id	*acpi_match_table;
>  
> +	int (*edit_links)(struct device *dev);
>  	int (*probe) (struct device *dev);
>  	int (*remove) (struct device *dev);
>  	void (*shutdown) (struct device *dev);
> @@ -1078,6 +1093,7 @@ struct device {
>  	bool			offline_disabled:1;
>  	bool			offline:1;
>  	bool			of_node_reused:1;
> +	bool			has_edit_links:1;

Add has_edit_links to the struct's kernel-doc comment.


>  #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
>      defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
>      defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
> @@ -1329,6 +1345,7 @@ extern int  __must_check device_attach(struct device *dev);
>  extern int __must_check driver_attach(struct device_driver *drv);
>  extern void device_initial_probe(struct device *dev);
>  extern int __must_check device_reprobe(struct device *dev);
> +extern int driver_edit_links(struct device *dev);
>  
>  extern bool device_is_bound(struct device *dev);
>  
> @@ -1419,6 +1436,7 @@ struct device_link *device_link_add(struct device *consumer,
>  				    struct device *supplier, u32 flags);
>  void device_link_del(struct device_link *link);
>  void device_link_remove(void *consumer, struct device *supplier);
> +void device_link_remove_from_wfs(struct device *consumer);
>  
>  #ifndef dev_fmt
>  #define dev_fmt(fmt) fmt
> -- 
> 2.22.0.709.g102302147b-goog
> 
> 


* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-07-24  0:10 ` [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings Saravana Kannan
@ 2019-08-08  2:06   ` Frank Rowand
  2019-08-16  1:50     ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-08  2:06 UTC (permalink / raw)
  To: Saravana Kannan, Rob Herring, Mark Rutland, Greg Kroah-Hartman,
	Rafael J. Wysocki, Jonathan Corbet
  Cc: devicetree, linux-kernel, David Collins, kernel-team, linux-doc

On 7/23/19 5:10 PM, Saravana Kannan wrote:
> Add device-links after the devices are created (but before they are
> probed) by looking at common DT bindings like clocks and
> interconnects.
> 
> Automatically adding device-links for functional dependencies at the
> framework level provides the following benefits:
> 
> - Optimizes device probe order and avoids the useless work of
>   attempting probes of devices that will not probe successfully
>   (because their suppliers aren't present or haven't probed yet).
> 
>   For example, in a commonly available mobile SoC, registering just
>   one consumer device's driver at an initcall level earlier than the
>   supplier device's driver causes 11 failed probe attempts before the
>   consumer device probes successfully. This was with a kernel with all
>   the drivers statically compiled in. This problem gets a lot worse if
>   all the drivers are loaded as modules without direct symbol
>   dependencies.
> 
> - Supplier devices like clock providers, interconnect providers, etc
>   need to keep the resources they provide active and at a particular
>   state(s) during boot up even if their current set of consumers don't
>   request the resource to be active. This is because the rest of the
>   consumers might not have probed yet and turning off the resource
>   before all the consumers have probed could lead to a hang or
>   undesired user experience.
> 
>   Some frameworks (Eg: regulator) handle this today by turning off
>   "unused" resources at late_initcall_sync and hoping all the devices
>   have probed by then. This is not a valid assumption for systems with
>   loadable modules. Other frameworks (Eg: clock) just don't handle
>   this due to the lack of a clear signal for when they can turn off
>   resources. This leads to downstream hacks to handle cases like this
>   that can easily be solved in the upstream kernel.
> 
>   By linking devices before they are probed, we give suppliers a clear
>   count of the number of dependent consumers. Once all of the
>   consumers are active, the suppliers can turn off the unused
>   resources without making assumptions about the number of consumers.
> 
> By default we just add device-links to track "driver presence" (probe
> succeeded) of the supplier device. If any other functionality provided
> by device-links are needed, it is left to the consumer/supplier
> devices to change the link when they probe.
> 
> Signed-off-by: Saravana Kannan <saravanak@google.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |   5 +
>  drivers/of/platform.c                         | 165 ++++++++++++++++++
>  2 files changed, 170 insertions(+)
> 

Documentation/admin-guide/kernel-parameters.rst:

After line 129, add:

	OF	Devicetree is enabled

> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 46b826fcb5ad..12937349d79d 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3170,6 +3170,11 @@
>  			This can be set from sysctl after boot.
>  			See Documentation/admin-guide/sysctl/vm.rst for details.
>  
> +	of_devlink	[KNL] Make device links from common DT bindings. Useful
> +			for optimizing probe order and making sure resources
> +			aren't turned off before the consumer devices have
> +			probed.

        of_supplier_depend instead of of_devlink ????

	of_supplier_depend
			[OF, KNL] Make device links from consumer devicetree
			nodes to supplier devicetree nodes.  The
			consumer / supplier relationships are inferred from
			scanning the devicetree.  The driver for a consumer
			device will not be probed until the drivers for all of
			its supplier devices have been successfully probed.


> +
>  	ohci1394_dma=early	[HW] enable debugging via the ohci1394 driver.
>  			See Documentation/debugging-via-ohci1394.txt for more
>  			info.
> diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> index 7801e25e6895..4344419a26fc 100644
> --- a/drivers/of/platform.c
> +++ b/drivers/of/platform.c
> @@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
>  }
>  EXPORT_SYMBOL_GPL(of_platform_default_populate);
>  

> +bool of_link_is_valid(struct device_node *con, struct device_node *sup)

Change to less vague:

   bool of_ancestor_of(struct device_node *test_np, struct device_node *np)


> +{
> +	of_node_get(sup);
> +	/*
> +	 * Don't allow linking a device node as a consumer of one of its
> +	 * descendant nodes. By definition, a child node can't be a functional
> +	 * dependency for the parent node.
> +	 */
> +	while (sup) {
> +		if (sup == con) {
> +			of_node_put(sup);
> +			return false;
> +		}
> +		sup = of_get_next_parent(sup);
> +	}
> +	return true;

Change to more generic:

	of_node_get(test_np);
	while (test_np) {
		if (test_np == np) {
			of_node_put(test_np);
			return true;
		}
		test_np = of_get_next_parent(test_np);
	}
	return false;
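
As a rough illustration of the ancestor walk suggested above, here is a
minimal userspace sketch.  struct toy_node, toy_ancestor_of() and
demo_ancestor_of() are hypothetical stand-ins rather than kernel code, and
the of_node_get()/of_node_put() refcounting is deliberately left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct device_node: only the parent pointer matters. */
struct toy_node {
	const char *name;
	struct toy_node *parent;
};

/*
 * Return true if @np is @test_np itself or one of its ancestors.
 * Same walk-up-the-parents loop as the suggestion, minus refcounting.
 */
static bool toy_ancestor_of(const struct toy_node *test_np,
			    const struct toy_node *np)
{
	while (test_np) {
		if (test_np == np)
			return true;
		test_np = test_np->parent;
	}
	return false;
}

/* Usage example: clock-controller sits below soc, which sits below /. */
void demo_ancestor_of(void)
{
	struct toy_node root = { "/", NULL };
	struct toy_node soc = { "soc", &root };
	struct toy_node clk = { "clock-controller", &soc };

	assert(toy_ancestor_of(&clk, &soc));	/* soc is above clk */
	assert(toy_ancestor_of(&clk, &clk));	/* a node matches itself */
	assert(!toy_ancestor_of(&soc, &clk));	/* clk is below soc */
}
```

With these semantics a caller would refuse to create a link when
of_ancestor_of(sup_np, dev->of_node) is true, that is, when the consumer
node is the supplier node or one of its ancestors.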

> +}
> +


/**
 * of_link_to_phandle - Add device link to supplier
 * @dev: consumer device
 * @sup_np: pointer to the supplier device tree node
 *
 * TODO: ...
 *
 * Return:
 * * 0 if link successfully created for supplier or of_devlink is false
 * * an error if unable to create link
 */

Should have dev_dbg() or pr_warn() or something on errors in this
function, since the caller does not report any issue.

> +static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
> +{
> +	struct platform_device *sup_dev;
> +	u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
> +	int ret = 0;
> +
> +	/*

> +	 * Since we are trying to create device links, we need to find
> +	 * the actual device node that owns this supplier phandle.
> +	 * Often times it's the same node, but sometimes it can be one
> +	 * of the parents. So walk up the parent till you find a
> +	 * device.

Change comment to:

	 * Find the device node that contains the supplier phandle.  It may
	 * be @sup_np or it may be an ancestor of @sup_np.

> +	 */

See comment in caller of of_link_to_phandle() - do not hide the final
of_node_put() of sup_np inside of_link_to_phandle(), so need to do
an of_node_get() here.

	of_node_get(sup_np);

> +	while (sup_np && !of_find_property(sup_np, "compatible", NULL))
> +		sup_np = of_get_next_parent(sup_np);
> +	if (!sup_np)

> +		return 0;

This case should never occur(?), it is an error.

                return -ENODEV;

> +
> +	if (!of_link_is_valid(dev->of_node, sup_np)) {
> +		of_node_put(sup_np);
> +		return 0;

Do not use a name that obscures what the function is doing, also
return an actual issue.

	if (of_ancestor_of(sup_np, dev->of_node)) {
		of_node_put(sup_np);
		return -EINVAL;

> +	}
> +	sup_dev = of_find_device_by_node(sup_np);
> +	of_node_put(sup_np);
> +	if (!sup_dev)
> +		return -ENODEV;
> +	if (!device_link_add(dev, &sup_dev->dev, dl_flags))
> +		ret = -ENODEV;
> +	put_device(&sup_dev->dev);
> +	return ret;
> +}
> +

/**
 * parse_prop_cells - Property parsing functions for suppliers
 *
 * @np:            pointer to a device tree node containing a list
 * @prop_name:     Name of property holding a phandle value
 * @phandle_index: For properties holding a table of phandles, this is the
 *                 index into the table
 * @list_name:     property name that contains a list
 * @cells_name:    property name that specifies phandles' arguments count
 *
 * This function is useful to parse lists of phandles and their arguments.
 *
 * Return:
 * * Node pointer with refcount incremented, use of_node_put() on it when done.
 * * NULL if not found.
 */

> +static struct device_node *parse_prop_cells(struct device_node *np,
> +					    const char *prop, int index,
> +					    const char *binding,
> +					    const char *cell)

Make names consistent with of_parse_phandle_with_args():
  Change prop to prop_name
  Change index to phandle_index
  Change binding to list_name
  Change cell to cells_name

> +{
> +	struct of_phandle_args sup_args;
> +

> +	/* Don't need to check property name for every index. */
> +	if (!index && strcmp(prop, binding))
> +		return NULL;

I read the discussion in version 6 on whether to check the property name
only once.

This check is fragile because it depends on the calling code being properly
structured.  Do the check for all values of index.  The reduction in
overhead from skipping the check does not justify the fragility and the
extra effort for the code reader to understand why the check can be
bypassed when index is not zero.

> +
> +	if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
> +		return NULL;
> +
> +	return sup_args.np;
> +}
> +
> +static struct device_node *parse_clocks(struct device_node *np,
> +					const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> +	return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
> +}
> +
> +static struct device_node *parse_interconnects(struct device_node *np,
> +					       const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> +	return parse_prop_cells(np, prop, index, "interconnects",
> +				"#interconnect-cells");
> +}
> +
> +static int strcmp_suffix(const char *str, const char *suffix)
> +{
> +	unsigned int len, suffix_len;
> +
> +	len = strlen(str);
> +	suffix_len = strlen(suffix);
> +	if (len <= suffix_len)
> +		return -1;
> +	return strcmp(str + len - suffix_len, suffix);
> +}
> +
> +static struct device_node *parse_regulators(struct device_node *np,
> +					    const char *prop, int index)

Change prop to prop_name
Change index to phandle_index

> +{
> +	if (index || strcmp_suffix(prop, "-supply"))
> +		return NULL;
> +
> +	return of_parse_phandle(np, prop, 0);
> +}
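
For readers who want to try the suffix matching quoted above outside the
kernel, here is a small userspace copy of strcmp_suffix().  It is only a
sketch: size_t replaces unsigned int, and demo_strcmp_suffix() with its
property names is a made-up usage example.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 0 only if @str ends with @suffix and is strictly longer than it.
 * A string that is exactly equal to @suffix is treated as a non-match.
 */
static int strcmp_suffix(const char *str, const char *suffix)
{
	size_t len = strlen(str);
	size_t suffix_len = strlen(suffix);

	if (len <= suffix_len)
		return -1;
	return strcmp(str + len - suffix_len, suffix);
}

/* Usage example with regulator-style property names. */
void demo_strcmp_suffix(void)
{
	assert(strcmp_suffix("vdd-supply", "-supply") == 0);
	assert(strcmp_suffix("-supply", "-supply") != 0);	/* too short */
	assert(strcmp_suffix("supply-names", "-supply") != 0);
}
```

The len <= suffix_len test means a property named exactly "-supply" is
not treated as a regulator supply.
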
> +
> +/**

> + * struct supplier_bindings - Information for parsing supplier DT binding
> + *
> + * @parse_prop:		If the function cannot parse the property, return NULL.
> + *			Otherwise, return the phandle listed in the property
> + *			that corresponds to the index.

There is no documentation of dynamic function parameters in the docbook
description of a struct.  Use this format for now and I will clean up when
I clean up all of the devicetree docbook info.

Change above comment to:

 * struct supplier_bindings - Property parsing functions for suppliers
 *
 * @parse_prop: function name
 *              parse_prop() finds the node corresponding to a supplier phandle
 * @parse_prop.np: Pointer to device node holding supplier phandle property
 * @parse_prop.prop_name: Name of property holding a phandle value
 * @parse_prop.index: For properties holding a table of phandles, this is the
 *                    index into the table
 *
 * Return:
 * * parse_prop() return values are
 * * Node pointer with refcount incremented, use of_node_put() on it when done.
 * * NULL if not found.

> + */
> +struct supplier_bindings {
> +	struct device_node *(*parse_prop)(struct device_node *np,
> +					  const char *name, int index);

Change name to prop_name
Change index to phandle_index

> +};
> +
> +static const struct supplier_bindings bindings[] = {
> +	{ .parse_prop = parse_clocks, },
> +	{ .parse_prop = parse_interconnects, },
> +	{ .parse_prop = parse_regulators, },

> +	{ },

	{},

> +};
> +

/**
 * of_link_property - TODO:
 * dev:
 * con_np:
 * prop:
 *
 * TODO...
 *
 * Any failed attempt to create a link will NOT result in an immediate return.
 * of_link_property() must create all possible links even when one or more
 * attempts to create a link fail.

Why?  Isn't one failure enough to prevent probing this device?
Continuing to scan just results in extra work, which will be
repeated every time device_link_check_waiting_consumers() is called.

 *
 * Return:
 * * 0 if TODO:
 * * -ENODEV on error
 */


I left some "TODO:" sections to be filled out above.


> +static bool of_link_property(struct device *dev, struct device_node *con_np,
> +			     const char *prop)

Returns 0 or -ENODEV, so bool is incorrect

(Also fixed on 8/8 in patch: "[PATCH 1/2] of/platform: Fix fn definitons for
of_link_is_valid() and of_link_property()")



> +{
> +	struct device_node *phandle;
> +	struct supplier_bindings *s = bindings;
> +	unsigned int i = 0;

> +	bool done = true, matched = false;

Change to:

   	bool matched = false;
	int ret = 0;

	/* do not stop at first failed link, link all available suppliers */

> +
> +	while (!matched && s->parse_prop) {
> +		while ((phandle = s->parse_prop(con_np, prop, i))) {
> +			matched = true;
> +			i++;
> +			if (of_link_to_phandle(dev, phandle))


Remove comment:

> +				/*
> +				 * Don't stop at the first failure. See
> +				 * Documentation for bus_type.add_links for
> +				 * more details.
> +				 */

> +				done = false;

				ret = -ENODEV;

Do not hide of_node_put() inside of_link_to_phandle(), do it here:

			of_node_put(phandle);

> +		}
> +		s++;
> +	}

> +	return done ? 0 : -ENODEV;

	return ret;
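
The "remember the error but keep going" shape suggested above can be
sketched in userspace as follows.  struct toy_supplier, toy_link_one(),
toy_link_all() and demo_link_all() are hypothetical names used only for
illustration; the real code walks phandles and calls device_link_add().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_supplier {
	bool available;	/* supplier device already exists */
	bool linked;	/* link from the consumer was created */
};

/* Link to one supplier; fails with -ENODEV if it is not there yet. */
static int toy_link_one(struct toy_supplier *sup)
{
	if (!sup->available)
		return -ENODEV;
	sup->linked = true;
	return 0;
}

/* Do not stop at the first failed link, link all available suppliers. */
static int toy_link_all(struct toy_supplier *sups, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (toy_link_one(&sups[i]))
			ret = -ENODEV;
	return ret;
}

/* Usage example: the middle supplier shows up only on the second pass. */
void demo_link_all(void)
{
	struct toy_supplier sups[3] = {
		{ true, false }, { false, false }, { true, false },
	};

	assert(toy_link_all(sups, 3) == -ENODEV);
	assert(sups[0].linked && !sups[1].linked && sups[2].linked);

	sups[1].available = true;
	assert(toy_link_all(sups, 3) == 0);
	assert(sups[1].linked);
}
```

A single int ret carries the same information as the done flag, and the
caller still sees an error as long as at least one supplier was missing.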

> +}
> +
> +static bool of_devlink;
> +core_param(of_devlink, of_devlink, bool, 0);
> +

/**
 * of_link_to_suppliers - Add device links to suppliers
 * @dev: consumer device
 *
 * Create device links to all available suppliers of @dev.
 * Must NOT stop at the first failed link.
 * If some suppliers are not yet available, this function will be
 * called again when additional suppliers become available.
 *
 * Return:
 * * 0 if links successfully created for all suppliers
 * * an error if one or more suppliers not yet available
 */

> +static int of_link_to_suppliers(struct device *dev)
> +{
> +	struct property *p;

> +	bool done = true;

remove done

        int ret = 0;

> +
> +	if (!of_devlink)
> +		return 0;

> +	if (unlikely(!dev->of_node))
> +		return 0;

Check not needed, for_each_property_of_node() will detect !dev->of_node.

> +
> +	for_each_property_of_node(dev->of_node, p)
> +		if (of_link_property(dev, dev->of_node, p->name))

> +			done = false;

                        ret = -EAGAIN;

> +
> +	return done ? 0 : -ENODEV;

	return ret;

> +}
> +
>  #ifndef CONFIG_PPC
>  static const struct of_device_id reserved_mem_matches[] = {
>  	{ .compatible = "qcom,rmtfs-mem" },
> @@ -523,6 +687,7 @@ static int __init of_platform_default_populate_init(void)
>  	if (!of_have_populated_dt())
>  		return -ENODEV;
>  
> +	platform_bus_type.add_links = of_link_to_suppliers;
>  	/*
>  	 * Handle certain compatibles explicitly, since we don't want to create
>  	 * platform_devices for every node in /reserved-memory with a
> -- 
> 2.22.0.709.g102302147b-goog
> 
>


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-08  2:04   ` Frank Rowand
@ 2019-08-16  1:50     ` Saravana Kannan
  2019-08-19  3:38       ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-16  1:50 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> > Date: Tue, 23 Jul 2019 17:10:54 -0700
> > Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >  device addition
> > From: Saravana Kannan <saravanak@google.com>
> >
> > When devices are added, the bus might want to create device links to track
> > functional dependencies between supplier and consumer devices. This
> > tracking of supplier-consumer relationship allows optimizing device probe
> > order and tracking whether all consumers of a supplier are active. The
> > add_links bus callback is added to support this.
>
> Change above to:
>
> When devices are added, the bus may create device links to track which
> suppliers a consumer device depends upon.  This
> tracking of supplier-consumer relationship may be used to defer probing
> the driver of a consumer device before the driver(s) for its supplier device(s)
> are probed.  It may also be used by a supplier driver to determine if
> all of its consumers have been successfully probed.
> The add_links bus callback is added to create the supplier device links
>
> >
> > However, when consumer devices are added, they might not have a supplier
> > device to link to despite needing mandatory resources/functionality from
> > one or more suppliers. A waiting_for_suppliers list is created to track
> > such consumers and retry linking them when new devices get added.
>
> Change above to:
>
> If a supplier device has not yet been created when the consumer device attempts
> to link it, the consumer device is added to the wait_for_suppliers list.
> When supplier devices are created, the supplier device link will be added to
> the relevant consumer devices on the wait_for_suppliers list.
>

I'll take these commit text suggestions if we decide to revert the
entire series at the end of this review.

> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >  include/linux/device.h | 14 +++++++
> >  2 files changed, 97 insertions(+)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index da84a73f2ba6..1b4eb221968f 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >  #endif
> >
> >  /* Device links support. */
> > +static LIST_HEAD(wait_for_suppliers);
> > +static DEFINE_MUTEX(wfs_lock);
> >
> >  #ifdef CONFIG_SRCU
> >  static DEFINE_MUTEX(device_links_lock);
> > @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >  }
> >  EXPORT_SYMBOL_GPL(device_link_add);
> >
> > +/**
>
> > + * device_link_wait_for_supplier - Mark device as waiting for supplier
>
>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list

I intentionally chose "Mark device..." because that's a better
description of the semantics of the function instead of trying to
describe the implementation. Whether I'm using a linked list or some
other data structure should not be the one line documentation of a
function. Unless the function is explicitly about operating on that
specific data structure.

>
>
> > + * @consumer: Consumer device
> > + *
> > + * Marks the consumer device as waiting for suppliers to become available. The
> > + * consumer device will never be probed until it's unmarked as waiting for
> > + * suppliers. The caller is responsible for adding the link to the supplier
> > + * once the supplier device is present.
> > + *
> > + * This function is NOT meant to be called from the probe function of the
> > + * consumer but rather from code that creates/adds the consumer device.
> > + */
> > +static void device_link_wait_for_supplier(struct device *consumer)
> > +{
> > +     mutex_lock(&wfs_lock);
> > +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> > +     mutex_unlock(&wfs_lock);
> > +}
> > +
> > +/**
>
>
> > + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> > + *
> > + * Loops through all consumers waiting on suppliers and tries to add all their
> > + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> > + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> > + *
> > + * The add_links bus callback is expected to return 0 if it has found and added
> > + * all the supplier links for the consumer device. It should return an error if
> > + * it isn't able to do so.
> > + *
> > + * The caller of device_link_wait_for_supplier() is expected to call this once
> > + * it's aware of potential suppliers becoming available.
>
> Change above comment to:
>
>     * device_link_add_supplier_links - add links from consumer devices to
>     *                                  supplier devices, leaving any consumer
>     *                                  with inactive suppliers on the
>     *                                  wait_for_suppliers list

I didn't know that the first one line comment could span multiple
lines. Good to know.


>     * Scan all consumer devices in the devicetree.

This function doesn't have anything to do with devicetree. I've
intentionally kept all OF related parts out of the driver/core because
I hope that other busses can start using this feature too. So I can't
take this bit.

>  For any supplier device that
>     * is not already linked to the consumer device, add the supplier to the
>     * consumer device's device links.
>     *
>     * If all of a consumer device's suppliers are available then the consumer
>     * is removed from the wait_for_suppliers list (if previously on the list).
>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>     * already on the list).

Honestly, I don't think this is any better than what I already have.

>     * The add_links bus callback must return 0 if it has found and added all
>     * the supplier links for the consumer device. It must return an error if
>     * it is not able to do so.
>     *
>     * The caller of device_link_wait_for_supplier() is expected to call this once
>     * it is aware of potential suppliers becoming available.
>
>
>
> > + */
> > +static void device_link_check_waiting_consumers(void)
>
> Function name is misleading and hides side effects.
>
> I have not come up with a name that does not hide side effects, but a better
> name would be:
>
>    device_link_add_supplier_links()

I kinda agree that it could afford a better name. The current name is
too similar to device_links_check_suppliers() and I never liked that.

Maybe device_link_add_missing_suppliers()?

I don't think we need "links" repeated twice in the function name.
With this suggestion, what side effect is hidden in your opinion? That
the fully linked consumer is removed from the "waiting for suppliers"
list?

Maybe device_link_try_removing_from_wfs()?

I'll wait for us to agree on a better name here before I change this.

> > +{
> > +     struct device *dev, *tmp;
> > +
> > +     mutex_lock(&wfs_lock);
> > +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> > +                              links.needs_suppliers)
> > +             if (!dev->bus->add_links(dev))
> > +                     list_del_init(&dev->links.needs_suppliers);
>
> Empties dev->links.needs_suppliers, but does not remove dev from
> wait_for_suppliers list.  Where does that happen?

I'll chalk this up to you having a long day or forgetting your coffee
:) list_del_init() does both of those things because needs_suppliers
is the node and wait_for_suppliers is the list.
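
The node-versus-list point can be checked with a small userspace sketch.
The helpers below are simplified reimplementations that borrow the names
from the kernel's list.h; struct toy_dev and demo_list_del_init() are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink @node from whatever list it is on, then point it at itself. */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	INIT_LIST_HEAD(node);
}

struct toy_dev {
	struct list_head needs_suppliers;	/* node, not a list */
};

void demo_list_del_init(void)
{
	struct list_head wait_for_suppliers;	/* the list */
	struct toy_dev dev;

	INIT_LIST_HEAD(&wait_for_suppliers);
	INIT_LIST_HEAD(&dev.needs_suppliers);

	list_add_tail(&dev.needs_suppliers, &wait_for_suppliers);
	assert(!list_empty(&wait_for_suppliers));
	assert(!list_empty(&dev.needs_suppliers));

	/* One call removes dev from the list and resets its node. */
	list_del_init(&dev.needs_suppliers);
	assert(list_empty(&wait_for_suppliers));
	assert(list_empty(&dev.needs_suppliers));
}
```

Because the node is reinitialized, a later list_empty() check on
dev.needs_suppliers reports that the device is no longer waiting.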

>
> > +     mutex_unlock(&wfs_lock);
> > +}
> > +
> >  static void device_link_free(struct device_link *link)
> >  {
> >       while (refcount_dec_not_one(&link->rpm_active))
> > @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >       struct device_link *link;
> >       int ret = 0;
> >
> > +     /*
> > +      * If a device is waiting for one or more suppliers (in
> > +      * wait_for_suppliers list), it is not ready to probe yet. So just
> > +      * return -EPROBE_DEFER without having to check the links with existing
> > +      * suppliers.
> > +      */
>
> Change comment to:
>
>         /*
>          * Device waiting for supplier to become available is not allowed
>          * to probe
>          */

Po-tay-to. Po-tah-to? I think my comment is just as good.

> > +     mutex_lock(&wfs_lock);
> > +     if (!list_empty(&dev->links.needs_suppliers)) {
> > +             mutex_unlock(&wfs_lock);
> > +             return -EPROBE_DEFER;
> > +     }
> > +     mutex_unlock(&wfs_lock);
> > +
> >       device_links_write_lock();
>
> Update Documentation/driver-api/device_link.rst to reflect the
> check of &dev->links.needs_suppliers in device_links_check_suppliers().

Thanks! Will do.

>
> >
> >       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> > @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >  {
> >       struct device_link *link, *ln;
> >
> > +     mutex_lock(&wfs_lock);
> > +     list_del(&dev->links.needs_suppliers);
> > +     mutex_unlock(&wfs_lock);
> > +
> >       /*
> >        * Delete all of the remaining links from this device to any other
> >        * devices (either consumers or suppliers).
> > @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >  #endif
> >       INIT_LIST_HEAD(&dev->links.consumers);
> >       INIT_LIST_HEAD(&dev->links.suppliers);
> > +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >       dev->links.status = DL_DEV_NO_DRIVER;
> >  }
> >  EXPORT_SYMBOL_GPL(device_initialize);
> > @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >
> >       kobject_uevent(&dev->kobj, KOBJ_ADD);
>
> > +
> > +     /*
> > +      * Check if any of the other devices (consumers) have been waiting for
> > +      * this device (supplier) to be added so that they can create a device
> > +      * link to it.
> > +      *
> > +      * This needs to happen after device_pm_add() because device_link_add()
> > +      * requires the supplier be registered before it's called.
> > +      *
> > +      * But this also needs to happe before bus_probe_device() to make sure
> > +      * waiting consumers can link to it before the driver is bound to the
> > +      * device and the driver sync_state callback is called for this device.
> > +      */
>
>         /*
>          * Add links to dev from any dependent consumer that has dev on its
>          * list of needed suppliers

There is no list of needed suppliers.

> (links.needs_suppliers).  device_pm_add()
>          * must have previously registered dev to allow the links to be added.
>          *
>          * The consumer links must be created before dev is probed because the
>          * sync_state callback for dev will use the consumer links.
>          */

I think what I wrote is just as clear.

>
> > +     device_link_check_waiting_consumers();
> > +
> > +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> > +             device_link_wait_for_supplier(dev);
> > +
> >       bus_probe_device(dev);
> >       if (parent)
> >               klist_add_tail(&dev->p->knode_parent,
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index c330b75c6c57..5d70babb7462 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >   * @uevent:  Called when a device is added, removed, or a few other things
> >   *           that generate uevents to add the environment variables.
>
> > + * @add_links:       Called, perhaps multiple times per device, after a device is
> > + *           added to this bus.  The function is expected to create device
> > + *           links to all the suppliers of the input device that are
> > + *           available at the time this function is called.  As in, the
> > + *           function should NOT stop at the first failed device link if
> > + *           other unlinked supplier devices are present in the system.
>
> * @add_links:   Called after a device is added to this bus.

Why are you removing the "perhaps multiple times" part? that's true
and that's how some of the other ops are documented.

>  The function is
> *               expected to create device links to all the suppliers of the
> *               device that are available at the time this function is called.
> *               The function must NOT stop at the first failed device link if
> *               other unlinked supplier devices are present in the system.
> *               If some suppliers are not yet available, this function will be
> *               called again when the suppliers become available.
>
> but add_links() not needed, so moving this comment to of_link_to_suppliers()

Sorry, I'm not sure I understand. Can you please explain what you are
trying to say? of_link_to_suppliers() is just one implementation of
add_links(). The comment above is true for any bus trying to implement
add_links().
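
To make the bus-independent contract concrete, here is a toy userspace
model of the flow in device_add(): retry the waiting consumers first, then
try to link the new device.  Everything in it (struct toy_bus, struct
toy_dev, toy_add_links(), toy_device_add(), matching suppliers by name) is
an assumption made up for this sketch and is not how any real bus looks up
its suppliers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_DEVS 8

struct toy_dev {
	const char *name;
	const char *supplier;	/* name of the one supplier, or NULL */
	bool linked;		/* link to the supplier exists */
	bool waiting;		/* marked as waiting for suppliers */
};

struct toy_bus {
	struct toy_dev *devs[TOY_MAX_DEVS];
	size_t ndevs;
};

/* add_links() contract: 0 when all links exist, nonzero to retry later. */
static int toy_add_links(struct toy_bus *bus, struct toy_dev *dev)
{
	size_t i;

	if (!dev->supplier || dev->linked)
		return 0;
	for (i = 0; i < bus->ndevs; i++) {
		if (!strcmp(bus->devs[i]->name, dev->supplier)) {
			dev->linked = true;
			return 0;
		}
	}
	return -1;
}

/* Retry every waiting consumer, then try to link the new device itself. */
static void toy_device_add(struct toy_bus *bus, struct toy_dev *dev)
{
	size_t i;

	assert(bus->ndevs < TOY_MAX_DEVS);
	bus->devs[bus->ndevs++] = dev;

	for (i = 0; i < bus->ndevs; i++)
		if (bus->devs[i]->waiting && !toy_add_links(bus, bus->devs[i]))
			bus->devs[i]->waiting = false;

	if (toy_add_links(bus, dev))
		dev->waiting = true;
}

/* Usage example: the consumer is added before its supplier. */
void demo_add_links(void)
{
	struct toy_bus bus = { .ndevs = 0 };
	struct toy_dev consumer = { .name = "uart", .supplier = "clk" };
	struct toy_dev supplier = { .name = "clk" };

	toy_device_add(&bus, &consumer);
	assert(consumer.waiting && !consumer.linked);

	toy_device_add(&bus, &supplier);
	assert(!consumer.waiting && consumer.linked);
	assert(!supplier.waiting);
}
```

Nothing in the model refers to devicetree; a bus with a different way of
discovering suppliers would only need its own toy_add_links().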

>
>
> > + *
> > + *           Return 0 if device links have been successfully created to all
> > + *           the suppliers of this device.  Return an error if some of the
> > + *           suppliers are not yet available and this function needs to be
> > + *           reattempted in the future.
>
> *
> *               Return 0 if device links have been successfully created to all
> *               the suppliers of this device.  Return an error if some of the
> *               suppliers are not yet available.
>
>
> >   * @probe:   Called when a new device or driver add to this bus, and callback
> >   *           the specific driver's probe to initial the matched device.
> >   * @remove:  Called when a device removed from this bus.
> > @@ -122,6 +133,7 @@ struct bus_type {
> >
> >       int (*match)(struct device *dev, struct device_driver *drv);
> >       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
>
>
> > +     int (*add_links)(struct device *dev);
>
>               ^^^^^^^^^  add_supplier              ???
>               ^^^^^^^^^  add_suppliers             ???
>
>               ^^^^^^^^^  link_suppliers            ???
>
>               ^^^^^^^^^  add_supplier_dependency   ???
>               ^^^^^^^^^  add_supplier_dependencies ???
> add_links() not needed

add_links() was an intentional decision. There's no requirement that
the bus should only create links from this device to its suppliers. If
the bus also knows the consumers of this device (dev), then it
can/should add those too. So, it shouldn't have "suppliers" in the
name.

> >       int (*probe)(struct device *dev);
> >       int (*remove)(struct device *dev);
> >       void (*shutdown)(struct device *dev);
>
>
>
>
> > @@ -893,11 +905,13 @@ enum dl_dev_state {
> >   * struct dev_links_info - Device data related to device links.
> >   * @suppliers: List of links to supplier devices.
> >   * @consumers: List of links to consumer devices.
>
> > + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>
>     * @needs_suppliers: List of devices deferring probe until supplier drivers
>     *                   are successfully probed.

It's "need suppliers". As in, this is a device that "needs suppliers".
So, no, this is not a list. This is a node in a list. And all "nodes
in a list" are documented as "Hook" in the rest of this file. So
I think the documentation is correct as is.
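
The node-versus-list distinction can be shown with a small user-space
sketch. It assumes a simplified stand-in for the kernel's struct
list_head; toy_device, toy_wait_for_supplier() and toy_remove_from_wfs()
are hypothetical names for illustration, not the driver core functions.

```c
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (assumption: simplified). */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink the node and leave it pointing at itself, like list_del_init(). */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	list_init(node);
}

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Toy device: needs_suppliers is the hook (a node), not a list of its own. */
struct toy_device {
	const char *name;
	struct list_head needs_suppliers;
};

/* The one global list that every hook gets chained onto. */
static struct list_head wait_for_suppliers = {
	&wait_for_suppliers, &wait_for_suppliers
};

/* Mark the device as waiting: chain its hook onto the global list. */
static void toy_wait_for_supplier(struct toy_device *dev)
{
	list_add_tail(&dev->needs_suppliers, &wait_for_suppliers);
}

/* Unmark the device: deleting the hook takes it off the global list. */
static void toy_remove_from_wfs(struct toy_device *dev)
{
	list_del_init(&dev->needs_suppliers);
}
```

In this sketch, deleting the hook is the same operation as removing the
device from the global list, since the hook is the device's only
membership in it.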

Thanks for your review.



-Saravana

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 2/7] driver core: Add edit_links() callback for drivers
  2019-08-08  2:05   ` Frank Rowand
@ 2019-08-16  1:50     ` Saravana Kannan
  0 siblings, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-08-16  1:50 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Wed, Aug 7, 2019 at 7:05 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> > Date: Tue, 23 Jul 2019 17:10:55 -0700
> > Subject: [PATCH v7 2/7] driver core: Add edit_links() callback for drivers
> > From: Saravana Kannan <saravanak@google.com>
> >
> > The driver core/bus adding supplier-consumer dependencies by default
>
> > enables functional dependencies to be tracked correctly even when the
> > consumer devices haven't had their drivers registered or loaded (if they
> > are modules).
>
>   enables functional dependencies to be tracked correctly before the
>   consumer device drivers are registered or loaded (if they are modules).
>
> >
> > However, when the bus incorrectly adds dependencies that it shouldn't
>
>                     ^^^ driver core/bus
>
> > have added, the devices might never probe.
>
> Explain what causes a  dependency to be incorrectly added.

That depends on the bus specific implementation of add_links()? Not
sure I can explain how a future implementation can get it wrong?

> Is this a bug in the dependency detection code?

Yes, as in, thinking it's a dependency when it's not.

> Are there cases where the dependency detection code can not reliably determine
> whether there truly is a dependency?

Correct. One example is the clock controller cyclic dependency that
was mentioned in the v1 patch series [1]. Search for "cyclic" if you
want to read that part. Can happen between interconnect providers too
for example. But I explain it below in the commit text as Device-C and
Device-S. Both of those could be different clock providers.

> >
> > For example, if device-C is a consumer of device-S and they have
> > phandles to each other in DT, the following could happen:
> >
> > 1.  Device-S get added first.
> > 2.  The bus add_links() callback will (incorrectly) try to link it as
> >     a consumer of device-C.
> > 3.  Since device-C isn't present, device-S will be put in
> >     "waiting-for-supplier" list.
> > 4.  Device-C gets added next.
> > 5.  All devices in "waiting-for-supplier" list are retried for linking.
> > 6.  Device-S gets linked as consumer to Device-C.
> > 7.  The bus add_links() callback will (correctly) try to link it as
> >     a consumer of device-S.
> > 8.  This isn't allowed because it would create a cyclic device links.
> >
> > Neither devices will get probed since the supplier is marked as
> > dependent on the consumer. And the consumer will never probe because the
> > consumer can't get resources from the supplier.
> >
> > Without this patch, things stay in this broken state. However, with this
> > patch, the execution will continue like this:
> >
> > 9.  Device-C's driver is loaded.
>
> Change comment to:
>
>   For example, if device-C is a consumer of device-S and they have phandles
>   referencing each other in the devicetree, the following could happen:
>
>   1.  Device-S is added first.
>         - The bus add_links() callback will (incorrectly) link device-S
>           as a consumer of device-C, and device-S will be put in the
>           "wait_for_suppliers" list.
>
>   2.  Device-C is added next.
>         - All devices in the "wait_for_suppliers" list are retried for linking.
>         - Device-S remains linked as a consumer to device-C.

Device-S gets linked. Not remains linked.

>         - The bus add_links() callback will (correctly) try to link device-C as
>           a consumer of device-S.
>         - The link attempt will fail because it would create a cyclic device
>           link, and device-C will be put in the "wait_for_suppliers" list.
>
>   Device-S will not be probed because it is in the "wait_for_suppliers" list.

Not correct. Device-S will not be probed because it has a device link
to Device-C.

>   Device-C will not be probed because it is in the "wait_for_suppliers" list.

Correct.

>
> >
> > Without this patch, things stay in this broken state. However, with this
> > patch, the execution will continue like this:
> >
> > 9.  Device-C's driver is loaded.
>
> What is "loaded"?  Does that mean the device-C probe succeeds?

No, module loading. I was using a loadable driver as an example. The
same thing could happen if the driver is registered at a later init
call level too.

> What causes device-C to be probed?  The normal processing of -EPROBE_DEFER
> devices?

It's not probed.

> > 10. Device-C's driver removes Device-S as a consumer of Device-C.
> > 11. Device-C's driver adds Device-C as a consumer of Device-S.
> > 12. Device-S probes.
> > 14. Device-C probes.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  drivers/base/core.c    | 24 ++++++++++++++++++++++--
> >  drivers/base/dd.c      | 29 +++++++++++++++++++++++++++++
> >  include/linux/device.h | 18 ++++++++++++++++++
> >  3 files changed, 69 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/base/core.c b/drivers/base/core.c
> > index 1b4eb221968f..733d8a9aec76 100644
> > --- a/drivers/base/core.c
> > +++ b/drivers/base/core.c
> > @@ -422,6 +422,19 @@ static void device_link_wait_for_supplier(struct device *consumer)
> >       mutex_unlock(&wfs_lock);
> >  }
> >
> > +/**
> > + * device_link_remove_from_wfs - Unmark device as waiting for supplier
> > + * @consumer: Consumer device
> > + *
> > + * Unmark the consumer device as waiting for suppliers to become available.
> > + */
> > +void device_link_remove_from_wfs(struct device *consumer)
>
> Misleading function name.
> Incorrect description.

See other reply. I think you mixed up a node in a list with a list.
The name is very correct and is not misleading.

> Does not remove consumer from list wait_for_suppliers.

It does.

> At best, consumer might eventually get removed from list wait_for_suppliers
> if device_link_check_waiting_consumers() is called again.
>
> > +{
> > +     mutex_lock(&wfs_lock);
> > +     list_del_init(&consumer->links.needs_suppliers);
> > +     mutex_unlock(&wfs_lock);
> > +}
> > +
> >  /**
> >   * device_link_check_waiting_consumers - Try to unmark waiting consumers
> >   *
> > @@ -439,12 +452,19 @@ static void device_link_wait_for_supplier(struct device *consumer)
> >  static void device_link_check_waiting_consumers(void)
> >  {
> >       struct device *dev, *tmp;
> > +     int ret;
> >
> >       mutex_lock(&wfs_lock);
> >       list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> > -                              links.needs_suppliers)
> > -             if (!dev->bus->add_links(dev))
> > +                              links.needs_suppliers) {
> > +             ret = 0;
> > +             if (dev->has_edit_links)
> > +                     ret = driver_edit_links(dev);
> > +             else if (dev->bus->add_links)
> > +                     ret = dev->bus->add_links(dev);
> > +             if (!ret)
> >                       list_del_init(&dev->links.needs_suppliers);
> > +     }
> >       mutex_unlock(&wfs_lock);
> >  }
> >
> > diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> > index 994a90747420..5e7041ede0d7 100644
> > --- a/drivers/base/dd.c
> > +++ b/drivers/base/dd.c
> > @@ -698,6 +698,12 @@ int driver_probe_device(struct device_driver *drv, struct device *dev)
> >       pr_debug("bus: '%s': %s: matched device %s with driver %s\n",
> >                drv->bus->name, __func__, dev_name(dev), drv->name);
> >
> > +     if (drv->edit_links) {
> > +             if (drv->edit_links(dev))
> > +                     dev->has_edit_links = true;
> > +             else
> > +                     device_link_remove_from_wfs(dev);
> > +     }
>
> For the purposes of the following paragraphs, I refer to dev as "dev_1" to
> distinguish it from a new dev that will be encountered later.  The following
> paragraphs assume dev_1 has a supplier dependency for a supplier that has
> not probed yet.
>
> Q. Why the extra level of indirection?
>
> A. really_probe() does not set dev->driver before returning if
>    device_links_check_suppliers() returned -EPROBE_DEFER.  Thus
>    device_link_check_waiting_consumers() can not directly check
>    "if (dev_1->driver->edit_links)".
>
>    The added driver_probe_device() code is setting dev_1->has_edit_links in the
>    probe path, then device_link_check_waiting_consumers() will use the value
>    of dev_1->has_edit_links instead of directly checking
>    "if (dev_1->driver->edit_links)".
>
>    If really_probe() was modified to set dev->driver in this
>    case then the need for dev->has_edit_links is removed and
>    driver_edit_links() is not needed, since dev->driver would
>    be available.  Removing driver_edit_links() simplifies the
>    code.

really_probe() doesn't set dev->driver for a good reason: the driver
could be unregistered/unloaded at any time before the device and
driver are bound to each other.

> device_add() calls dev_1->bus->add_links(dev_1), thus dev_1 will have the
> supplier links set (for any suppliers not currently available) and be on
> list wait_for_suppliers.
>
> Then device_add() calls bus_probe_device(), leading to calling
> driver_probe_device().  The above code fragment either sets
> dev_1->has_edit_links or removes the needs_suppliers links from dev_1.
> dev_1 remains on list wait_for_suppliers.
>
> If (drv->edit_links(dev_1) returns 0 then device_link_remove_from_wfs()
> removes the supplier links.  Shouldn't device_link_remove_from_wfs()  also
> remove the device from the list wait_for_suppliers?
>
> The next time a device is added, device_link_check_waiting_consumers() will
> be called and dev_1 will be on list wait_for_suppliers, thus
> device_link_check_waiting_consumers() will find dev_1->has_edit_links true
> and thus call driver_edit_links() instead of calling dev->bus->add_links().
>
> The comment in device.h, later in this patch, says that drv->edit_links() is
> responsible for editing the device links for dev.  The comment provides no
> guidance on how drv->edit_links() is supposed to determine what edits to
> perform.  No example drv->edit_links() function is provided in this patch
> series.  dev_1->bus->add_links(dev_1) may have added one or more suppliers
> to its needs_suppliers link.  drv->edit_links() needs to be able to handle
> all possible variants of what suppliers are on the needs_suppliers link.

Looks like a significant chunk of this question ties into the
assumption that needs_suppliers and wait_for_suppliers are two
different lists. Since I'm clarifying that's not the case, I'll wait
to see if you still have any questions and, if so, wait for you to
rewrite this.

>
>
> >       pm_runtime_get_suppliers(dev);
> >       if (dev->parent)
> >               pm_runtime_get_sync(dev->parent);
> > @@ -786,6 +792,29 @@ struct device_attach_data {
> >       bool have_async;
> >  };
> >
> > +static int __driver_edit_links(struct device_driver *drv, void *data)
> > +{
> > +     struct device *dev = data;
> > +
> > +     if (!drv->edit_links)
> > +             return 0;
> > +
> > +     if (driver_match_device(drv, dev) <= 0)
> > +             return 0;
> > +
> > +     return drv->edit_links(dev);
> > +}
> > +
> > +int driver_edit_links(struct device *dev)
> > +{
> > +     int ret;
> > +
> > +     device_lock(dev);
> > +     ret = bus_for_each_drv(dev->bus, NULL, dev, __driver_edit_links);
> > +     device_unlock(dev);
> > +     return ret;
> > +}
> > +
> >  static int __device_attach_driver(struct device_driver *drv, void *_data)
> >  {
> >       struct device_attach_data *data = _data;
> > diff --git a/include/linux/device.h b/include/linux/device.h
> > index 5d70babb7462..35aed50033c4 100644
> > --- a/include/linux/device.h
> > +++ b/include/linux/device.h
> > @@ -263,6 +263,20 @@ enum probe_type {
> >   * @probe_type:      Type of the probe (synchronous or asynchronous) to use.
> >   * @of_match_table: The open firmware table.
> >   * @acpi_match_table: The ACPI match table.
> > + * @edit_links:      Called to allow a matched driver to edit the device links the
>
> Where is the value of field edit_links set?
>
> Is it only in an out of tree driver?  If so, I would like to see an
> example implementation of the edit_links() function.

In the example above, it just needs to delete the "Device-S is a
consumer of Device-C" link and then add a "Device-C is a consumer of
Device-S" link.

I can send example code in a subsequent reply.
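
A minimal user-space sketch of that edit follows. It assumes a toy link
table instead of the real device link API; toy_link_add(),
toy_link_del() and toy_edit_links_dev_c() are hypothetical names, and
the cycle check only catches a direct reverse link.

```c
#include <stdbool.h>

#define MAX_LINKS 8

/* Toy link table: each used entry says "consumer depends on supplier". */
struct toy_link { int consumer, supplier; bool used; };
static struct toy_link links[MAX_LINKS];

static bool toy_link_exists(int consumer, int supplier)
{
	for (int i = 0; i < MAX_LINKS; i++)
		if (links[i].used && links[i].consumer == consumer &&
		    links[i].supplier == supplier)
			return true;
	return false;
}

/* Refuse a link whose reverse already exists (direct cycles only, for brevity). */
static int toy_link_add(int consumer, int supplier)
{
	if (toy_link_exists(supplier, consumer))
		return -1;
	for (int i = 0; i < MAX_LINKS; i++) {
		if (!links[i].used) {
			links[i] = (struct toy_link){ consumer, supplier, true };
			return 0;
		}
	}
	return -1; /* table full */
}

static void toy_link_del(int consumer, int supplier)
{
	for (int i = 0; i < MAX_LINKS; i++)
		if (links[i].used && links[i].consumer == consumer &&
		    links[i].supplier == supplier)
			links[i].used = false;
}

enum { DEV_S, DEV_C };

/* Hypothetical edit_links() for Device-C's driver: swap the wrong link for the right one. */
static int toy_edit_links_dev_c(void)
{
	toy_link_del(DEV_S, DEV_C);        /* drop "S is a consumer of C" */
	return toy_link_add(DEV_C, DEV_S); /* add "C is a consumer of S" */
}
```

The order matters in this sketch: the incorrect link has to go first,
otherwise adding the correct one is rejected as a cycle.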

> > + *           bus might have added incorrectly. This will be useful to handle
> > + *           cases where the bus incorrectly adds functional dependencies
> > + *           that aren't true or tries to create cyclic dependencies. But
> > + *           doesn't correctly handle functional dependencies that are
> > + *           missed by the bus as the supplier's sync_state might get to
> > + *           execute before the driver for a missing consumer is loaded and
> > + *           gets to edit the device links for the consumer.
> > + *
> > + *           This function might be called multiple times after a new device
> > + *           is added.  The function is expected to create all the device
> > + *           links for the new device and return 0 if it was completed
> > + *           successfully or return an error if it needs to be reattempted
> > + *           in the future.
> >   * @probe:   Called to query the existence of a specific device,
> >   *           whether this driver can work with it, and bind the driver
> >   *           to a specific device.
> > @@ -302,6 +316,7 @@ struct device_driver {
> >       const struct of_device_id       *of_match_table;
> >       const struct acpi_device_id     *acpi_match_table;
> >
> > +     int (*edit_links)(struct device *dev);
> >       int (*probe) (struct device *dev);
> >       int (*remove) (struct device *dev);
> >       void (*shutdown) (struct device *dev);
> > @@ -1078,6 +1093,7 @@ struct device {
> >       bool                    offline_disabled:1;
> >       bool                    offline:1;
> >       bool                    of_node_reused:1;
> > +     bool                    has_edit_links:1;
>
> Add has_edit_links to the struct's kernel_doc

Already addressed that in a separate patch.

Thanks,
Saravana

[1] https://lore.kernel.org/lkml/CAGETcx-KwwjNgAy7BLv4+1=5N_s-UdmfSnTtHP8V5gc7t48W=Q@mail.gmail.com/

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-08  2:06   ` Frank Rowand
@ 2019-08-16  1:50     ` Saravana Kannan
  2019-08-19 17:16       ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-16  1:50 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> > Add device-links after the devices are created (but before they are
> > probed) by looking at common DT bindings like clocks and
> > interconnects.
> >
> > Automatically adding device-links for functional dependencies at the
> > framework level provides the following benefits:
> >
> > - Optimizes device probe order and avoids the useless work of
> >   attempting probes of devices that will not probe successfully
> >   (because their suppliers aren't present or haven't probed yet).
> >
> >   For example, in a commonly available mobile SoC, registering just
> >   one consumer device's driver at an initcall level earlier than the
> >   supplier device's driver causes 11 failed probe attempts before the
> >   consumer device probes successfully. This was with a kernel with all
> >   the drivers statically compiled in. This problem gets a lot worse if
> >   all the drivers are loaded as modules without direct symbol
> >   dependencies.
> >
> > - Supplier devices like clock providers, interconnect providers, etc
> >   need to keep the resources they provide active and at a particular
> >   state(s) during boot up even if their current set of consumers don't
> >   request the resource to be active. This is because the rest of the
> >   consumers might not have probed yet and turning off the resource
> >   before all the consumers have probed could lead to a hang or
> >   undesired user experience.
> >
> >   Some frameworks (Eg: regulator) handle this today by turning off
> >   "unused" resources at late_initcall_sync and hoping all the devices
> >   have probed by then. This is not a valid assumption for systems with
> >   loadable modules. Other frameworks (Eg: clock) just don't handle
> >   this due to the lack of a clear signal for when they can turn off
> >   resources. This leads to downstream hacks to handle cases like this
> >   that can easily be solved in the upstream kernel.
> >
> >   By linking devices before they are probed, we give suppliers a clear
> >   count of the number of dependent consumers. Once all of the
> >   consumers are active, the suppliers can turn off the unused
> >   resources without making assumptions about the number of consumers.
> >
> > By default we just add device-links to track "driver presence" (probe
> > succeeded) of the supplier device. If any other functionality provided
> > by device-links are needed, it is left to the consumer/supplier
> > devices to change the link when they probe.
> >
> > Signed-off-by: Saravana Kannan <saravanak@google.com>
> > ---
> >  .../admin-guide/kernel-parameters.txt         |   5 +
> >  drivers/of/platform.c                         | 165 ++++++++++++++++++
> >  2 files changed, 170 insertions(+)
> >
>
> Documentation/admin-guide/kernel-paramers.rst:
>
> After line 129, add:
>
>         OF      Devicetree is enabled

Will do.

> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index 46b826fcb5ad..12937349d79d 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3170,6 +3170,11 @@
> >                       This can be set from sysctl after boot.
> >                       See Documentation/admin-guide/sysctl/vm.rst for details.
> >
> > +     of_devlink      [KNL] Make device links from common DT bindings. Useful
> > +                     for optimizing probe order and making sure resources
> > +                     aren't turned off before the consumer devices have
> > +                     probed.
>
>         of_supplier_depend instead of of_devlink ????

I'm open to other names, but of_supplier_depend is just odd.
of_devlink stands for of_device_links. If someone wants to know what
device links do, they can read up on the device links documentation?

>
>         of_supplier_depend
>                         [OF, KNL] Make device links from consumer devicetree
>                         nodes to supplier devicetree nodes.

We are creating device links between devices, not device tree nodes.
So this would be wrong. I'll replace "devicetree nodes" with
"devices"?

> The
>                         consumer / supplier relationships are inferred from
>                         scanning the devicetree.

I like this part. How about clarifying it with "from scanning the
common bindings in the devicetree"? Because we aren't trying to scan
the device specific properties.

>  The driver for a consumer
>                         device will not be probed until the drivers for all of
>                         its supplier devices have been successfully probed.

A driver is never probed. It's a device that's probed. The dependency
is between devices and not drivers. So, how about I replace this with:
"The consumer device will not be probed until all of its supplier
devices are successfully probed"?

Also, any reason you removed the "resources aren't turned off ..."
part? That's one of the biggest improvements of this patch series. Do
you want to rewrite that part too? Or I'll leave my current wording of
that part as is?

>
>
> > +
> >       ohci1394_dma=early      [HW] enable debugging via the ohci1394 driver.
> >                       See Documentation/debugging-via-ohci1394.txt for more
> >                       info.
> > diff --git a/drivers/of/platform.c b/drivers/of/platform.c
> > index 7801e25e6895..4344419a26fc 100644
> > --- a/drivers/of/platform.c
> > +++ b/drivers/of/platform.c
> > @@ -508,6 +508,170 @@ int of_platform_default_populate(struct device_node *root,
> >  }
> >  EXPORT_SYMBOL_GPL(of_platform_default_populate);
> >
>
> > +bool of_link_is_valid(struct device_node *con, struct device_node *sup)
>
> Change to less vague:
>
>    bool of_ancestor_of(struct device_node *test_np, struct device_node *np)

I thought about that when I wrote the code. But the consumer being the
ancestor of the supplier is just one of the things that makes a link
invalid. There can be other tests we might add as we go. That's why I
kept the function name as of_link_is_valid(). Even if I add an
"ancestor_of" helper function, I'll still have of_link_is_valid() as a
wrapper around it.

> > +{
> > +     of_node_get(sup);
> > +     /*
> > +      * Don't allow linking a device node as a consumer of one of its
> > +      * descendant nodes. By definition, a child node can't be a functional
> > +      * dependency for the parent node.
> > +      */
> > +     while (sup) {
> > +             if (sup == con) {
> > +                     of_node_put(sup);
> > +                     return false;
> > +             }
> > +             sup = of_get_next_parent(sup);
> > +     }
> > +     return true;
>
> Change to more generic:
>
>         of_node_get(test_np);
>         while (test_np) {
>                 if (test_np == np) {
>                         of_node_put(test_np);
>                         return true;
>                 }
>                 test_np = of_get_next_parent(test_np);
>         }
>         return false;

If you do insist on this change, I think we need better names than
test_np and np. It's not clear which node needs to be the ancestor for
this function to return true. With consumer/supplier, it's kinda
obvious that a consumer can't be an ancestor of a supplier.
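
The wrapper arrangement could look roughly like the user-space sketch
below. It assumes a toy node type with a plain parent pointer and no
refcounting; toy_node, toy_is_ancestor_of() and toy_link_is_valid() are
hypothetical names, not the functions in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy tree node; the real code walks struct device_node with refcounting. */
struct toy_node {
	const char *name;
	struct toy_node *parent;
};

/* True if @ancestor is @node itself or any parent above it. */
static bool toy_is_ancestor_of(const struct toy_node *ancestor,
			       const struct toy_node *node)
{
	for (; node; node = node->parent)
		if (node == ancestor)
			return true;
	return false;
}

/* Wrapper named after the intent, leaving room for further checks later. */
static bool toy_link_is_valid(const struct toy_node *con,
			      const struct toy_node *sup)
{
	/* A consumer can't depend on itself or on one of its descendants. */
	if (toy_is_ancestor_of(con, sup))
		return false;
	return true;
}
```

With the parameters named ancestor/node in the helper and con/sup in the
wrapper, the direction of each test is visible at the call site.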

> > +}
> > +
>
>
> /**
>  * of_link_to_phandle - Add device link to supplier
>  * @dev: consumer device
>  * @sup_np: pointer to the supplier device tree node

Could you suggest something that makes it a bit more clear that this
phandle/node doesn't need to be an actual device (as in, one with a
compatible property)? phandle kinda makes it clear, at least to me. I'd
prefer just saying "phandle to supplier". Thoughts?

Also, what's the guideline on which functions need doc headers? I
always leaned towards adding them only for non-static functions.
That's why I skipped this one. What's the reasoning for needing one
for this? I'm happy to do this, just want to see that there's some
consistent guideline that's being followed and something I can use for
future patches.

>  *
>  * TODO: ...
>  *
>  * Return:
>  * * 0 if link successfully created for supplier or of_devlink is false

The "or of_devlink is false" isn't true for this function?

>  * * an error if unable to create link
>  */
>
> Should have dev_debug() or pr_warn() or something on errors in this
> function -- the caller does not report any issue

I think that'll be too spammy during bootup. This function is expected
to fail often and it's not necessarily a catastrophic failure. The
caller can print something if they care to. The current set of callers
don't.

> > +static int of_link_to_phandle(struct device *dev, struct device_node *sup_np)
> > +{
> > +     struct platform_device *sup_dev;
> > +     u32 dl_flags = DL_FLAG_AUTOPROBE_CONSUMER;
> > +     int ret = 0;
> > +
> > +     /*
>
> > +      * Since we are trying to create device links, we need to find
> > +      * the actual device node that owns this supplier phandle.
> > +      * Often times it's the same node, but sometimes it can be one
> > +      * of the parents. So walk up the parent till you find a
> > +      * device.
>
> Change comment to:
>
>          * Find the device node that contains the supplier phandle.  It may
>          * be @sup_np or it may be an ancestor of @sup_np.

Aren't the existing comments giving a better explanation with more context?
But I'll do this.

>
> > +      */
>
> See comment in caller of of_link_to_phandle() - do not hide the final
> of_node_put() of sup_np inside of_link_to_phandle(), so need to do
> an of_node_get() here.
>
>         of_node_get(sup_np);

Will do. Good point.

>
> > +     while (sup_np && !of_find_property(sup_np, "compatible", NULL))
> > +             sup_np = of_get_next_parent(sup_np);
> > +     if (!sup_np)
>
> > +             return 0;
>
> This case should never occur(?), it is an error.
>
>                 return -ENODEV;

I'm not too sure about all the possible DT combinations to say this?
In that case, this isn't an error. As in, the consumer doesn't need to
wait for this non-existent device to get populated/added. I'd lean
towards leaving it as is and address it later if this is actually a
problem. I want of_link_to_phandle to fail only when a link can be
created, but isn't due to current system state (device isn't added,
creating a cyclic link, etc).
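
That return convention can be sketched in user space as below. The
struct and its three flags are illustrative assumptions standing in for
the real lookups; toy_link_to_supplier() is a hypothetical name.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy view of a supplier reference; all fields are illustrative. */
struct toy_supplier {
	bool has_device_node; /* some ancestor carries "compatible" */
	bool link_valid;      /* e.g. the consumer is not its ancestor */
	bool device_added;    /* the struct device exists right now */
};

/*
 * Return 0 when the link was made or can never be made (nothing to wait
 * for); return -ENODEV only when a later retry could succeed.
 */
static int toy_link_to_supplier(const struct toy_supplier *sup)
{
	if (!sup->has_device_node)
		return 0;       /* no device will ever show up */
	if (!sup->link_valid)
		return 0;       /* link is never allowed */
	if (!sup->device_added)
		return -ENODEV; /* worth retrying later */
	return 0;               /* link created */
}
```

Under this convention the caller only needs to test for non-zero to
decide whether the consumer stays on the waiting list.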

>
> > +
> > +     if (!of_link_is_valid(dev->of_node, sup_np)) {
> > +             of_node_put(sup_np);
> > +             return 0;
>
> Do not use a name that obscures what the function is doing, also
> return an actual issue.
>
>         if (of_ancestor_of(sup_np, dev->of_node)) {
>                 of_node_put(sup_np);
>                 return -EINVAL;

See my comment about not erroring on cases where a link can't ever be
created? So in your case, you'd want the caller to check the error
value to decide which ones to ignore and which ones not to? That seems
a bit more fragile when this function is potentially changed in the
future.

>
> > +     }
> > +     sup_dev = of_find_device_by_node(sup_np);
> > +     of_node_put(sup_np);
> > +     if (!sup_dev)
> > +             return -ENODEV;
> > +     if (!device_link_add(dev, &sup_dev->dev, dl_flags))
> > +             ret = -ENODEV;

For example, in the earlier comment you suggested -ENODEV if there's
no device with "compatible" property that encapsulates the supplier
phandle. But -ENODEV makes more sense for this case where there's
actually no device because it hasn't been added yet. And the caller
needs to be able to distinguish between these two. Are we just going
to arbitrarily pick error values just to make sure they don't overlap?

I don't have a strong opinion one way or another, but I'm trying to
understand what's better in the long run where this function can
evolve to add more checks or handle more cases.

> > +     put_device(&sup_dev->dev);
> > +     return ret;
> > +}
> > +
>
> /**
>  * parse_prop_cells - Property parsing functions for suppliers
>  *
>  * @np:            pointer to a device tree node containing a list
>  * @prop_name:     Name of property holding a phandle value
>  * @phandle_index: For properties holding a table of phandles, this is the
>  *                 index into the table
>  * @list_name:     property name that contains a list
>  * @cells_name:    property name that specifies phandles' arguments count
>  *
>  * This function is useful to parse lists of phandles and their arguments.
>  *
>  * Return:
>  * * Node pointer with refcount incremented, use of_node_put() on it when done.
>  * * NULL if not found.
>  */
>
> > +static struct device_node *parse_prop_cells(struct device_node *np,
> > +                                         const char *prop, int index,
> > +                                         const char *binding,
> > +                                         const char *cell)
>
> Make names consistent with of_parse_phandle_with_args():
>   Change prop to prop_name
>   Change index to phandle_index

You call this index even in of_parse_phandle_with_args()

>   Change binding to list_name
>   Change cell to cells_name

This is going to cause a lot more line wraps for barely better names.
But I'll reluctantly do this.

>
> > +{
> > +     struct of_phandle_args sup_args;
> > +
>
> > +     /* Don't need to check property name for every index. */
> > +     if (!index && strcmp(prop, binding))
> > +             return NULL;
>
> I read the discussion on whether to check property name only once
> in version 6.
>
> This check is fragile, depending upon the calling code to be properly
> structured.  Do the check for all values of index.  The reduction of
> overhead from not checking does not justify the fragileness and the
> extra complexity for the code reader to understand why the check can
> be bypassed when
> index is not zero.

This is used only in this file. I understand needing the balance
between code complexity/fragility and efficiency. But I think you push
the line too far away from efficiency. This code is literally never
going to fail because it's a static function called only inside this
file. And the check isn't that hard to understand with that tiny
comment.
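
A user-space sketch of why the index-0 check is enough, assuming the
caller keeps the property name fixed and walks the index up from 0 until
the parser reports nothing. toy_parse_clocks(), toy_count_suppliers()
and the name_checks counter are hypothetical, and the "three phandles"
are made-up data.

```c
#include <stddef.h>
#include <string.h>

static int name_checks; /* counts strcmp() calls, for illustration only */

/* Toy parser: pretends "clocks" holds three phandles 1..3; 0 means "none". */
static int toy_parse_clocks(const char *prop, int index)
{
	/* Name is checked at index 0 only; the caller stops on the first 0. */
	if (!index) {
		name_checks++;
		if (strcmp(prop, "clocks"))
			return 0;
	}
	return index < 3 ? index + 1 : 0;
}

/* Caller shape assumed by the shortcut: a fixed prop, index rising from 0. */
static int toy_count_suppliers(const char *prop)
{
	int n = 0;

	while (toy_parse_clocks(prop, n))
		n++;
	return n;
}
```

A mismatched name is rejected at index 0, so the loop never reaches a
higher index for it; a caller that started at a non-zero index would
break that assumption.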

> > +
> > +     if (of_parse_phandle_with_args(np, binding, cell, index, &sup_args))
> > +             return NULL;
> > +
> > +     return sup_args.np;
> > +}
> > +
> > +static struct device_node *parse_clocks(struct device_node *np,
> > +                                     const char *prop, int index)
>
> Change prop to prop_name
> Change index to phandle_index
>
> > +{
> > +     return parse_prop_cells(np, prop, index, "clocks", "#clock-cells");
> > +}
> > +
> > +static struct device_node *parse_interconnects(struct device_node *np,
> > +                                            const char *prop, int index)
>
> Change prop to prop_name
> Change index to phandle_index
>
> > +{
> > +     return parse_prop_cells(np, prop, index, "interconnects",
> > +                             "#interconnect-cells");
> > +}
> > +
> > +static int strcmp_suffix(const char *str, const char *suffix)
> > +{
> > +     unsigned int len, suffix_len;
> > +
> > +     len = strlen(str);
> > +     suffix_len = strlen(suffix);
> > +     if (len <= suffix_len)
> > +             return -1;
> > +     return strcmp(str + len - suffix_len, suffix);
> > +}
> > +
> > +static struct device_node *parse_regulators(struct device_node *np,
> > +                                         const char *prop, int index)
>
> Change prop to prop_name
> Change index to phandle_index
>

Will do all the renames you list above.

> > +{
> > +     if (index || strcmp_suffix(prop, "-supply"))
> > +             return NULL;
> > +
> > +     return of_parse_phandle(np, prop, 0);
> > +}
> > +
> > +/**
>
> > + * struct supplier_bindings - Information for parsing supplier DT binding
> > + *
> > + * @parse_prop:              If the function cannot parse the property, return NULL.
> > + *                   Otherwise, return the phandle listed in the property
> > + *                   that corresponds to the index.
>
> There is no documentation of dynamic function parameters in the docbook
> description of a struct.  Use this format for now and I will clean up when
> I clean up all of the devicetree docbook info.
>
> Change above comment to:
>
>  * struct supplier_bindings - Property parsing functions for suppliers
>  *
>  * @parse_prop: function name
>  *              parse_prop() finds the node corresponding to a supplier phandle
>  * @parse_prop.np: Pointer to device node holding supplier phandle property
>  * @parse_prop.prop_name: Name of property holding a phandle value
>  * @parse_prop.index: For properties holding a table of phandles, this is the
>  *                    index into the table
>  *
>  * Return:
>  * * parse_prop() return values are
>  * * Node pointer with refcount incremented, use of_node_put() on it when done.
>  * * NULL if not found.

Will do. Thanks for writing it.

> > + */
> > +struct supplier_bindings {
> > +     struct device_node *(*parse_prop)(struct device_node *np,
> > +                                       const char *name, int index);
>
> Change name to prop_name
> Change index to phandle_index
>
> > +};
> > +
> > +static const struct supplier_bindings bindings[] = {
> > +     { .parse_prop = parse_clocks, },
> > +     { .parse_prop = parse_interconnects, },
> > +     { .parse_prop = parse_regulators, },
>
> > +     { },
>
>         {},
>
> > +};
> > +
>
> /**
>  * of_link_property - TODO:
>  * dev:
>  * con_np:
>  * prop:
>  *
>  * TODO...
>  *
>  * Any failed attempt to create a link will NOT result in an immediate return.
>  * of_link_property() must create all possible links even when one or more
>  * attempts to create a link fail.
>
> Why?  isn't one failure enough to prevent probing this device?
> Continuing to scan just results in extra work... which will be
> repeated every time device_link_check_waiting_consumers() is called

Context:
As I said in the cover letter, avoiding unnecessary probes is just one
of the reasons for this patch. The other (arguably more important)
reason for this patch is to make sure suppliers know that they have
consumers that are yet to be probed. That way, suppliers can leave
their resource on AND in the right state if they were left on by the
bootloader. For example, if a clock was left on and at 200 MHz, the
clock provider needs to keep that clock ON and at 200 MHz till all the
consumers are probed.

Answer: Let's say a consumer device Z has suppliers A, B and C. If the
linking fails at A and you return immediately, then B and C could
probe and then figure that they have no more consumers (they don't see
a link to Z) and turn off their resources. And Z could fail
catastrophically.
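
A minimal userspace sketch of the scenario above, assuming a hypothetical
struct supplier that is not a kernel structure. It only illustrates the
control flow being argued for: a missing supplier is recorded as an error,
but the loop keeps going so that every supplier that is present still ends
up linked to the consumer.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a supplier; not a kernel structure. */
struct supplier {
	const char *name;
	bool present;	/* has the supplier device been added yet? */
	bool linked;	/* does a link from the consumer exist? */
};

/*
 * Try to link the consumer to every supplier. A missing supplier is
 * remembered as an error but does not end the loop, so suppliers that
 * are present still learn that they have an unprobed consumer.
 */
static int link_all_suppliers(struct supplier *sup, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!sup[i].present) {
			ret = -ENODEV;
			continue;	/* do not stop at the first failure */
		}
		sup[i].linked = true;
	}

	return ret;
}
```

With A missing and B and C present, the function reports an error for the
consumer as a whole, yet B and C are both linked and so would not conclude
that they have no remaining consumers.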

>  *
>  * Return:
>  * * 0 if TODO:
>  * * -ENODEV on error
>  */
>
>
> I left some "TODO:" sections to be filled out above.

Will do.

>
>
> > +static bool of_link_property(struct device *dev, struct device_node *con_np,
> > +                          const char *prop)
>
> Returns 0 or -ENODEV, so bool is incorrect
>
> (Also fixed on 8/8 in patch: "[PATCH 1/2] of/platform: Fix fn definitons for
> of_link_is_valid() and of_link_property()")

Right.

>
> > +{
> > +     struct device_node *phandle;
> > +     struct supplier_bindings *s = bindings;
> > +     unsigned int i = 0;
>
> > +     bool done = true, matched = false;
>
> Change to:
>
>         bool matched = false;
>         int ret = 0;
>
>         /* do not stop at first failed link, link all available suppliers */
>
> > +
> > +     while (!matched && s->parse_prop) {
> > +             while ((phandle = s->parse_prop(con_np, prop, i))) {
> > +                     matched = true;
> > +                     i++;
> > +                     if (of_link_to_phandle(dev, phandle))
>
>
> Remove comment:
>
> > +                             /*
> > +                              * Don't stop at the first failure. See
> > +                              * Documentation for bus_type.add_links for
> > +                              * more details.
> > +                              */

Ok

>
> > +                             done = false;
>
>                                 ret = -ENODEV;

This is nicer. Thanks.

>
> Do not hide of_node_put() inside of_link_to_phandle(), do it here:
>
>                         of_node_put(phandle);

Ok

>
> > +             }
> > +             s++;
> > +     }
>
> > +     return done ? 0 : -ENODEV;
>
>         return ret;
>
> > +}
> > +
> > +static bool of_devlink;
> > +core_param(of_devlink, of_devlink, bool, 0);
> > +
>
> /**
>  * of_link_to_suppliers - Add device links to suppliers
>  * @dev: consumer device
>  *
>  * Create device links to all available suppliers of @dev.
>  * Must NOT stop at the first failed link.
>  * If some suppliers are not yet available, this function will be
>  * called again when additional suppliers become available.
>  *
>  * Return:
>  * * 0 if links successfully created for all suppliers
>  * * an error if one or more suppliers not yet available
>  */

Ok

> > +static int of_link_to_suppliers(struct device *dev)
> > +{
> > +     struct property *p;
>
> > +     bool done = true;
>
> remove done
>
>         int ret = 0;
>
> > +
> > +     if (!of_devlink)
> > +             return 0;
>
> > +     if (unlikely(!dev->of_node))
> > +             return 0;
>
> Check not needed, for_each_property_of_node() will detect !dev->of_node.
>
> > +
> > +     for_each_property_of_node(dev->of_node, p)
> > +             if (of_link_property(dev, dev->of_node, p->name))
>
> > +                     done = false;
>
>                         ret = -EAGAIN;
>
> > +
> > +     return done ? 0 : -ENODEV;
>
>         return ret;

Thanks. I think I was so caught up in the complexity of the rest of the
logic that I missed this obviously ugly code. Will fix.

Thanks,
Saravana

* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-16  1:50     ` Saravana Kannan
@ 2019-08-19  3:38       ` Frank Rowand
  2019-08-20  0:00         ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-19  3:38 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On 8/15/19 6:50 PM, Saravana Kannan wrote:
> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>>>  device addition
>>> From: Saravana Kannan <saravanak@google.com>
>>>
>>> When devices are added, the bus might want to create device links to track
>>> functional dependencies between supplier and consumer devices. This
>>> tracking of supplier-consumer relationship allows optimizing device probe
>>> order and tracking whether all consumers of a supplier are active. The
>>> add_links bus callback is added to support this.
>>
>> Change above to:
>>
>> When devices are added, the bus may create device links to track which
>> suppliers a consumer device depends upon.  This
>> tracking of supplier-consumer relationship may be used to defer probing
>> the driver of a consumer device before the driver(s) for its supplier device(s)
>> are probed.  It may also be used by a supplier driver to determine if
>> all of its consumers have been successfully probed.
>> The add_links bus callback is added to create the supplier device links
>>
>>>
>>> However, when consumer devices are added, they might not have a supplier
>>> device to link to despite needing mandatory resources/functionality from
>>> one or more suppliers. A waiting_for_suppliers list is created to track
>>> such consumers and retry linking them when new devices get added.
>>
>> Change above to:
>>
>> If a supplier device has not yet been created when the consumer device attempts
>> to link it, the consumer device is added to the wait_for_suppliers list.
>> When supplier devices are created, the supplier device link will be added to
>> the relevant consumer devices on the wait_for_suppliers list.
>>
> 
> I'll take these commit text suggestions if we decide to revert the
> entire series at the end of this review.
> 
>>>
>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>> ---
>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>>>  include/linux/device.h | 14 +++++++
>>>  2 files changed, 97 insertions(+)
>>>
>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>> index da84a73f2ba6..1b4eb221968f 100644
>>> --- a/drivers/base/core.c
>>> +++ b/drivers/base/core.c
>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>>>  #endif
>>>
>>>  /* Device links support. */
>>> +static LIST_HEAD(wait_for_suppliers);
>>> +static DEFINE_MUTEX(wfs_lock);
>>>
>>>  #ifdef CONFIG_SRCU
>>>  static DEFINE_MUTEX(device_links_lock);
>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>>>  }
>>>  EXPORT_SYMBOL_GPL(device_link_add);
>>>
>>> +/**
>>
>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
>>
>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> 

As a meta-comment, I found this series very hard to understand in the context
of reading the new code for the first time.  When I read the code again in
six months or a year or two years it will not be in near term memory and it
will be as if I am reading it for the first time.  A lot of my suggestions
for changes of names are in that context -- the current names may be fine
when one has recently read the code, but not so much when trying to read
the whole thing again with a blank mind.

The code also inherits a good deal of complexity because it does not stand
alone in a nice discrete chunk, but instead delicately weaves into a more
complex body of code.

When I was trying to understand the code, I wrote a lot of additional
comments within my reply email to provide myself context, information
about various things, and questions that I needed to answer (or if I
could not answer to then ask you).  Then I ended up being able to remove
many of those notes before sending the reply.


> I intentionally chose "Mark device..." because that's a better
> description of the semantics of the function instead of trying to
> describe the implementation. Whether I'm using a linked list or some
> other data structure should not be the one line documentation of a
> function. Unless the function is explicitly about operating on that
> specific data structure.

I agree with the intent of trying to describe the semantics of a function,
especially at the API level where other systems (or drivers) would be using
the function.  But for this case the function is at the implementation level
and describing explicitly what it is doing makes this much more readable for
me.

I also find "Mark device" to be vague and not descriptive of what the
intent is.

> 
>>
>>
>>> + * @consumer: Consumer device
>>> + *
>>> + * Marks the consumer device as waiting for suppliers to become available. The
>>> + * consumer device will never be probed until it's unmarked as waiting for
>>> + * suppliers. The caller is responsible for adding the link to the supplier
>>> + * once the supplier device is present.
>>> + *
>>> + * This function is NOT meant to be called from the probe function of the
>>> + * consumer but rather from code that creates/adds the consumer device.
>>> + */
>>> +static void device_link_wait_for_supplier(struct device *consumer)
>>> +{
>>> +     mutex_lock(&wfs_lock);
>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>>> +     mutex_unlock(&wfs_lock);
>>> +}
>>> +
>>> +/**
>>
>>
>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
>>> + *
>>> + * Loops through all consumers waiting on suppliers and tries to add all their
>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers.
>>> + *
>>> + * The add_links bus callback is expected to return 0 if it has found and added
>>> + * all the supplier links for the consumer device. It should return an error if
>>> + * it isn't able to do so.
>>> + *
>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
>>> + * it's aware of potential suppliers becoming available.
>>
>> Change above comment to:
>>
>>     * device_link_add_supplier_links - add links from consumer devices to
>>     *                                  supplier devices, leaving any consumer
>>     *                                  with inactive suppliers on the
>>     *                                  wait_for_suppliers list
> 
> I didn't know that the first one line comment could span multiple
> lines. Good to know.
> 
> 
>>     * Scan all consumer devices in the devicetree.
> 
> This function doesn't have anything to do with devicetree. I've
> intentionally kept all OF related parts out of the driver/core because
> I hope that other busses can start using this feature too. So I can't
> take this bit.

My comment is left over from when I was taking notes, trying to understand the
code.

At the moment, only devicetree is used as a source of the dependency information.
The comment would better be re-phrased as:

        * Scan all consumer devices in the firmware description of the hardware topology

I did not ask why this feature is tied to _only_ the platform bus, but will now.

I do not know of any reason that a consumer / supplier relationship can not be
between devices on different bus types.  Do you know of such a reason?


> 
>>  For any supplier device that
>>     * is not already linked to the consumer device, add the supplier to the
>>     * consumer device's device links.
>>     *
>>     * If all of a consumer device's suppliers are available then the consumer
>>     * is removed from the wait_for_suppliers list (if previously on the list).
>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>>     * already on the list).
> 
> Honestly, I don't think this is any better than what I already have.

Note that my version of these comments was written while I was reading the code,
and did not have any big picture understanding yet.  This will likely also be
the mind set of most everyone who reads this code in the future, once it is
woven into the kernel.

If you don't like the change, I can revisit it in a later version of the
patch set.


> 
>>     * The add_links bus callback must return 0 if it has found and added all
>>     * the supplier links for the consumer device. It must return an error if
>>     * it is not able to do so.
>>     *
>>     * The caller of device_link_wait_for_supplier() is expected to call this once
>>     * it is aware of potential suppliers becoming available.
>>
>>
>>
>>> + */
>>> +static void device_link_check_waiting_consumers(void)
>>
>> Function name is misleading and hides side effects.
>>
>> I have not come up with a name that does not hide side effects, but a better
>> name would be:
>>
>>    device_link_add_supplier_links()
> 
> I kinda agree that it could afford a better name. The current name is
> too similar to device_links_check_suppliers() and I never liked that.

Naming new fields or variables related to device links looks pretty
challenging to me, because of the desire to be part of device links
and not a wart pasted on the side.  So I share the pain in trying
to find good names.

> 
> Maybe device_link_add_missing_suppliers()?

My first reaction was "yes, that sounds good".  But then I stopped and
tried to read the name out of context.  The name is not adding the
missing suppliers, it is saving the information that a supplier is
not yet available (eg, is "missing").  I struggled in coming up with
the name that I suggested.  We can keep thinking.


> 
> I don't think we need "links" repeated twice in the function name.

Yeah, I didn't like that either.


> With this suggestion, what side effect is hidden in your opinion? That
> the fully linked consumer is removed from the "waiting for suppliers"
> list?

The side effect is that the function does not merely do a check.  It also
adds missing suppliers to a list.


> 
> Maybe device_link_try_removing_from_wfs()?

I like that, other than the fact that it still does not provide a clue
that the function is potentially adding suppliers to a list.  I think
part of the challenge is that the function does two things: (1) a check,
and (2) potentially adding missing suppliers to a list.  Maybe a simple
one line comment at the call site, something like:

   /* adds missing suppliers to wfs */


> 
> I'll wait for us to agree on a better name here before I change this.
> 
>>> +{
>>> +     struct device *dev, *tmp;
>>> +
>>> +     mutex_lock(&wfs_lock);
>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>>> +                              links.needs_suppliers)
>>> +             if (!dev->bus->add_links(dev))
>>> +                     list_del_init(&dev->links.needs_suppliers);
>>
>> Empties dev->links.needs_suppliers, but does not remove dev from
>> wait_for_suppliers list.  Where does that happen?
> 
> I'll chalk this up to you having a long day or forgetting your coffee
> :) list_del_init() does both of those things because needs_suppliers
> is the node and wait_for_suppliers is the list.

Yes, brain mis-fire on my part.  I'll have to go back and look at the
list related code again.
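
For readers who trip over the same point, here is a simplified userspace
sketch of the list semantics involved. It assumes a cut-down copy of the
kernel's circular doubly linked list; init_list_head() is a lowercase
stand-in for the kernel's INIT_LIST_HEAD(), and none of this is the actual
<linux/list.h> code.

```c
#include <stdbool.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make a head (or an unlinked node) point at itself. */
static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True for an empty list head and for a node that is on no list. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert node just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node from whatever list it is on, then reinitialize it. */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}
```

In the terms of the patch, wait_for_suppliers is the head and
dev->links.needs_suppliers is the node. After list_del_init() the node is
off the list and list_empty() on the node returns true, which is the
condition the new check in device_links_check_suppliers() tests.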


> 
>>
>>> +     mutex_unlock(&wfs_lock);
>>> +}
>>> +
>>>  static void device_link_free(struct device_link *link)
>>>  {
>>>       while (refcount_dec_not_one(&link->rpm_active))
>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>>>       struct device_link *link;
>>>       int ret = 0;
>>>
>>> +     /*
>>> +      * If a device is waiting for one or more suppliers (in
>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
>>> +      * return -EPROBE_DEFER without having to check the links with existing
>>> +      * suppliers.
>>> +      */
>>
>> Change comment to:
>>
>>         /*
>>          * Device waiting for supplier to become available is not allowed
>>          * to probe
>>          */
> 
> Po-tay-to. Po-tah-to? I think my comment is just as good.

If just as good and shorter, then better.

Also the original says "it is not ready to probe".  That is not correct.  It
is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
Nit picky on my part, but tiny things like that mean I have to think harder.
I have to think "why is it not ready to probe?".  Maybe my version should have
instead been something like:

        * Device waiting for supplier to become available will return
        * -EPROBE_DEFER if probed.  Avoid the unneeded processing.

> 
>>> +     mutex_lock(&wfs_lock);
>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
>>> +             mutex_unlock(&wfs_lock);
>>> +             return -EPROBE_DEFER;
>>> +     }
>>> +     mutex_unlock(&wfs_lock);
>>> +
>>>       device_links_write_lock();
>>
>> Update Documentation/driver-api/device_link.rst to reflect the
>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> 
> Thanks! Will do.
> 
>>
>>>
>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>>>  {
>>>       struct device_link *link, *ln;
>>>
>>> +     mutex_lock(&wfs_lock);
>>> +     list_del(&dev->links.needs_suppliers);
>>> +     mutex_unlock(&wfs_lock);
>>> +
>>>       /*
>>>        * Delete all of the remaining links from this device to any other
>>>        * devices (either consumers or suppliers).
>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>>>  #endif
>>>       INIT_LIST_HEAD(&dev->links.consumers);
>>>       INIT_LIST_HEAD(&dev->links.suppliers);
>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
>>>       dev->links.status = DL_DEV_NO_DRIVER;
>>>  }
>>>  EXPORT_SYMBOL_GPL(device_initialize);
>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
>>>
>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
>>
>>> +
>>> +     /*
>>> +      * Check if any of the other devices (consumers) have been waiting for
>>> +      * this device (supplier) to be added so that they can create a device
>>> +      * link to it.
>>> +      *
>>> +      * This needs to happen after device_pm_add() because device_link_add()
>>> +      * requires the supplier be registered before it's called.
>>> +      *
>>> +      * But this also needs to happen before bus_probe_device() to make sure
>>> +      * waiting consumers can link to it before the driver is bound to the
>>> +      * device and the driver sync_state callback is called for this device.
>>> +      */
>>
>>         /*
>>          * Add links to dev from any dependent consumer that has dev on its
>>          * list of needed suppliers
> 
> There is no list of needed suppliers.

"the other devices (consumers) have been waiting for this device (supplier)".
Isn't that a list of needed suppliers?


> 
>> (links.needs_suppliers).  Device_pm_add()
>>          * must have previously registered dev to allow the links to be added.
>>          *
>>          * The consumer links must be created before dev is probed because the
>>          * sync_state callback for dev will use the consumer links.
>>          */
> 
> I think what I wrote is just as clear.

The original comment is vague.  It does not explain why consumer links must be
created before the probe.  I had to go off and read other code to determine
why that is true.

And again, brevity is better if otherwise just as clear.


> 
>>
>>> +     device_link_check_waiting_consumers();
>>> +
>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
>>> +             device_link_wait_for_supplier(dev);
>>> +
>>>       bus_probe_device(dev);
>>>       if (parent)
>>>               klist_add_tail(&dev->p->knode_parent,
>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>> index c330b75c6c57..5d70babb7462 100644
>>> --- a/include/linux/device.h
>>> +++ b/include/linux/device.h
>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
>>>   * @uevent:  Called when a device is added, removed, or a few other things
>>>   *           that generate uevents to add the environment variables.
>>
>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
>>> + *           added to this bus.  The function is expected to create device
>>> + *           links to all the suppliers of the input device that are
>>> + *           available at the time this function is called.  As in, the
>>> + *           function should NOT stop at the first failed device link if
>>> + *           other unlinked supplier devices are present in the system.
>>
>> * @add_links:   Called after a device is added to this bus.
> 
> Why are you removing the "perhaps multiple times" part? that's true
> and that's how some of the other ops are documented.

I didn't remove it.  I rephrased it with a little bit more explanation as
"If some suppliers are not yet available, this function will be
called again when the suppliers become available." (below).


> 
>>  The function is
>> *               expected to create device links to all the suppliers of the
>> *               device that are available at the time this function is called.
>> *               The function must NOT stop at the first failed device link if
>> *               other unlinked supplier devices are present in the system.
>> *               If some suppliers are not yet available, this function will be
>> *               called again when the suppliers become available.
>>
>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> 
> Sorry, I'm not sure I understand. Can you please explain what you are
> trying to say? of_link_to_suppliers() is just one implementation of
> add_links(). The comment above is true for any bus trying to implement
> add_links().

This is conflating bus with the source of the firmware description of the
hardware topology.  For drivers that use various APIs to access firmware
description of topology that may be either devicetree or ACPI the access
is done via fwnode_operations, based on struct device.fwnode (if I recall
properly).

I failed to completely address why add_links() is not needed.  The answer
is that there should be a single function called for all buses.  Then
the proper firmware data source would be accessed via a struct fwnode_operations.
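
A rough userspace sketch of the shape being suggested, under heavy
assumptions: struct fw_ops, struct fw_node, dt_add_links() and
add_supplier_links() are hypothetical names made up for illustration, and
struct fw_ops is not the kernel's struct fwnode_operations. It only shows
one entry point that is bus-independent and dispatches on the firmware
source of the device.

```c
#include <stddef.h>

struct fw_node;

/* Hypothetical per-firmware operations table (not struct fwnode_operations). */
struct fw_ops {
	int (*add_links)(const struct fw_node *node);
};

/* Hypothetical firmware node; a real one would describe the device. */
struct fw_node {
	const struct fw_ops *ops;	/* set by the firmware backend */
	int missing_suppliers;		/* toy state for this example */
};

/* Toy devicetree backend: fail while any supplier is still missing. */
static int dt_add_links(const struct fw_node *node)
{
	return node->missing_suppliers ? -1 : 0;
}

static const struct fw_ops dt_ops = { .add_links = dt_add_links };

/* Single entry point for every bus; dispatch depends on the firmware source. */
static int add_supplier_links(const struct fw_node *node)
{
	if (!node || !node->ops || !node->ops->add_links)
		return 0;	/* nothing describes suppliers for this device */

	return node->ops->add_links(node);
}
```

Under this arrangement a second firmware source would supply its own ops
table, and the caller in the driver core would not need a per-bus callback.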

I think I left this out because I had not yet asked why this feature is
tied only to the platform bus.  Which I asked earlier in this reply.

> 
>>
>>
>>> + *
>>> + *           Return 0 if device links have been successfully created to all
>>> + *           the suppliers of this device.  Return an error if some of the
>>> + *           suppliers are not yet available and this function needs to be
>>> + *           reattempted in the future.
>>
>> *
>> *               Return 0 if device links have been successfully created to all
>> *               the suppliers of this device.  Return an error if some of the
>> *               suppliers are not yet available.
>>
>>
>>>   * @probe:   Called when a new device or driver add to this bus, and callback
>>>   *           the specific driver's probe to initial the matched device.
>>>   * @remove:  Called when a device removed from this bus.
>>> @@ -122,6 +133,7 @@ struct bus_type {
>>>
>>>       int (*match)(struct device *dev, struct device_driver *drv);
>>>       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
>>
>>
>>> +     int (*add_links)(struct device *dev);
>>
>>               ^^^^^^^^^  add_supplier              ???
>>               ^^^^^^^^^  add_suppliers             ???
>>
>>               ^^^^^^^^^  link_suppliers            ???
>>
>>               ^^^^^^^^^  add_supplier_dependency   ???
>>               ^^^^^^^^^  add_supplier_dependencies ???
>> add_links() not needed
> 
> add_links() was an intentional decision. There's no requirement that
> the bus should only create links from this device to its suppliers. If
> the bus also knows the consumers of this device (dev), then it
> can/should add those too.

Is creating links to consumers of this device implemented in this patch
series?  If so, I overlooked it and will have to consider how that
fits in to the design.


> So, it shouldn't have "suppliers" in the
> name.
> 
>>>       int (*probe)(struct device *dev);
>>>       int (*remove)(struct device *dev);
>>>       void (*shutdown)(struct device *dev);
>>
>>
>>
>>
>>> @@ -893,11 +905,13 @@ enum dl_dev_state {
>>>   * struct dev_links_info - Device data related to device links.
>>>   * @suppliers: List of links to supplier devices.
>>>   * @consumers: List of links to consumer devices.
>>
>>> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>>
>>     * @needs_suppliers: List of devices deferring probe until supplier drivers
>>     *                   are successfully probed.
> 
> It's "need suppliers". As in, this is a device that "needs suppliers".
> So, no, this is not a list. This is a node in a list. And all "nodes
> in a list" are documented as "Hook" in rest of places in this file. So
> I think the documentation is correct as is.

Aha, I got confused about that while trying to keep everything straight.

Somehow I managed to conflate needs_suppliers with the links between
consumers and suppliers that are created via device_link_add().

So original comment is fine.

It is getting late, so I'll continue with patches 2 and 3 tomorrow.

-Frank

> 
> Thanks for your review.
> 
> 
> 
> -Saravana
> 

* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-16  1:50     ` Saravana Kannan
@ 2019-08-19 17:16       ` Frank Rowand
  2019-08-19 20:49         ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-19 17:16 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On 8/15/19 6:50 PM, Saravana Kannan wrote:
> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
>>> Add device-links after the devices are created (but before they are
>>> probed) by looking at common DT bindings like clocks and
>>> interconnects.


< very big snip (lots of comments that deserve answers) >


>>
>> /**
>>  * of_link_property - TODO:
>>  * dev:
>>  * con_np:
>>  * prop:
>>  *
>>  * TODO...
>>  *
>>  * Any failed attempt to create a link will NOT result in an immediate return.
>>  * of_link_property() must create all possible links even when one or more
>>  * attempts to create a link fail.
>>
>> Why?  isn't one failure enough to prevent probing this device?
>> Continuing to scan just results in extra work... which will be
>> repeated every time device_link_check_waiting_consumers() is called
> 
> Context:
> As I said in the cover letter, avoiding unnecessary probes is just one
> of the reasons for this patch. The other (arguably more important)

Agree that it is more important.


> reason for this patch is to make sure suppliers know that they have
> consumers that are yet to be probed. That way, suppliers can leave
> their resource on AND in the right state if they were left on by the
> bootloader. For example, if a clock was left on and at 200 MHz, the
> clock provider needs to keep that clock ON and at 200 MHz till all the
> consumers are probed.
> 
> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> linking fails at A and you return immediately, then B and C could
> probe and then figure that they have no more consumers (they don't see
> a link to Z) and turn off their resources. And Z could fail
> catastrophically.

Then I think that this approach is fatally flawed in the current implementation.

A device can be added by a module that is loaded.  In that case the device
was not present at late boot when the suppliers may turn off their resources.
(I am assuming the details since I have not reviewed the patches later in
the series that implement this part.)

Am I missing something?

If I am wrong, then I'll have more comments for your review replies for
patches 2 and 3.

> 

< another snip >

> Thanks,
> Saravana
> 

-Frank

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-19 17:16       ` Frank Rowand
@ 2019-08-19 20:49         ` Saravana Kannan
  2019-08-19 21:30           ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-19 20:49 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> >>> Add device-links after the devices are created (but before they are
> >>> probed) by looking at common DT bindings like clocks and
> >>> interconnects.
>
>
> < very big snip (lots of comments that deserve answers) >
>
>
> >>
> >> /**
> >>  * of_link_property - TODO:
> >>  * dev:
> >>  * con_np:
> >>  * prop:
> >>  *
> >>  * TODO...
> >>  *
> >>  * Any failed attempt to create a link will NOT result in an immediate return.
> >>  * of_link_property() must create all possible links even when one or more
> >>  * attempts to create a link fail.
> >>
> >> Why?  isn't one failure enough to prevent probing this device?
> >> Continuing to scan just results in extra work... which will be
> >> repeated every time device_link_check_waiting_consumers() is called
> >
> > Context:
> > As I said in the cover letter, avoiding unnecessary probes is just one
> > of the reasons for this patch. The other (arguably more important)
>
> Agree that it is more important.
>
>
> > reason for this patch is to make sure suppliers know that they have
> > consumers that are yet to be probed. That way, suppliers can leave
> > their resource on AND in the right state if they were left on by the
> > bootloader. For example, if a clock was left on and at 200 MHz, the
> > clock provider needs to keep that clock ON and at 200 MHz till all the
> > consumers are probed.
> >
> > Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> > linking fails at A and you return immediately, then B and C could
> > probe and then figure that they have no more consumers (they don't see
> > a link to Z) and turn off their resources. And Z could fail
> > catastrophically.
>
> Then I think that this approach is fatally flawed in the current implementation.

I'm waiting to hear how it is fatally flawed. But maybe this is just a
misunderstanding of the problem?

In the text below, I'm not sure if you are mixing up two different things
or if your wording is just a bit ambiguous. So pardon my nitpick to
err on the side of clarity.

> A device can be added by a module that is loaded.

No, in the example I gave, of_platform_default_populate_init() would
add all 3 of those devices during arch_initcall_sync().
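
To make the Z/A/B/C example concrete, here is a minimal userspace sketch
(not kernel code). Everything prefixed model_ is a hypothetical name made
up for illustration, and the "waiting_consumers" counter is only an
assumed stand-in for the device links a supplier would see. It compares
stopping at the first failed link with continuing past it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a supplier device. */
struct model_supplier {
	const char *name;
	bool present;          /* supplier device has been added */
	int waiting_consumers; /* linked consumers that have not probed yet */
};

/* A supplier that sees no unprobed consumers may switch its resource off. */
static bool model_may_turn_off(const struct model_supplier *s)
{
	return s->waiting_consumers == 0;
}

/*
 * Try to link one consumer to each of its suppliers. Returns 0 only if
 * every link was created. With stop_at_first_failure set, suppliers
 * listed after the first missing one never learn about the consumer.
 */
static int model_link_suppliers(struct model_supplier **sup, size_t n,
				bool stop_at_first_failure)
{
	int ret = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sup[i] != NULL);
		if (!sup[i]->present) {
			ret = -1;
			if (stop_at_first_failure)
				break;
			continue;
		}
		sup[i]->waiting_consumers++;
	}
	return ret;
}
```

With A missing and B and C present, the early-return variant leaves B and
C believing they have no pending consumers, while the continue variant
makes both of them aware of Z. The sketch does not model retries, so a
real implementation would also have to avoid counting the same link twice.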

>  In that case the device
> was not present at late boot when the suppliers may turn off their resources.

In that case, the _drivers_ for those devices aren't present at late
boot, so they can't request to keep the resources on for their
consumer devices. Since there are no consumer requests on resources,
the suppliers turn off their resources at late boot (since there isn't
a better location as of today). The sync_state() callback added in a
subsequent patch in this series will provide the better location.

> (I am assuming the details since I have not reviewed the patches later in
> the series that implement this part.)
>
> Am I missing something?

I think you are mixing up devices getting added/populated with drivers
getting loaded as modules?

> If I am wrong, then I'll have more comments for your review replies for
> patches 2 and 3.

I'll wait for more review replies?

Thanks,
Saravana

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-19 20:49         ` Saravana Kannan
@ 2019-08-19 21:30           ` Frank Rowand
  2019-08-20  0:09             ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-19 21:30 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On 8/19/19 1:49 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
>>>>> Add device-links after the devices are created (but before they are
>>>>> probed) by looking at common DT bindings like clocks and
>>>>> interconnects.
>>
>>
>> < very big snip (lots of comments that deserve answers) >
>>
>>
>>>>
>>>> /**
>>>>  * of_link_property - TODO:
>>>>  * dev:
>>>>  * con_np:
>>>>  * prop:
>>>>  *
>>>>  * TODO...
>>>>  *
>>>>  * Any failed attempt to create a link will NOT result in an immediate return.
>>>>  * of_link_property() must create all possible links even when one or more
>>>>  * attempts to create a link fail.
>>>>
>>>> Why?  isn't one failure enough to prevent probing this device?
>>>> Continuing to scan just results in extra work... which will be
>>>> repeated every time device_link_check_waiting_consumers() is called
>>>
>>> Context:
>>> As I said in the cover letter, avoiding unnecessary probes is just one
>>> of the reasons for this patch. The other (arguably more important)
>>
>> Agree that it is more important.
>>
>>
>>> reason for this patch is to make sure suppliers know that they have
>>> consumers that are yet to be probed. That way, suppliers can leave
>>> their resource on AND in the right state if they were left on by the
>>> bootloader. For example, if a clock was left on and at 200 MHz, the
>>> clock provider needs to keep that clock ON and at 200 MHz till all the
>>> consumers are probed.
>>>
>>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
>>> linking fails at A and you return immediately, then B and C could
>>> probe and then figure that they have no more consumers (they don't see
>>> a link to Z) and turn off their resources. And Z could fail
>>> catastrophically.
>>
>> Then I think that this approach is fatally flawed in the current implementation.
> 
> I'm waiting to hear how it is fatally flawed. But maybe this is just a
> misunderstanding of the problem?

Fatally flawed because it does not handle modules that add a consumer
device when the module is loaded.


> 
> In the text below, I'm not sure if you are mixing up two different things
> or if your wording is just a bit ambiguous. So pardon my nitpick to
> err on the side of clarity.

Please do nitpick.  Clarity is good.


> 
>> A device can be added by a module that is loaded.
> 
> No, in the example I gave, of_platform_default_populate_init() would
> add all 3 of those devices during arch_initcall_sync().

The example you gave does not cover all use cases.

There are modules that add devices when the module is loaded.  You can not
ignore systems using such modules.


> 
>>  In that case the device
>> was not present at late boot when the suppliers may turn off their resources.
> 
> In that case, the _drivers_ for those devices aren't present at late
> boot, so they can't request to keep the resources on for their
> consumer devices. Since there are no consumer requests on resources,
> the suppliers turn off their resources at late boot (since there isn't
> a better location as of today). The sync_state() callback added in a
> subsequent patch in this series will provide the better location.

And the sync_state() callback will not deal with modules that add consumer
devices when the module is loaded, correct?


> 
>> (I am assuming the details since I have not reviewed the patches later in
>> the series that implement this part.)
>>
>> Am I missing something?
> 
> I think you are mixing up devices getting added/populated with drivers
> getting loaded as modules?

Only some modules add devices when they are loaded.  But these modules do
exist.

-Frank

> 
>> If I am wrong, then I'll have more comments for your review replies for
>> patches 2 and 3.
> 
> I'll wait for more review replies?
> 
> Thanks,
> Saravana
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-19  3:38       ` Frank Rowand
@ 2019-08-20  0:00         ` Saravana Kannan
  2019-08-20  4:25           ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-20  0:00 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>  device addition
> >>> From: Saravana Kannan <saravanak@google.com>
> >>>
> >>> When devices are added, the bus might want to create device links to track
> >>> functional dependencies between supplier and consumer devices. This
> >>> tracking of supplier-consumer relationship allows optimizing device probe
> >>> order and tracking whether all consumers of a supplier are active. The
> >>> add_links bus callback is added to support this.
> >>
> >> Change above to:
> >>
> >> When devices are added, the bus may create device links to track which
> >> suppliers a consumer device depends upon.  This
> >> tracking of supplier-consumer relationship may be used to defer probing
> >> the driver of a consumer device before the driver(s) for its supplier device(s)
> >> are probed.  It may also be used by a supplier driver to determine if
> >> all of its consumers have been successfully probed.
> >> The add_links bus callback is added to create the supplier device links
> >>
> >>>
> >>> However, when consumer devices are added, they might not have a supplier
> >>> device to link to despite needing mandatory resources/functionality from
> >>> one or more suppliers. A waiting_for_suppliers list is created to track
> >>> such consumers and retry linking them when new devices get added.
> >>
> >> Change above to:
> >>
> >> If a supplier device has not yet been created when the consumer device attempts
> >> to link it, the consumer device is added to the wait_for_suppliers list.
> >> When supplier devices are created, the supplier device link will be added to
> >> the relevant consumer devices on the wait_for_suppliers list.
> >>
> >
> > I'll take these commit text suggestions if we decide to revert the
> > entire series at the end of this review.
> >
> >>>
> >>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>> ---
> >>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >>>  include/linux/device.h | 14 +++++++
> >>>  2 files changed, 97 insertions(+)
> >>>
> >>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>> index da84a73f2ba6..1b4eb221968f 100644
> >>> --- a/drivers/base/core.c
> >>> +++ b/drivers/base/core.c
> >>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >>>  #endif
> >>>
> >>>  /* Device links support. */
> >>> +static LIST_HEAD(wait_for_suppliers);
> >>> +static DEFINE_MUTEX(wfs_lock);
> >>>
> >>>  #ifdef CONFIG_SRCU
> >>>  static DEFINE_MUTEX(device_links_lock);
> >>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(device_link_add);
> >>>
> >>> +/**
> >>
> >>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
> >>
> >>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> >
>
> As a meta-comment, I found this series very hard to understand in the context
> of reading the new code for the first time.  When I read the code again in
> six months or a year or two years it will not be in near term memory and it
> will be as if I am reading it for the first time.  A lot of my suggestions
> for changes of names are in that context -- the current names may be fine
> when one has recently read the code, but not so much when trying to read
> the whole thing again with a blank mind.

Thanks for the context.

> The code also inherits a good deal of complexity because it does not stand
> alone in a nice discrete chunk, but instead delicately weaves into a more
> complex body of code.

I'll take this as a compliment :)

> When I was trying to understand the code, I wrote a lot of additional
> comments within my reply email to provide myself context, information
> about various things, and questions that I needed to answer (or if I
> could not answer to then ask you).  Then I ended up being able to remove
> many of those notes before sending the reply.
>
>
> > I intentionally chose "Mark device..." because that's a better
> > description of the semantics of the function instead of trying to
> > describe the implementation. Whether I'm using a linked list or some
> > other data structure should not be the one line documentation of a
> > function. Unless the function is explicitly about operating on that
> > specific data structure.
>
> I agree with the intent of trying to describe the semantics of a function,
> especially at the API level where other systems (or drivers) would be using
> the function.  But for this case the function is at the implementation level
> and describing explicitly what it is doing makes this much more readable for
> me.

Are you distinguishing between API level vs implementation level based
on the function being "static"/not exported? I believe the earlier
version of this series had this function as an exported API. So maybe
that's why I had it as "Mark device".

> I also find "Mark device" to be vague and not descriptive of what the
> intent is.
>
> >
> >>
> >>
> >>> + * @consumer: Consumer device
> >>> + *
> >>> + * Marks the consumer device as waiting for suppliers to become available. The
> >>> + * consumer device will never be probed until it's unmarked as waiting for
> >>> + * suppliers. The caller is responsible for adding the link to the supplier
> >>> + * once the supplier device is present.
> >>> + *
> >>> + * This function is NOT meant to be called from the probe function of the
> >>> + * consumer but rather from code that creates/adds the consumer device.
> >>> + */
> >>> +static void device_link_wait_for_supplier(struct device *consumer)
> >>> +{
> >>> +     mutex_lock(&wfs_lock);
> >>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> >>> +     mutex_unlock(&wfs_lock);
> >>> +}
> >>> +
> >>> +/**
> >>
> >>
> >>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> >>> + *
> >>> + * Loops through all consumers waiting on suppliers and tries to add all their
> >>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> >>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> >>> + *
> >>> + * The add_links bus callback is expected to return 0 if it has found and added
> >>> + * all the supplier links for the consumer device. It should return an error if
> >>> + * it isn't able to do so.
> >>> + *
> >>> + * The caller of device_link_wait_for_supplier() is expected to call this once
> >>> + * it's aware of potential suppliers becoming available.
> >>
> >> Change above comment to:
> >>
> >>     * device_link_add_supplier_links - add links from consumer devices to
> >>     *                                  supplier devices, leaving any consumer
> >>     *                                  with inactive suppliers on the
> >>     *                                  wait_for_suppliers list
> >
> > I didn't know that the first one line comment could span multiple
> > lines. Good to know.
> >
> >
> >>     * Scan all consumer devices in the devicetree.
> >
> > This function doesn't have anything to do with devicetree. I've
> > intentionally kept all OF related parts out of the driver/core because
> > I hope that other busses can start using this feature too. So I can't
> > take this bit.
>
> My comment is left over from when I was taking notes, trying to understand the
> code.
>
> At the moment, only devicetree is used as a source of the dependency information.
> The comment would better be re-phrased as:
>
>         * Scan all consumer devices in the firmware description of the hardware topology
>

Ok

> I did not ask why this feature is tied to _only_ the platform bus, but will now.

Because devicetree and platform bus are the only ones I'm familiar with.
If other busses want to add this, I'd be happy to help with code
and/or direction/review. But I won't pretend to know anything about
ACPI.

> I do not know of any reason that a consumer / supplier relationship can not be
> between devices on different bus types.  Do you know of such a reason?

Yes, it's hypothetically possible. But I haven't seen such a
relationship being defined in DT. Nor somewhere else where this might
be captured. So, how common/realistic is it?

> >
> >>  For any supplier device that
> >>     * is not already linked to the consumer device, add the supplier to the
> >>     * consumer device's device links.
> >>     *
> >>     * If all of a consumer device's suppliers are available then the consumer
> >>     * is removed from the wait_for_suppliers list (if previously on the list).
> >>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
> >>     * already on the list).
> >
> > Honestly, I don't think this is any better than what I already have.
>
> Note that my version of these comments was written while I was reading the code,
> and did not have any big picture understanding yet.  This will likely also be
> the mind set of most everyone who reads this code in the future, once it is
> woven into the kernel.
>
> If you don't like the change, I can revisit it in a later version of the
> patch set.

I'll take in all the ones I feel are reasonable or don't feel strongly
about. We can revisit the rest later.

> >
> >>     * The add_links bus callback must return 0 if it has found and added all
> >>     * the supplier links for the consumer device. It must return an error if
> >>     * it is not able to do so.
> >>     *
> >>     * The caller of device_link_wait_for_supplier() is expected to call this once
> >>     * it is aware of potential suppliers becoming available.
> >>
> >>
> >>
> >>> + */
> >>> +static void device_link_check_waiting_consumers(void)
> >>
> >> Function name is misleading and hides side effects.
> >>
> >> I have not come up with a name that does not hide side effects, but a better
> >> name would be:
> >>
> >>    device_link_add_supplier_links()
> >
> > I kinda agree that it could afford a better name. The current name is
> > too similar to device_links_check_suppliers() and I never liked that.
>
> Naming new fields or variables related to device links looks pretty
> challenging to me, because of the desire to be part of device links
> and not a wart pasted on the side.  So I share the pain in trying
> to find good names.
>
> >
> > Maybe device_link_add_missing_suppliers()?
>
> My first reaction was "yes, that sounds good".  But then I stopped and
> tried to read the name out of context.  The name is not adding the
> missing suppliers, it is saving the information that a supplier is
> not yet available (eg, is "missing").  I struggled in coming up with
> the name that I suggested.  We can keep thinking.

No, this function _IS_ about adding links to suppliers. These
consumers were "saved" as "not yet having the supplier" earlier by
device_link_wait_for_supplier(). This function doesn't do that. This
function is just trying to see if those missing suppliers are present
now and if so adding a link to them from the "saved" consumers. I
think device_link_add_missing_suppliers() is actually a pretty good
name. Let me know what you think now.
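
For readers following along, the mechanism can be modelled in userspace
roughly as below. This is only a sketch under assumptions: the list
helpers are simplified copies of the kernel ones, struct model_dev and all
model_ names are hypothetical, and a single flag stands in for whatever
the bus add_links() callback would actually check. The point is that the
only list kept is a list of waiting consumers, and that list_del_init()
both unlinks the node and leaves it empty for the later probe check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace copy of the kernel's circular list. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static bool list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink the node and reinitialize it so list_empty(node) is true again. */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}

/* Hypothetical stand-in for struct device. */
struct model_dev {
	bool suppliers_present;           /* would add_links() succeed now? */
	struct list_head needs_suppliers; /* node on wait_for_suppliers */
};

#define model_dev_of(node) \
	((struct model_dev *)((char *)(node) - \
			      offsetof(struct model_dev, needs_suppliers)))

/* The list itself: consumers that are waiting, never missing suppliers. */
static struct list_head wait_for_suppliers = {
	&wait_for_suppliers, &wait_for_suppliers
};

static void model_dev_init(struct model_dev *dev)
{
	dev->suppliers_present = false;
	init_list_head(&dev->needs_suppliers);
}

/* Models bus->add_links(): 0 once every supplier link could be created. */
static int model_add_links(struct model_dev *dev)
{
	return dev->suppliers_present ? 0 : -1;
}

static void model_wait_for_supplier(struct model_dev *dev)
{
	assert(list_empty(&dev->needs_suppliers)); /* not queued twice */
	list_add_tail(&dev->needs_suppliers, &wait_for_suppliers);
}

/* Retry linking for every waiting consumer; drop the ones that succeed. */
static void model_check_waiting_consumers(void)
{
	struct list_head *pos = wait_for_suppliers.next;

	while (pos != &wait_for_suppliers) {
		struct list_head *next = pos->next; /* safe against removal */

		if (!model_add_links(model_dev_of(pos)))
			list_del_init(pos);
		pos = next;
	}
}

/* Models the early -EPROBE_DEFER check before probing a consumer. */
static bool model_must_defer_probe(const struct model_dev *dev)
{
	return !list_empty(&dev->needs_suppliers);
}
```

Locking is left out on purpose. In the patch the same steps run under
wfs_lock.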

>
>
> >
> > I don't think we need "links" repeated twice in the function name.
>
> Yeah, I didn't like that either.
>
>
> > With this suggestion, what side effect is hidden in your opinion? That
> > the fully linked consumer is removed from the "waiting for suppliers"
> > list?
>
> The side effect is that the function does not merely do a check.  It also
> adds missing suppliers to a list.

No, it doesn't do that. I can't keep a list of things that aren't
allocated yet :). In the whole patch series, we only keep a list of things
(consumers) that are waiting on other things (missing suppliers).

> >
> > Maybe device_link_try_removing_from_wfs()?
>
> I like that, other than the fact that it still does not provide a clue
> that the function is potentially adding suppliers to a list.

It doesn't. How would you add a supplier device to a list if the
device itself isn't there? :)

>  I think
> part of the challenge is that the function does two things: (1) a check,
> and (2) potentially adding missing suppliers to a list.  Maybe a simple
> one line comment at the call site, something like:
>
>    /* adds missing suppliers to wfs */
>
>
> >
> > I'll wait for us to agree on a better name here before I change this.
> >
> >>> +{
> >>> +     struct device *dev, *tmp;
> >>> +
> >>> +     mutex_lock(&wfs_lock);
> >>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> >>> +                              links.needs_suppliers)
> >>> +             if (!dev->bus->add_links(dev))
> >>> +                     list_del_init(&dev->links.needs_suppliers);
> >>
> >> Empties dev->links.needs_suppliers, but does not remove dev from
> >> wait_for_suppliers list.  Where does that happen?
> >
> > I'll chalk this up to you having a long day or forgetting your coffee
> > :) list_del_init() does both of those things because needs_suppliers
> > is the node and wait_for_suppliers is the list.
>
> Yes, brain mis-fire on my part.  I'll have to go back and look at the
> list related code again.
>
>
> >
> >>
> >>> +     mutex_unlock(&wfs_lock);
> >>> +}
> >>> +
> >>>  static void device_link_free(struct device_link *link)
> >>>  {
> >>>       while (refcount_dec_not_one(&link->rpm_active))
> >>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >>>       struct device_link *link;
> >>>       int ret = 0;
> >>>
> >>> +     /*
> >>> +      * If a device is waiting for one or more suppliers (in
> >>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
> >>> +      * return -EPROBE_DEFER without having to check the links with existing
> >>> +      * suppliers.
> >>> +      */
> >>
> >> Change comment to:
> >>
> >>         /*
> >>          * Device waiting for supplier to become available is not allowed
> >>          * to probe
> >>          */
> >
> > Po-tay-to. Po-tah-to? I think my comment is just as good.
>
> If just as good and shorter, then better.
>
> Also the original says "it is not ready to probe".  That is not correct.  It
> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
> Nit picky on my part, but tiny things like that mean I have to think harder.
> I have to think "why is it not ready to probe?".  Maybe my version should have
> instead been something like:
>
>         * Device waiting for supplier to become available will return
>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
>
> >
> >>> +     mutex_lock(&wfs_lock);
> >>> +     if (!list_empty(&dev->links.needs_suppliers)) {
> >>> +             mutex_unlock(&wfs_lock);
> >>> +             return -EPROBE_DEFER;
> >>> +     }
> >>> +     mutex_unlock(&wfs_lock);
> >>> +
> >>>       device_links_write_lock();
> >>
> >> Update Documentation/driver-api/device_link.rst to reflect the
> >> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> >
> > Thanks! Will do.
> >
> >>
> >>>
> >>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> >>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >>>  {
> >>>       struct device_link *link, *ln;
> >>>
> >>> +     mutex_lock(&wfs_lock);
> >>> +     list_del(&dev->links.needs_suppliers);
> >>> +     mutex_unlock(&wfs_lock);
> >>> +
> >>>       /*
> >>>        * Delete all of the remaining links from this device to any other
> >>>        * devices (either consumers or suppliers).
> >>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >>>  #endif
> >>>       INIT_LIST_HEAD(&dev->links.consumers);
> >>>       INIT_LIST_HEAD(&dev->links.suppliers);
> >>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >>>       dev->links.status = DL_DEV_NO_DRIVER;
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(device_initialize);
> >>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >>>
> >>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>
> >>> +
> >>> +     /*
> >>> +      * Check if any of the other devices (consumers) have been waiting for
> >>> +      * this device (supplier) to be added so that they can create a device
> >>> +      * link to it.
> >>> +      *
> >>> +      * This needs to happen after device_pm_add() because device_link_add()
> >>> +      * requires the supplier be registered before it's called.
> >>> +      *
> >>> +      * But this also needs to happen before bus_probe_device() to make sure
> >>> +      * waiting consumers can link to it before the driver is bound to the
> >>> +      * device and the driver sync_state callback is called for this device.
> >>> +      */
> >>
> >>         /*
> >>          * Add links to dev from any dependent consumer that has dev on its
> >>          * list of needed suppliers
> >
> > There is no list of needed suppliers.
>
> "the other devices (consumers) have been waiting for this device (supplier)".
> Isn't that a list of needed suppliers?

No, that's a list of consumers that needs_suppliers.

> >
> >> (links.needs_suppliers).  Device_pm_add()
> >>          * must have previously registered dev to allow the links to be added.
> >>          *
> >>          * The consumer links must be created before dev is probed because the
> >>          * sync_state callback for dev will use the consumer links.
> >>          */
> >
> > I think what I wrote is just as clear.
>
> The original comment is vague.  It does not explain why consumer links must be
> created before the probe.  I had to go off and read other code to determine
> why that is true.
>
> And again, brevity is better if otherwise just as clear.
>
>
> >
> >>
> >>> +     device_link_check_waiting_consumers();
> >>> +
> >>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> >>> +             device_link_wait_for_supplier(dev);
> >>> +
> >>>       bus_probe_device(dev);
> >>>       if (parent)
> >>>               klist_add_tail(&dev->p->knode_parent,
> >>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>> index c330b75c6c57..5d70babb7462 100644
> >>> --- a/include/linux/device.h
> >>> +++ b/include/linux/device.h
> >>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >>>   * @uevent:  Called when a device is added, removed, or a few other things
> >>>   *           that generate uevents to add the environment variables.
> >>
> >>> + * @add_links:       Called, perhaps multiple times per device, after a device is
> >>> + *           added to this bus.  The function is expected to create device
> >>> + *           links to all the suppliers of the input device that are
> >>> + *           available at the time this function is called.  As in, the
> >>> + *           function should NOT stop at the first failed device link if
> >>> + *           other unlinked supplier devices are present in the system.
> >>
> >> * @add_links:   Called after a device is added to this bus.
> >
> > Why are you removing the "perhaps multiple times" part? that's true
> > and that's how some of the other ops are documented.
>
> I didn't remove it.  I rephrased it with a little bit more explanation as
> "If some suppliers are not yet available, this function will be
> called again when the suppliers become available." (below).
>
>
> >
> >>  The function is
> >> *               expected to create device links to all the suppliers of the
> >> *               device that are available at the time this function is called.
> >> *               The function must NOT stop at the first failed device link if
> >> *               other unlinked supplier devices are present in the system.
> >> *               If some suppliers are not yet available, this function will be
> >> *               called again when the suppliers become available.
> >>
> >> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> >
> > Sorry, I'm not sure I understand. Can you please explain what you are
> > trying to say? of_link_to_suppliers() is just one implementation of
> > > add_links(). The comment above is true for any bus trying to implement
> > add_links().
>
> This is conflating bus with the source of the firmware description of the
> hardware topology.  For drivers that use various APIs to access firmware
> description of topology that may be either devicetree or ACPI the access
> is done via fwnode_operations, based on struct device.fwnode (if I recall
> properly).
>
> I failed to completely address why add_links() is not needed.  The answer
> is that there should be a single function called for all buses.  Then
> the proper firmware data source would be accessed via a struct fwnode_operations.
>
> I think I left this out because I had not yet asked why this feature is
> tied only to the platform bus.  Which I asked earlier in this reply.

Thanks for the pointer about fwnode and fwnode_operations. I wasn't
aware of those. I see where you are going with this. I see a couple of
problems with this approach though:

1. How you interpret the properties of a fwnode is specific to the fw
type. The clocks DT property isn't going to have the same definition
in ACPI or some other firmware. Heck, I don't know if ACPI even has a
clocks-like property. So having one function to parse all the FW types
doesn't make a lot of sense.

2. If this common code is implemented as part of drivers/base/, then at
a minimum, I'll have to check if a fwnode is a DT node before I start
interpreting the properties of a device's fwnode. But that means I'll
have to include linux/of.h to use is_of_node(). I don't like having
drivers/base code depend on OF or platform or ACPI headers.

3. The supplier info doesn't always need to come from a firmware. So I
don't want to limit it to that?

Also, I don't necessarily see this as conflating firmware (DT, ACPI,
etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
the device seems like the entity best suited to figure out the
suppliers of the device (apart from the driver, obviously). So the bus
deciding the suppliers doesn't seem wrong to me.

In this specific case, I'm trying to address DT for now and leaving
ACPI to whoever else wants to add device links based on ACPI data.
Most OF/DT based devices end up in platform bus. So I'm just handling
this in platform bus. If some other person wants this to work for ACPI
bus or PCI bus, they are welcome to implement add_links() for those
busses? I'm nowhere close to an expert on those.
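
To make the per-bus callback idea concrete, here is a minimal userspace
sketch, not the code from this series. The struct layouts, demo_add_links(),
bus_try_add_links() and count_links() are simplified, hypothetical stand-ins
for illustration only. It shows two things discussed above: a bus without
supplier information simply leaves add_links NULL, and an add_links()
implementation keeps creating links past the first missing supplier while
still reporting the failure.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

struct device;

/* Simplified, hypothetical stand-in for struct bus_type. */
struct bus_type {
	const char *name;
	/* NULL when the bus has no way to discover suppliers. */
	int (*add_links)(struct device *dev);
};

/* Simplified, hypothetical stand-in for struct device. */
struct device {
	const char *name;
	struct bus_type *bus;
	int present;				 /* set once the device has been added */
	struct device *suppliers[MAX_SUPPLIERS]; /* suppliers named by the bus's data source */
	int linked[MAX_SUPPLIERS];		 /* 1 once a link to suppliers[i] exists */
	int nr_suppliers;
};

/*
 * Model of one bus's add_links(): link to every supplier that is present.
 * A missing supplier is remembered as a failure, but the loop continues so
 * that the suppliers which are present still learn about this consumer.
 * Returns 0 only when every supplier is linked.
 */
static int demo_add_links(struct device *dev)
{
	int i, ret = 0;

	assert(dev->nr_suppliers <= MAX_SUPPLIERS);

	for (i = 0; i < dev->nr_suppliers; i++) {
		if (dev->linked[i])
			continue;		/* link already exists */
		if (!dev->suppliers[i]->present) {
			ret = -1;		/* remember the failure, keep going */
			continue;
		}
		dev->linked[i] = 1;
	}
	return ret;
}

/* Model of the call site: the callback is optional per bus. */
static int bus_try_add_links(struct device *dev)
{
	if (dev->bus && dev->bus->add_links)
		return dev->bus->add_links(dev);
	return 0;
}

/* Number of supplier links created so far for dev. */
static int count_links(const struct device *dev)
{
	int i, n = 0;

	for (i = 0; i < dev->nr_suppliers; i++)
		n += dev->linked[i];
	return n;
}
```

With a consumer Z whose suppliers are A, B and C, and only B and C present,
bus_try_add_links() returns an error but still leaves Z linked to B and C.
Calling it again once A is present returns 0.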

> >
> >>
> >>
> >>> + *
> >>> + *           Return 0 if device links have been successfully created to all
> >>> + *           the suppliers of this device.  Return an error if some of the
> >>> + *           suppliers are not yet available and this function needs to be
> >>> + *           reattempted in the future.
> >>
> >> *
> >> *               Return 0 if device links have been successfully created to all
> >> *               the suppliers of this device.  Return an error if some of the
> >> *               suppliers are not yet available.
> >>
> >>
> >>>   * @probe:   Called when a new device or driver add to this bus, and callback
> >>>   *           the specific driver's probe to initial the matched device.
> >>>   * @remove:  Called when a device removed from this bus.
> >>> @@ -122,6 +133,7 @@ struct bus_type {
> >>>
> >>>       int (*match)(struct device *dev, struct device_driver *drv);
> >>>       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
> >>
> >>
> >>> +     int (*add_links)(struct device *dev);
> >>
> >>               ^^^^^^^^^  add_supplier              ???
> >>               ^^^^^^^^^  add_suppliers             ???
> >>
> >>               ^^^^^^^^^  link_suppliers            ???
> >>
> >>               ^^^^^^^^^  add_supplier_dependency   ???
> >>               ^^^^^^^^^  add_supplier_dependencies ???
> >> add_links() not needed
> >
> > add_links() was an intentional decision. There's no requirement that
> > the bus should only create links from this device to its suppliers. If
> > the bus also knows the consumers of this device (dev), then it
> > can/should add those too.
>
> Is creating links to consumers of this device implemented in this patch
> series?  If so, I overlooked it and will have to consider how that
> fits in to the design.
>
>
> > So, it shouldn't have "suppliers" in the
> > name.
> >
> >>>       int (*probe)(struct device *dev);
> >>>       int (*remove)(struct device *dev);
> >>>       void (*shutdown)(struct device *dev);
> >>
> >>
> >>
> >>
> >>> @@ -893,11 +905,13 @@ enum dl_dev_state {
> >>>   * struct dev_links_info - Device data related to device links.
> >>>   * @suppliers: List of links to supplier devices.
> >>>   * @consumers: List of links to consumer devices.
> >>
> >>> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
> >>
> >>     * @needs_suppliers: List of devices deferring probe until supplier drivers
> >>     *                   are successfully probed.
> >
> > It's "need suppliers". As in, this is a device that "needs suppliers".
> > So, no, this is not a list. This is a node in a list. And all "nodes
> > in a list" are documented as "Hook" in rest of places in this file. So
> > I think the documentation is correct as is.
>
> Aha, I got confused about that while trying to keep everything straight.
>
> Somehow I managed to conflate needs_suppliers with the links between
> consumers and suppliers that are created via device_link_add().
>
> So original comment is fine.
>
> It is getting late, so I'll continue with patches 2 and 3 tomorrow.
>

Thanks for the review.

-Saravana


* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-19 21:30           ` Frank Rowand
@ 2019-08-20  0:09             ` Saravana Kannan
  2019-08-20  4:26               ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-20  0:09 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On Mon, Aug 19, 2019 at 2:30 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/19/19 1:49 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> >>>>> Add device-links after the devices are created (but before they are
> >>>>> probed) by looking at common DT bindings like clocks and
> >>>>> interconnects.
> >>
> >>
> >> < very big snip (lots of comments that deserve answers) >
> >>
> >>
> >>>>
> >>>> /**
> >>>>  * of_link_property - TODO:
> >>>>  * dev:
> >>>>  * con_np:
> >>>>  * prop:
> >>>>  *
> >>>>  * TODO...
> >>>>  *
> >>>>  * Any failed attempt to create a link will NOT result in an immediate return.
> >>>>  * of_link_property() must create all possible links even when one or more
> >>>>  * attempts to create a link fail.
> >>>>
> >>>> Why?  isn't one failure enough to prevent probing this device?
> >>>> Continuing to scan just results in extra work... which will be
> >>>> repeated every time device_link_check_waiting_consumers() is called
> >>>
> >>> Context:
> >>> As I said in the cover letter, avoiding unnecessary probes is just one
> >>> of the reasons for this patch. The other (arguably more important)
> >>
> >> Agree that it is more important.
> >>
> >>
> >>> reason for this patch is to make sure suppliers know that they have
> >>> consumers that are yet to be probed. That way, suppliers can leave
> >>> their resource on AND in the right state if they were left on by the
> >>> bootloader. For example, if a clock was left on and at 200 MHz, the
> >>> clock provider needs to keep that clock ON and at 200 MHz till all the
> >>> consumers are probed.
> >>>
> >>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> >>> linking fails at A and you return immediately, then B and C could
> >>> probe and then figure that they have no more consumers (they don't see
> >>> a link to Z) and turn off their resources. And Z could fail
> >>> catastrophically.
> >>
> >> Then I think that this approach is fatally flawed in the current implementation.
> >
> > I'm waiting to hear how it is fatally flawed. But maybe this is just a
> > misunderstanding of the problem?
>
> Fatally flawed because it does not handle modules that add a consumer
> device when the module is loaded.

If you are talking about modules adding child devices of the device
they are managing, then that's handled correctly later in the series.

If you are talking about modules adding devices that aren't defined in
DT, then right, I'm not trying to handle that. The module needs to
make sure the resources needed by the new devices it's adding are kept
in the right state, or it needs to add the right device links.

> > In the text below, I'm not sure if you are mixing up two different things
> > or just that your wording is a bit ambiguous. So pardon my nitpick to
> > err on the side of clarity.
>
> Please do nitpick.  Clarity is good.
>
>
> >
> >> A device can be added by a module that is loaded.
> >
> > No, in the example I gave, of_platform_default_populate_init() would
> > add all 3 of those devices during arch_initcall_sync().
>
> The example you gave does not cover all use cases.
>
> There are modules that add devices when the module is loaded.  You can not
> ignore systems using such modules.

I'll have to agree to disagree on that. While I understand that the
design should be good and I'm happy to work on that, you can't insist
that a patch series shouldn't be allowed because it's only improving
99% of the cases and leaves the other 1% in the status quo. You are
just going to bring the kernel development to a grinding halt.

> >
> >>  In that case the device
> >> was not present at late boot when the suppliers may turn off their resources.
> >
> > In that case, the _drivers_ for those devices aren't present at late
> > boot. So that they can't request to keep the resources on for their
> > consumer devices. Since there are no consumer requests on resources,
> > the suppliers turn off their resources at late boot (since there isn't
> > a better location as of today). The sync_state() call back added in a
> > subsequent patch in this series will provide the better location.
>
> And the sync_state() call back will not deal with modules that add consumer
> devices when the module is loaded, correct?

Depends. If it's just more devices from DT, then it'll be fine. If
it's not, then the module needs to take care of the needs of devices
it's adding.

> >
> >> (I am assuming the details since I have not reviewed the patches later in
> >> the series that implement this part.)
> >>
> >> Am I missing something?
> >
> > I think you are mixing up devices getting added/populated with drivers
> > getting loaded as modules?
>
> Only some modules add devices when they are loaded.  But these modules do
> exist.

Out of the billions of Android devices, how many do you see this happening in?

Thanks,
Saravana


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-20  0:00         ` Saravana Kannan
@ 2019-08-20  4:25           ` Frank Rowand
  2019-08-20 22:10             ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-20  4:25 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On 8/19/19 5:00 PM, Saravana Kannan wrote:
> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>>>>>  device addition
>>>>> From: Saravana Kannan <saravanak@google.com>
>>>>>
>>>>> When devices are added, the bus might want to create device links to track
>>>>> functional dependencies between supplier and consumer devices. This
>>>>> tracking of supplier-consumer relationship allows optimizing device probe
>>>>> order and tracking whether all consumers of a supplier are active. The
>>>>> add_links bus callback is added to support this.
>>>>
>>>> Change above to:
>>>>
>>>> When devices are added, the bus may create device links to track which
>>>> suppliers a consumer device depends upon.  This
>>>> tracking of supplier-consumer relationship may be used to defer probing
>>>> the driver of a consumer device before the driver(s) for its supplier device(s)
>>>> are probed.  It may also be used by a supplier driver to determine if
>>>> all of its consumers have been successfully probed.
>>>> The add_links bus callback is added to create the supplier device links
>>>>
>>>>>
>>>>> However, when consumer devices are added, they might not have a supplier
>>>>> device to link to despite needing mandatory resources/functionality from
>>>>> one or more suppliers. A waiting_for_suppliers list is created to track
>>>>> such consumers and retry linking them when new devices get added.
>>>>
>>>> Change above to:
>>>>
>>>> If a supplier device has not yet been created when the consumer device attempts
>>>> to link it, the consumer device is added to the wait_for_suppliers list.
>>>> When supplier devices are created, the supplier device link will be added to
>>>> the relevant consumer devices on the wait_for_suppliers list.
>>>>
>>>
>>> I'll take these commit text suggestions if we decide to revert the
>>> entire series at the end of this review.
>>>
>>>>>
>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>> ---
>>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>>>>>  include/linux/device.h | 14 +++++++
>>>>>  2 files changed, 97 insertions(+)
>>>>>
>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>> index da84a73f2ba6..1b4eb221968f 100644
>>>>> --- a/drivers/base/core.c
>>>>> +++ b/drivers/base/core.c
>>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>>>>>  #endif
>>>>>
>>>>>  /* Device links support. */
>>>>> +static LIST_HEAD(wait_for_suppliers);
>>>>> +static DEFINE_MUTEX(wfs_lock);
>>>>>
>>>>>  #ifdef CONFIG_SRCU
>>>>>  static DEFINE_MUTEX(device_links_lock);
>>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(device_link_add);
>>>>>
>>>>> +/**
>>>>
>>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
>>>>
>>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
>>>
>>
>> As a meta-comment, I found this series very hard to understand in the context
>> of reading the new code for the first time.  When I read the code again in
>> six months or a year or two years it will not be in near term memory and it
>> will be as if I am reading it for the first time.  A lot of my suggestions
>> for changes of names are in that context -- the current names may be fine
>> when one has recently read the code, but not so much when trying to read
>> the whole thing again with a blank mind.
> 
> Thanks for the context.
> 
>> The code also inherits a good deal of complexity because it does not stand
>> alone in a nice discrete chunk, but instead delicately weaves into a more
>> complex body of code.
> 
> I'll take this as a compliment :)

Please do!


> 
>> When I was trying to understand the code, I wrote a lot of additional
>> comments within my reply email to provide myself context, information
>> about various things, and questions that I needed to answer (or if I
>> could not answer to then ask you).  Then I ended up being able to remove
>> many of those notes before sending the reply.
>>
>>
>>> I intentionally chose "Mark device..." because that's a better
>>> description of the semantics of the function instead of trying to
>>> describe the implementation. Whether I'm using a linked list or some
>>> other data structure should not be the one line documentation of a
>>> function. Unless the function is explicitly about operating on that
>>> specific data structure.
>>
>> I agree with the intent of trying to describe the semantics of a function,
>> especially at the API level where other systems (or drivers) would be using
>> the function.  But for this case the function is at the implementation level
>> and describing explicitly what it is doing makes this much more readable for
>> me.
> 
> Are you distinguishing between API level vs implementation level based
> on the function being "static"/not exported? I believe the earlier

No, being static helps say a function is not API, but a function that is
not static may be intended to be used in a limited and constrained manner.
I distinguished based on the usage of the function.


> version of this series had this function as an exported API. So maybe
> that's why I had it as "Mark device".
> 
>> I also find "Mark device" to be vague and not descriptive of what the
>> intent is.
>>
>>>
>>>>
>>>>
>>>>> + * @consumer: Consumer device
>>>>> + *
>>>>> + * Marks the consumer device as waiting for suppliers to become available. The
>>>>> + * consumer device will never be probed until it's unmarked as waiting for
>>>>> + * suppliers. The caller is responsible for adding the link to the supplier
>>>>> + * once the supplier device is present.
>>>>> + *
>>>>> + * This function is NOT meant to be called from the probe function of the
>>>>> + * consumer but rather from code that creates/adds the consumer device.
>>>>> + */
>>>>> +static void device_link_wait_for_supplier(struct device *consumer)
>>>>> +{
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>
>>>>
>>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
>>>>> + *
>>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
>>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
>>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
>>>>> + *
>>>>> + * The add_links bus callback is expected to return 0 if it has found and added
>>>>> + * all the supplier links for the consumer device. It should return an error if
>>>>> + * it isn't able to do so.
>>>>> + *
>>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>> + * it's aware of potential suppliers becoming available.
>>>>
>>>> Change above comment to:
>>>>
>>>>     * device_link_add_supplier_links - add links from consumer devices to
>>>>     *                                  supplier devices, leaving any consumer
>>>>     *                                  with inactive suppliers on the
>>>>     *                                  wait_for_suppliers list
>>>
>>> I didn't know that the first one line comment could span multiple
>>> lines. Good to know.
>>>
>>>
>>>>     * Scan all consumer devices in the devicetree.
>>>
>>> This function doesn't have anything to do with devicetree. I've
>>> intentionally kept all OF related parts out of the driver/core because
>>> I hope that other busses can start using this feature too. So I can't
>>> take this bit.
>>
>> My comment is left over from when I was taking notes, trying to understand the
>> code.
>>
>> At the moment, only devicetree is used as a source of the dependency information.
>> The comment would better be re-phrased as:
>>
>>         * Scan all consumer devices in the firmware description of the hardware topology
>>
> 
> Ok
> 
>> I did not ask why this feature is tied to _only_ the platform bus, but will now.
> 
> Because devicetree and platform bus are the only ones I'm familiar with.
> If other busses want to add this, I'd be happy to help with code
> and/or direction/review. But I won't pretend to know anything about
> ACPI.

Sorry, you don't get to ignore other buses because you are not familiar
with them.

I am not aware of any reason to exclude devices that are on other buses and your
answer below does not provide a valid technical reason why the new feature is
correct when it excludes all other buses.

> 
>> I do not know of any reason that a consumer / supplier relationship can not be
>> between devices on different bus types.  Do you know of such a reason?
> 
> Yes, it's hypothetically possible. But I haven't seen such a
> relationship being defined in DT. Nor somewhere else where this might
> be captured. So, how common/realistic is it?

It is entirely legal.  I have no idea how common it is but that is not a valid
reason to exclude other buses from the feature.

> 
>>>
>>>>  For any supplier device that
>>>>     * is not already linked to the consumer device, add the supplier to the
>>>>     * consumer device's device links.
>>>>     *
>>>>     * If all of a consumer device's suppliers are available then the consumer
>>>>     * is removed from the wait_for_suppliers list (if previously on the list).
>>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>>>>     * already on the list).
>>>
>>> Honestly, I don't think this is any better than what I already have.
>>
>> Note that my version of these comments was written while I was reading the code,
>> and did not have any big picture understanding yet.  This will likely also be
>> the mind set of most everyone who reads this code in the future, once it is
>> woven into the kernel.
>>
>> If you don't like the change, I can revisit it in a later version of the
>> patch set.
> 
> I'll take in all the ones I feel are reasonable or don't feel strongly
> about. We can revisit the rest later.
> 
>>>
>>>>     * The add_links bus callback must return 0 if it has found and added all
>>>>     * the supplier links for the consumer device. It must return an error if
>>>>     * it is not able to do so.
>>>>     *
>>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>     * it is aware of potential suppliers becoming available.
>>>>
>>>>
>>>>
>>>>> + */
>>>>> +static void device_link_check_waiting_consumers(void)
>>>>
>>>> Function name is misleading and hides side effects.
>>>>
>>>> I have not come up with a name that does not hide side effects, but a better
>>>> name would be:
>>>>
>>>>    device_link_add_supplier_links()
>>>
>>> I kinda agree that it could afford a better name. The current name is
>>> too similar to device_links_check_suppliers() and I never liked that.
>>
>> Naming new fields or variables related to device links looks pretty
>> challenging to me, because of the desire to be part of device links
>> and not a wart pasted on the side.  So I share the pain in trying
>> to find good names.
>>
>>>
>>> Maybe device_link_add_missing_suppliers()?
>>
>> My first reaction was "yes, that sounds good".  But then I stopped and
>> tried to read the name out of context.  The name is not adding the
>> missing suppliers, it is saving the information that a supplier is
>> not yet available (eg, is "missing").  I struggled in coming up with

Reading what you say below, and looking at the code again, what I say
in that sentence is backwards.  It is not adding the missing supplier
device links, it is instead adding existing supplier device links.


>> the name that I suggested.  We can keep thinking.
> 
> No, this function _IS_ about adding links to suppliers. These

You are mis-reading what I wrote.  I said the function "is not adding
the missing suppliers".  You are converting that to "is not adding
links to the missing suppliers".

My suggested name was hinting "add_supplier_links", which is what you
say it does below.  The name you suggest is hinting "add_missing_suppliers".
Do you see the difference?


> consumers were "saved" as "not yet having the supplier" earlier by
> device_link_wait_for_supplier(). This function doesn't do that. This
> function is just trying to see if those missing suppliers are present
> now and if so adding a link to them from the "saved" consumers. I
> think device_link_add_missing_suppliers() is actually a pretty good
> name. Let me know what you think now.
> 
>>
>>
>>>
>>> I don't think we need "links" repeated twice in the function name.
>>
>> Yeah, I didn't like that either.
>>
>>
>>> With this suggestion, what side effect is hidden in your opinion? That
>>> the fully linked consumer is removed from the "waiting for suppliers"
>>> list?
>>
>> The side effect is that the function does not merely do a check.  It also
>> adds missing suppliers to a list.
> 
> No, it doesn't do that. I can't keep a list of things that aren't
> allocated yet :). In the whole patch series, we only keep a list of things
> (consumers) that are waiting on other things (missing suppliers).

OK, as I noted above, I stated that backwards.  It is adding links for
existing suppliers, not for the missing suppliers.

> 
>>>
>>> Maybe device_link_try_removing_from_wfs()?
>>
>> I like that, other than the fact that it still does not provide a clue
>> that the function is potentially adding suppliers to a list.
> 
> It doesn't. How would you add a supplier device to a list if the
> device itself isn't there? :)

Again, that should be existing suppliers, as you noted.  But the point stands
that the function is potentially adding links.
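
To keep the node-versus-list distinction straight, here is a minimal
userspace sketch of the wait list logic being discussed. It is an
illustration under stated assumptions, not the patch itself: the list
helpers are re-implemented locally, wfs_lock is omitted, and
suppliers_missing, demo_add_links() and device_is_waiting() are hypothetical
stand-ins for what a real bus add_links() callback and
device_links_check_suppliers() would do.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace copy of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node from its list and reinitialise it, so list_empty(node) is true. */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	INIT_LIST_HEAD(node);
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical stand-in for struct device. */
struct device {
	const char *name;
	int suppliers_missing;			/* hypothetical: result add_links() would find */
	struct list_head needs_suppliers;	/* a node ("hook"), not a list */
};

/* The one global list of consumers still waiting for suppliers. */
static struct list_head wait_for_suppliers = {
	&wait_for_suppliers, &wait_for_suppliers
};

/* Stand-in for a bus add_links() callback: 0 when all links were added. */
static int demo_add_links(struct device *dev)
{
	return dev->suppliers_missing ? -1 : 0;
}

/* Hook the consumer into the global wait list. */
static void device_link_wait_for_supplier(struct device *consumer)
{
	list_add_tail(&consumer->needs_suppliers, &wait_for_suppliers);
}

/* Retry linking for every waiting consumer; drop the ones that succeed. */
static void device_link_check_waiting_consumers(void)
{
	struct list_head *pos, *tmp;

	for (pos = wait_for_suppliers.next; pos != &wait_for_suppliers; pos = tmp) {
		struct device *dev = container_of(pos, struct device, needs_suppliers);

		tmp = pos->next;	/* saved first: pos may be unlinked below */
		if (!demo_add_links(dev))
			list_del_init(&dev->needs_suppliers);
	}
}

/* Nonzero while the device is hooked into the wait list (probe would defer). */
static int device_is_waiting(const struct device *dev)
{
	return !list_empty(&dev->needs_suppliers);
}
```

The retry pass is where links to now-existing suppliers get added. Removal
from the wait list is only the consequence of add_links() succeeding, and
list_del_init() on the node is all that removal takes.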


> 
>>  I think
>> part of the challenge is that the function does two things: (1) a check,
>> and (2) potentially adding missing suppliers to a list.  Maybe a simple
>> one line comment at the call site, something like:
>>
>>    /* adds missing suppliers to wfs */
>>
>>
>>>
>>> I'll wait for us to agree on a better name here before I change this.
>>>
>>>>> +{
>>>>> +     struct device *dev, *tmp;
>>>>> +
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>>>>> +                              links.needs_suppliers)
>>>>> +             if (!dev->bus->add_links(dev))
>>>>> +                     list_del_init(&dev->links.needs_suppliers);
>>>>
>>>> Empties dev->links.needs_suppliers, but does not remove dev from
>>>> wait_for_suppliers list.  Where does that happen?
>>>
>>> I'll chalk this up to you having a long day or forgetting your coffee
>>> :) list_del_init() does both of those things because needs_suppliers
>>> is the node and wait_for_suppliers is the list.
>>
>> Yes, brain mis-fire on my part.  I'll have to go back and look at the
>> list related code again.
>>
>>
>>>
>>>>
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +}
>>>>> +
>>>>>  static void device_link_free(struct device_link *link)
>>>>>  {
>>>>>       while (refcount_dec_not_one(&link->rpm_active))
>>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>>>>>       struct device_link *link;
>>>>>       int ret = 0;
>>>>>
>>>>> +     /*
>>>>> +      * If a device is waiting for one or more suppliers (in
>>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
>>>>> +      * return -EPROBE_DEFER without having to check the links with existing
>>>>> +      * suppliers.
>>>>> +      */
>>>>
>>>> Change comment to:
>>>>
>>>>         /*
>>>>          * Device waiting for supplier to become available is not allowed
>>>>          * to probe
>>>>          */
>>>
>>> Po-tay-to. Po-tah-to? I think my comment is just as good.
>>
>> If just as good and shorter, then better.
>>
>> Also the original says "it is not ready to probe".  That is not correct.  It
>> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
>> Nit picky on my part, but tiny things like that mean I have to think harder.
>> I have to think "why is it not ready to probe?".  Maybe my version should have
>> instead been something like:
>>
>>         * Device waiting for supplier to become available will return
>>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
>>
>>>
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
>>>>> +             mutex_unlock(&wfs_lock);
>>>>> +             return -EPROBE_DEFER;
>>>>> +     }
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +
>>>>>       device_links_write_lock();
>>>>
>>>> Update Documentation/driver-api/device_link.rst to reflect the
>>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
>>>
>>> Thanks! Will do.
>>>
>>>>
>>>>>
>>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
>>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>>>>>  {
>>>>>       struct device_link *link, *ln;
>>>>>
>>>>> +     mutex_lock(&wfs_lock);
>>>>> +     list_del(&dev->links.needs_suppliers);
>>>>> +     mutex_unlock(&wfs_lock);
>>>>> +
>>>>>       /*
>>>>>        * Delete all of the remaining links from this device to any other
>>>>>        * devices (either consumers or suppliers).
>>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>>>>>  #endif
>>>>>       INIT_LIST_HEAD(&dev->links.consumers);
>>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
>>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
>>>>>       dev->links.status = DL_DEV_NO_DRIVER;
>>>>>  }
>>>>>  EXPORT_SYMBOL_GPL(device_initialize);
>>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
>>>>>
>>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
>>>>
>>>>> +
>>>>> +     /*
>>>>> +      * Check if any of the other devices (consumers) have been waiting for
>>>>> +      * this device (supplier) to be added so that they can create a device
>>>>> +      * link to it.
>>>>> +      *
>>>>> +      * This needs to happen after device_pm_add() because device_link_add()
>>>>> +      * requires the supplier be registered before it's called.
>>>>> +      *
>>>>> +      * But this also needs to happen before bus_probe_device() to make sure
>>>>> +      * waiting consumers can link to it before the driver is bound to the
>>>>> +      * device and the driver sync_state callback is called for this device.
>>>>> +      */
>>>>
>>>>         /*
>>>>          * Add links to dev from any dependent consumer that has dev on its
>>>>          * list of needed suppliers
>>>
>>> There is no list of needed suppliers.
>>
>> "the other devices (consumers) have been waiting for this device (supplier)".
>> Isn't that a list of needed suppliers?
> 
> No, that's a list of consumers that needs_suppliers.
> 
>>>
>>>> (links.needs_suppliers).  Device_pm_add()
>>>>          * must have previously registered dev to allow the links to be added.
>>>>          *
>>>>          * The consumer links must be created before dev is probed because the
>>>>          * sync_state callback for dev will use the consumer links.
>>>>          */
>>>
>>> I think what I wrote is just as clear.
>>
>> The original comment is vague.  It does not explain why consumer links must be
>> created before the probe.  I had to go off and read other code to determine
>> why that is true.
>>
>> And again, brevity is better if otherwise just as clear.
>>
>>
>>>
>>>>
>>>>> +     device_link_check_waiting_consumers();
>>>>> +
>>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
>>>>> +             device_link_wait_for_supplier(dev);
>>>>> +
>>>>>       bus_probe_device(dev);
>>>>>       if (parent)
>>>>>               klist_add_tail(&dev->p->knode_parent,
>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>> index c330b75c6c57..5d70babb7462 100644
>>>>> --- a/include/linux/device.h
>>>>> +++ b/include/linux/device.h
>>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
>>>>>   * @uevent:  Called when a device is added, removed, or a few other things
>>>>>   *           that generate uevents to add the environment variables.
>>>>
>>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
>>>>> + *           added to this bus.  The function is expected to create device
>>>>> + *           links to all the suppliers of the input device that are
>>>>> + *           available at the time this function is called.  As in, the
>>>>> + *           function should NOT stop at the first failed device link if
>>>>> + *           other unlinked supplier devices are present in the system.
>>>>
>>>> * @add_links:   Called after a device is added to this bus.
>>>
>>> Why are you removing the "perhaps multiple times" part? that's true
>>> and that's how some of the other ops are documented.
>>
>> I didn't remove it.  I rephrased it with a little bit more explanation as
>> "If some suppliers are not yet available, this function will be
>> called again when the suppliers become available." (below).
>>
>>
>>>
>>>>  The function is
>>>> *               expected to create device links to all the suppliers of the
>>>> *               device that are available at the time this function is called.
>>>> *               The function must NOT stop at the first failed device link if
>>>> *               other unlinked supplier devices are present in the system.
>>>> *               If some suppliers are not yet available, this function will be
>>>> *               called again when the suppliers become available.
>>>>
>>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
>>>
>>> Sorry, I'm not sure I understand. Can you please explain what you are
>>> trying to say? of_link_to_suppliers() is just one implementation of
>>> add_links(). The comment above is true for any bus trying to implement
>>> add_links().
>>
>> This is conflating bus with the source of the firmware description of the
>> hardware topology.  For drivers that use various APIs to access firmware
>> description of topology that may be either devicetree or ACPI the access
>> is done via fwnode_operations, based on struct device.fwnode (if I recall
>> properly).
>>
>> I failed to completely address why add_links() is not needed.  The answer
>> is that there should be a single function called for all buses.  Then
>> the proper firmware data source would be accessed via a struct fwnode_operations.
>>
>> I think I left this out because I had not yet asked why this feature is
>> tied only to the platform bus.  Which I asked earlier in this reply.
> 
> Thanks for the pointer about fwnode and fwnode_operations. I wasn't
> aware of those. I see where you are going with this. I see a couple of
> problems with this approach though:
> 
> 1. How you interpret the properties of a fwnode is specific to the fw
> type. The clocks DT property isn't going to have the same definition
> in ACPI or some other firmware. Heck, I don't know if ACPI even has a
> clocks-like property. So having one function to parse all the FW types
> doesn't make a lot of sense.

The functions in fwnode_operations are specific to the proper firmware.
So there is a set of functions in a struct fwnode_operations for
devicetree that only know about devicetree.  And there is a different
variable of type fwnode_operations that is initialized with ACPI
specific functions.
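
As a rough illustration of that dispatch pattern, here is a userspace
sketch. It is not the kernel's struct fwnode_operations; every type and
function name below is hypothetical, and the "devicetree" and "ACPI"
backends are toy stand-ins that only count suppliers.

```c
#include <assert.h>

/* Hypothetical stand-in for a per-firmware ops table. */
struct fw_ops {
	const char *name;
	/* Returns the number of suppliers described for the node. */
	int (*count_suppliers)(const void *fw_data);
};

/* Hypothetical firmware node: opaque data plus the ops that understand it. */
struct fw_node {
	const struct fw_ops *ops;
	const void *fw_data;
};

/* Toy "devicetree" backend: data is a comma-separated list of names. */
static int dt_count_suppliers(const void *fw_data)
{
	const char *s = fw_data;
	int n;

	if (!s || !*s)
		return 0;
	for (n = 1; *s; s++)
		if (*s == ',')
			n++;
	return n;
}

/* Toy "ACPI" backend: data is a plain integer count. */
static int acpi_count_suppliers(const void *fw_data)
{
	const int *count = fw_data;

	return count ? *count : 0;
}

static const struct fw_ops dt_ops = { "dt", dt_count_suppliers };
static const struct fw_ops acpi_ops = { "acpi", acpi_count_suppliers };

/* Generic caller: it never checks which firmware described the node. */
static int node_count_suppliers(const struct fw_node *node)
{
	assert(node && node->ops && node->ops->count_suppliers);
	return node->ops->count_suppliers(node->fw_data);
}
```

The point of the sketch is only that the generic caller needs no
firmware-specific header; each backend interprets its own data.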

> 
> 2. If this common code is implemented as part of drivers/base/, then at
> a minimum, I'll have to check if a fwnode is a DT node before I start
> interpreting the properties of a device's fwnode. But that means I'll
> have to include linux/of.h to use is_of_node(). I don't like having
> drivers/base code depend on OF or platform or ACPI headers.

You just use the function in the device's fwnode_operations (I think,
I would have to go look at the precise way the code works because it
has been quite a while since I've looked at it).

> 
> 3. The supplier info doesn't always need to come from a firmware. So I
> don't want to limit it to that?

If you can find another source of topology info, then I would expect
that another set of fwnode_operations functions would be created
for the info source.


> 
> Also, I don't necessarily see this as conflating firmware (DT, ACPI,
> etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
> the device seems like the entity best suited to figure out the
> suppliers of the device (apart from the driver, obviously). So the bus
> deciding the suppliers doesn't seem wrong to me.

Patch 3 assigns the devicetree add_links function to the platform bus.
It seems incorrect to me for of_platform_default_populate_init() to be
changing a field in platform_bus_type.


   of_platform_default_populate_init()
           ...
           platform_bus_type.add_links = of_link_to_suppliers;


> 
> In this specific case, I'm trying to address DT for now and leaving
> ACPI to whoever else wants to add device links based on ACPI data.
> Most OF/DT based devices end up in platform bus. So I'm just handling
> this in platform bus. If some other person wants this to work for ACPI
> bus or PCI bus, they are welcome to implement add_links() for those
> busses? I'm nowhere close to an expert on those.

Devicetree is not limited to the platform bus.


> 
>>>
>>>>
>>>>
>>>>> + *
>>>>> + *           Return 0 if device links have been successfully created to all
>>>>> + *           the suppliers of this device.  Return an error if some of the
>>>>> + *           suppliers are not yet available and this function needs to be
>>>>> + *           reattempted in the future.
>>>>
>>>> *
>>>> *               Return 0 if device links have been successfully created to all
>>>> *               the suppliers of this device.  Return an error if some of the
>>>> *               suppliers are not yet available.
>>>>
>>>>
>>>>>   * @probe:   Called when a new device or driver add to this bus, and callback
>>>>>   *           the specific driver's probe to initial the matched device.
>>>>>   * @remove:  Called when a device removed from this bus.
>>>>> @@ -122,6 +133,7 @@ struct bus_type {
>>>>>
>>>>>       int (*match)(struct device *dev, struct device_driver *drv);
>>>>>       int (*uevent)(struct device *dev, struct kobj_uevent_env *env);
>>>>
>>>>
>>>>> +     int (*add_links)(struct device *dev);
>>>>
>>>>               ^^^^^^^^^  add_supplier              ???
>>>>               ^^^^^^^^^  add_suppliers             ???
>>>>
>>>>               ^^^^^^^^^  link_suppliers            ???
>>>>
>>>>               ^^^^^^^^^  add_supplier_dependency   ???
>>>>               ^^^^^^^^^  add_supplier_dependencies ???
>>>> add_links() not needed
>>>
>>> add_links() was an intentional decision. There's no requirement that
>>> the bus should only create links from this device to its suppliers. If
>>> the bus also knows the consumers of this device (dev), then it
>>> can/should add those too.
>>
>> Is creating links to consumers of this device implemented in this patch
>> series?  If so, I overlooked it and will have to consider how that
>> fits in to the design.
>>
>>
>>> So, it shouldn't have "suppliers" in the
>>> name.
>>>
>>>>>       int (*probe)(struct device *dev);
>>>>>       int (*remove)(struct device *dev);
>>>>>       void (*shutdown)(struct device *dev);
>>>>
>>>>
>>>>
>>>>
>>>>> @@ -893,11 +905,13 @@ enum dl_dev_state {
>>>>>   * struct dev_links_info - Device data related to device links.
>>>>>   * @suppliers: List of links to supplier devices.
>>>>>   * @consumers: List of links to consumer devices.
>>>>
>>>>> + * @needs_suppliers: Hook to global list of devices waiting for suppliers.
>>>>
>>>>     * @needs_suppliers: List of devices deferring probe until supplier drivers
>>>>     *                   are successfully probed.
>>>
>>> It's "need suppliers". As in, this is a device that "needs suppliers".
>>> So, no, this is not a list. This is a node in a list. And all "nodes
>>> in a list" are documented as "Hook" in rest of places in this file. So
>>> I think the documentation is correct as is.
>>
>> Aha, I got confused about that while trying to keep everything straight.
>>
>> Somehow I managed to conflate needs_suppliers with the links between
>> consumers and suppliers that are created via device_link_add().
>>
>> So original comment is fine.
>>
>> It is getting late, so I'll continue with patches 2 and 3 tomorrow.
>>
> 
> Thanks for the review.
> 
> -Saravana
> 


* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-20  0:09             ` Saravana Kannan
@ 2019-08-20  4:26               ` Frank Rowand
  2019-08-20 22:09                 ` Saravana Kannan
  0 siblings, 1 reply; 37+ messages in thread
From: Frank Rowand @ 2019-08-20  4:26 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On 8/19/19 5:09 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 2:30 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/19/19 1:49 PM, Saravana Kannan wrote:
>>> On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>>>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>>>
>>>>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
>>>>>>> Add device-links after the devices are created (but before they are
>>>>>>> probed) by looking at common DT bindings like clocks and
>>>>>>> interconnects.
>>>>
>>>>
>>>> < very big snip (lots of comments that deserve answers) >
>>>>
>>>>
>>>>>>
>>>>>> /**
>>>>>>  * of_link_property - TODO:
>>>>>>  * dev:
>>>>>>  * con_np:
>>>>>>  * prop:
>>>>>>  *
>>>>>>  * TODO...
>>>>>>  *
>>>>>>  * Any failed attempt to create a link will NOT result in an immediate return.
>>>>>>  * of_link_property() must create all possible links even when one or more
>>>>>>  * attempts to create a link fail.
>>>>>>
>>>>>> Why?  isn't one failure enough to prevent probing this device?
>>>>>> Continuing to scan just results in extra work... which will be
>>>>>> repeated every time device_link_check_waiting_consumers() is called
>>>>>
>>>>> Context:
>>>>> As I said in the cover letter, avoiding unnecessary probes is just one
>>>>> of the reasons for this patch. The other (arguably more important)
>>>>
>>>> Agree that it is more important.
>>>>
>>>>
>>>>> reason for this patch is to make sure suppliers know that they have
>>>>> consumers that are yet to be probed. That way, suppliers can leave
>>>>> their resource on AND in the right state if they were left on by the
>>>>> bootloader. For example, if a clock was left on and at 200 MHz, the
>>>>> clock provider needs to keep that clock ON and at 200 MHz till all the
>>>>> consumers are probed.
>>>>>
>>>>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
>>>>> linking fails at A and you return immediately, then B and C could
>>>>> probe and then figure that they have no more consumers (they don't see
>>>>> a link to Z) and turn off their resources. And Z could fail
>>>>> catastrophically.
>>>>
>>>> Then I think that this approach is fatally flawed in the current implementation.
>>>
>>> I'm waiting to hear how it is fatally flawed. But maybe this is just a
>>> misunderstanding of the problem?
>>
>> Fatally flawed because it does not handle modules that add a consumer
>> device when the module is loaded.
> 
> If you are talking about modules adding child devices of the device
> they are managing, then that's handled correctly later in the series.

They may or they may not.  I do not know.  I am not going to audit all
current cases of devices being added to check that relationship and I am
not going to monitor all future patches that add devices.  Adding devices
is an existing pattern of behavior that the new feature must be able to
handle.

I have not looked at patch 6 yet (the place where modules adding child
devices is handled).  I am guessing that patch 6 could be made more
general to remove the parent child relationship restriction.

> 
> If you are talking about modules adding devices that aren't defined in
> DT, then right, I'm not trying to handle that. The module needs to
> make sure the resources needed for the new devices it's adding are kept
> in the right state, or it needs to add the right device links.

I am not talking about devices that are not defined in the devicetree.


> 
>>> In the text below, I'm not sure if you are mixing up two different things
>>> or just that your wording is a bit ambiguous. So pardon my nitpick to
>>> err on the side of clarity.
>>
>> Please do nitpick.  Clarity is good.
>>
>>
>>>
>>>> A device can be added by a module that is loaded.
>>>
>>> No, in the example I gave, of_platform_default_populate_init() would
>>> add all 3 of those devices during arch_initcall_sync().
>>
>> The example you gave does not cover all use cases.
>>
>> There are modules that add devices when the module is loaded.  You can not
>> ignore systems using such modules.
> 
> I'll have to agree to disagree on that. While I understand that the
> design should be good and I'm happy to work on that, you can't insist
> that a patch series shouldn't be allowed because it's only improving
> 99% of the cases and leaves the other 1% in the status quo. You are
> just going to bring the kernel development to a grinding halt.

No, you do not get to disagree on that.  And you are presenting a straw
man argument.

You are proposing a new feature that contributes fragility and complexity
to the house of cards that device instantiation and driver probing already
is.

The feature is clever but it is intertwined into an area that is already
complex and in many cases difficult to work within.

I had hoped that the feature was robust enough and generic enough to
accept.  The proposed feature is a hack to paper over a specific problem
that you are facing.  I had hoped that the feature would appear generic
enough that I would not have to regard it as an attempt to paper over
the real problem.  I have not given up this hope yet but I still am
quite cautious about this approach to addressing your use case.

You have a real bug.  I have told you how to fix the real bug.  And you
have ignored my suggestion.  (To be honest, I do not know for sure that
my suggestion is feasible, but on the surface it appears to be.)  Again,
my suggestion is to have the boot loader pass information to the kernel
(via a chosen property) telling the kernel which devices the bootloader
has enabled power to.  The power subsystem would use that information
early in boot to do a "get" on the power supplier (I am not using precise
power subsystem terminology, but it should be obvious what I mean).
The consumer device driver would also have to be aware of the information
passed via the chosen property because the power subsystem has done the
"get" on the consumer device's behalf (exactly how the consumer gets
that information is an implementation detail).  This approach is
more direct, less subtle, less fragile.


> 
>>>
>>>>  In that case the device
>>>> was not present at late boot when the suppliers may turn off their resources.
>>>
>>> In that case, the _drivers_ for those devices aren't present at late
>>> boot. So that they can't request to keep the resources on for their
>>> consumer devices. Since there are no consumer requests on resources,
>>> the suppliers turn off their resources at late boot (since there isn't
>>> a better location as of today). The sync_state() call back added in a
>>> subsequent patch in this series will provide the better location.
>>
>> And the sync_state() call back will not deal with modules that add consumer
>> devices when the module is loaded, correct?
> 
> Depends. If it's just more devices from DT, then it'll be fine. If
> it's not, then the module needs to take care of the needs of devices
> it's adding.
> 
>>>
>>>> (I am assuming the details since I have not reviewed the patches later in
>>>> the series that implement this part.)
>>>>
>>>> Am I missing something?
>>>
>>> I think you are mixing up devices getting added/populated with drivers
>>> getting loaded as modules?
>>
>> Only some modules add devices when they are loaded.  But these modules do
>> exist.
> 
> Out of the billions of Android devices, how many do you see this happening in?

The Linux kernel is not just used by Android devices.

-Frank

> 
> Thanks,
> Saravana
> 


* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-20  4:26               ` Frank Rowand
@ 2019-08-20 22:09                 ` Saravana Kannan
  2019-08-21 16:39                   ` Frank Rowand
  0 siblings, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-20 22:09 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On Mon, Aug 19, 2019 at 9:26 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/19/19 5:09 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 2:30 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/19/19 1:49 PM, Saravana Kannan wrote:
> >>> On Mon, Aug 19, 2019 at 10:16 AM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>>>> On Wed, Aug 7, 2019 at 7:06 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>>>
> >>>>>> On 7/23/19 5:10 PM, Saravana Kannan wrote:
> >>>>>>> Add device-links after the devices are created (but before they are
> >>>>>>> probed) by looking at common DT bindings like clocks and
> >>>>>>> interconnects.
> >>>>
> >>>>
> >>>> < very big snip (lots of comments that deserve answers) >
> >>>>
> >>>>
> >>>>>>
> >>>>>> /**
> >>>>>>  * of_link_property - TODO:
> >>>>>>  * dev:
> >>>>>>  * con_np:
> >>>>>>  * prop:
> >>>>>>  *
> >>>>>>  * TODO...
> >>>>>>  *
> >>>>>>  * Any failed attempt to create a link will NOT result in an immediate return.
> >>>>>>  * of_link_property() must create all possible links even when one or more
> >>>>>>  * attempts to create a link fail.
> >>>>>>
> >>>>>> Why?  isn't one failure enough to prevent probing this device?
> >>>>>> Continuing to scan just results in extra work... which will be
> >>>>>> repeated every time device_link_check_waiting_consumers() is called
> >>>>>
> >>>>> Context:
> >>>>> As I said in the cover letter, avoiding unnecessary probes is just one
> >>>>> of the reasons for this patch. The other (arguably more important)
> >>>>
> >>>> Agree that it is more important.
> >>>>
> >>>>
> >>>>> reason for this patch is to make sure suppliers know that they have
> >>>>> consumers that are yet to be probed. That way, suppliers can leave
> >>>>> their resource on AND in the right state if they were left on by the
> >>>>> bootloader. For example, if a clock was left on and at 200 MHz, the
> >>>>> clock provider needs to keep that clock ON and at 200 MHz till all the
> >>>>> consumers are probed.
> >>>>>
> >>>>> Answer: Let's say a consumer device Z has suppliers A, B and C. If the
> >>>>> linking fails at A and you return immediately, then B and C could
> >>>>> probe and then figure that they have no more consumers (they don't see
> >>>>> a link to Z) and turn off their resources. And Z could fail
> >>>>> catastrophically.
> >>>>
> >>>> Then I think that this approach is fatally flawed in the current implementation.
> >>>
> >>> I'm waiting to hear how it is fatally flawed. But maybe this is just a
> >>> misunderstanding of the problem?
> >>
> >> Fatally flawed because it does not handle modules that add a consumer
> >> device when the module is loaded.
> >
> > If you are talking about modules adding child devices of the device
> > they are managing, then that's handled correctly later in the series.
>
> They may or they may not.  I do not know.  I am not going to audit all
> current cases of devices being added to check that relationship and I am
> not going to monitor all future patches that add devices.  Adding devices
> is an existing pattern of behavior that the new feature must be able to
> handle.
>
> I have not looked at patch 6 yet (the place where modules adding child
> devices is handled).  I am guessing that patch 6 could be made more
> general to remove the parent child relationship restriction.

Please do look into it then. I think it already handles all cases.

> >
> > If you are talking about modules adding devices that aren't defined in
> > DT, then right, I'm not trying to handle that. The module needs to
> > make sure the resources needed for the new devices it's adding are kept
> > in the right state, or it needs to add the right device links.
>
> I am not talking about devices that are not defined in the devicetree.

In that case, I'm sure my patch series handles all the scenarios
correctly. Here's why:
1. For all the top level devices the patches you've reviewed already
show how it's handled correctly.
2. All other devices in the DT are by definition the child devices of
the top level devices and patch 6 handles those cases.

Hopefully this shows to you that all DT cases are handled correctly.

> >>> In the text below, I'm not sure if you are mixing up two different things
> >>> or just that your wording is a bit ambiguous. So pardon my nitpick to
> >>> err on the side of clarity.
> >>
> >> Please do nitpick.  Clarity is good.
> >>
> >>
> >>>
> >>>> A device can be added by a module that is loaded.
> >>>
> >>> No, in the example I gave, of_platform_default_populate_init() would
> >>> add all 3 of those devices during arch_initcall_sync().
> >>
> >> The example you gave does not cover all use cases.
> >>
> >> There are modules that add devices when the module is loaded.  You can not
> >> ignore systems using such modules.
> >
> > I'll have to agree to disagree on that. While I understand that the
> > design should be good and I'm happy to work on that, you can't insist
> > that a patch series shouldn't be allowed because it's only improving
> > 99% of the cases and leaves the other 1% in the status quo. You are
> > just going to bring the kernel development to a grinding halt.
>
> No, you do not get to disagree on that.  And you are presenting a straw
> man argument.
>
> You are proposing a new feature that contributes fragility and complexity
> to the house of cards that device instantiation and driver probing already
> is.

Any piece of code is going to "add complexity". It's a question of
benefits vs complexity. Also, I'm mostly reusing existing device links
API. The majority of the complexity is in parsing DT. The driver core
maintainers seem to be fine with adding sync_state() support for
device links (that is independent of DT).

> The feature is clever but it is intertwined into an area that is already
> complex and in many cases difficult to work within.
>
> I had hoped that the feature was robust enough and generic enough to
> accept.

What I'm doing IS the most generic solution instead of doing the same
work multiple times at a framework level. Also, for multi-function
devices, framework level solutions would be worse.

> The proposed feature is a hack to paper over a specific problem
> that you are facing.  I had hoped that the feature would appear generic
> enough that I would not have to regard it as an attempt to paper over
> the real problem.  I have not given up this hope yet but I still am
> quite cautious about this approach to addressing your use case.
>
> You have a real bug.  I have told you how to fix the real bug.  And you
> have ignored my suggestion.  (To be honest, I do not know for sure that
> my suggestion is feasible, but on the surface it appears to be.)

I'd actually say that your proposal is what's trying to paper over a
generic problem by saying it's specific to one or a few sets of
resources. And it looks feasible to you because you haven't dug deep
into this issue.

> Again,
> my suggestion is to have the boot loader pass information to the kernel
> (via a chosen property) telling the kernel which devices the bootloader
> has enabled power to.  The power subsystem would use that information
> early in boot to do a "get" on the power supplier (I am not using precise
> power subsystem terminology, but it should be obvious what I mean).
> The consumer device driver would also have to be aware of the information
> passed via the chosen property because the power subsystem has done the
> "get" on the consumer device's behalf (exactly how the consumer gets
> that information is an implementation detail).  This approach is
> more direct, less subtle, less fragile.

I'll have to disagree on your claim. You are adding unnecessary
bootloader dependency when the kernel is completely capable of
handling this on its own. You are requiring explicit "gets" by
suppliers and then hoping all the consumers do the corresponding
"puts" to balance it out. Somehow the consumers need to know which
suppliers have parsed which bootloader input. And it's barely
scratching the surface of the problem.

You are assuming this has to do with just power when it can be clocks,
interconnects, etc. Why solve this repeatedly for each framework when
you can have a generic solution?

Also, while I understand what you mean by "get" it's not going to be
as simple as a reference count to keep the resource on. In reality
you'll need more complex handling. For example, having to keep a
voltage rail at or above X mV because one consumer might fail if the
voltage is < X mV. Or making sure a clock never goes above the
bootloader-set frequency before all the consumer drivers are probed to
avoid overclocking one of the consumers. Trying to have this
explicitly coordinated across multiple drivers would be a nightmare.
It gets even more complicated with interconnects.

With my patch series, the consumers don't need to do anything. They
just probe as usual. The suppliers don't need to track or coordinate
with any consumers. For example, regulator suppliers just need to keep
the voltage rail at (or above) the level that the boot loader left it
on at and then apply the aggregated requests from their APIs once they
get the sync_state() callback. And it actually works -- tested for
regulators and clocks (and maybe even interconnects -- I forgot) in a
device I have.
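
A minimal userspace sketch of the regulator case described above,
assuming a supplier that treats the bootloader's voltage as a floor
until a sync_state()-style notification arrives. All type and function
names here are hypothetical; this is not the regulator framework's API,
and aggregation is simplified to "highest request wins".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one voltage rail owned by a supplier driver. */
struct rail {
	int boot_mv;      /* level the bootloader left the rail at */
	int requested_mv; /* highest request seen from probed consumers */
	bool synced;      /* true once every consumer has probed */
	int hw_mv;        /* level currently programmed in "hardware" */
};

/* Program the rail: before sync, never drop below the boot level. */
static void rail_apply(struct rail *r)
{
	int mv = r->requested_mv;

	if (!r->synced && mv < r->boot_mv)
		mv = r->boot_mv;
	r->hw_mv = mv;
}

/* Supplier probe: record the boot level and keep the rail there. */
static void rail_init(struct rail *r, int boot_mv)
{
	assert(r);
	r->boot_mv = boot_mv;
	r->requested_mv = 0;
	r->synced = false;
	rail_apply(r);
}

/* A probed consumer asks for at least mv millivolts. */
static void rail_request(struct rail *r, int mv)
{
	assert(r);
	if (mv > r->requested_mv)
		r->requested_mv = mv;
	rail_apply(r);
}

/* Hypothetical sync_state() hook: all consumers have probed. */
static void rail_sync_state(struct rail *r)
{
	assert(r);
	r->synced = true;
	rail_apply(r);
}
```

With a rail initialized at 900 mV, a consumer request for 800 mV leaves
the hardware at 900 mV until rail_sync_state() runs, after which the
aggregated 800 mV request is applied. The consumer side needs no extra
code in this model.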

> >>>
> >>>>  In that case the device
> >>>> was not present at late boot when the suppliers may turn off their resources.
> >>>
> >>> In that case, the _drivers_ for those devices aren't present at late
> >>> boot. So that they can't request to keep the resources on for their
> >>> consumer devices. Since there are no consumer requests on resources,
> >>> the suppliers turn off their resources at late boot (since there isn't
> >>> a better location as of today). The sync_state() call back added in a
> >>> subsequent patch in this series will provide the better location.
> >>
> >> And the sync_state() call back will not deal with modules that add consumer
> >> devices when the module is loaded, correct?
> >
> > Depends. If it's just more devices from DT, then it'll be fine. If
> > it's not, then the module needs to take care of the needs of devices
> > it's adding.
> >
> >>>
> >>>> (I am assuming the details since I have not reviewed the patches later in
> >>>> the series that implement this part.)
> >>>>
> >>>> Am I missing something?
> >>>
> >>> I think you are mixing up devices getting added/populated with drivers
> >>> getting loaded as modules?
> >>
> >> Only some modules add devices when they are loaded.  But these modules do
> >> exist.
> >
> > Out of the billions of Android devices, how many do you see this happening in?
>
> The Linux kernel is not just used by Android devices.

Of course Linux is used by more than just Android. And Android is just
an ARM64(32) distribution. But how many platforms do you have where a
module adds devices that are not part of DT (because I'm handling the
DT part fine -- see other emails)? How does that count compare to
millions of products that can use this feature? And I'm not breaking
any of the existing platforms that don't use DT either. So saying I
have to fix this for 100% of the use cases for Linux before I can
remove the roadblocks for a common ARM64 kernel that can run on any
ARM64 platform seems like an unreasonable position.

-Saravana


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-20  4:25           ` Frank Rowand
@ 2019-08-20 22:10             ` Saravana Kannan
  2019-08-21  1:06               ` Frank Rowand
  2019-08-21 15:36               ` Frank Rowand
  0 siblings, 2 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-08-20 22:10 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> > On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>>>  device addition
> >>>>> From: Saravana Kannan <saravanak@google.com>
> >>>>>
> >>>>> When devices are added, the bus might want to create device links to track
> >>>>> functional dependencies between supplier and consumer devices. This
> >>>>> tracking of supplier-consumer relationship allows optimizing device probe
> >>>>> order and tracking whether all consumers of a supplier are active. The
> >>>>> add_links bus callback is added to support this.
> >>>>
> >>>> Change above to:
> >>>>
> >>>> When devices are added, the bus may create device links to track which
> >>>> suppliers a consumer device depends upon.  This
> >>>> tracking of supplier-consumer relationship may be used to defer probing
> >>>> the driver of a consumer device before the driver(s) for its supplier device(s)
> >>>> are probed.  It may also be used by a supplier driver to determine if
> >>>> all of its consumers have been successfully probed.
> >>>> The add_links bus callback is added to create the supplier device links
> >>>>
> >>>>>
> >>>>> However, when consumer devices are added, they might not have a supplier
> >>>>> device to link to despite needing mandatory resources/functionality from
> >>>>> one or more suppliers. A waiting_for_suppliers list is created to track
> >>>>> such consumers and retry linking them when new devices get added.
> >>>>
> >>>> Change above to:
> >>>>
> >>>> If a supplier device has not yet been created when the consumer device attempts
> >>>> to link it, the consumer device is added to the wait_for_suppliers list.
> >>>> When supplier devices are created, the supplier device link will be added to
> >>>> the relevant consumer devices on the wait_for_suppliers list.
> >>>>
> >>>
> >>> I'll take these commit text suggestions if we decide to revert the
> >>> entire series at the end of this review.
> >>>
> >>>>>
> >>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>>>> ---
> >>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>  include/linux/device.h | 14 +++++++
> >>>>>  2 files changed, 97 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>>> index da84a73f2ba6..1b4eb221968f 100644
> >>>>> --- a/drivers/base/core.c
> >>>>> +++ b/drivers/base/core.c
> >>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >>>>>  #endif
> >>>>>
> >>>>>  /* Device links support. */
> >>>>> +static LIST_HEAD(wait_for_suppliers);
> >>>>> +static DEFINE_MUTEX(wfs_lock);
> >>>>>
> >>>>>  #ifdef CONFIG_SRCU
> >>>>>  static DEFINE_MUTEX(device_links_lock);
> >>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >>>>>  }
> >>>>>  EXPORT_SYMBOL_GPL(device_link_add);
> >>>>>
> >>>>> +/**
> >>>>
> >>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
> >>>>
> >>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> >>>
> >>
> >> As a meta-comment, I found this series very hard to understand in the context
> >> of reading the new code for the first time.  When I read the code again in
> >> six months or a year or two years it will not be in near term memory and it
> >> will be as if I am reading it for the first time.  A lot of my suggestions
> >> for changes of names are in that context -- the current names may be fine
> >> when one has recently read the code, but not so much when trying to read
> >> the whole thing again with a blank mind.
> >
> > Thanks for the context.
> >
> >> The code also inherits a good deal of complexity because it does not stand
> >> alone in a nice discrete chunk, but instead delicately weaves into a more
> >> complex body of code.
> >
> > I'll take this as a compliment :)
>
> Please do!
>
>
> >
> >> When I was trying to understand the code, I wrote a lot of additional
> >> comments within my reply email to provide myself context, information
> >> about various things, and questions that I needed to answer (or if I
> >> could not answer to then ask you).  Then I ended up being able to remove
> >> many of those notes before sending the reply.
> >>
> >>
> >>> I intentionally chose "Mark device..." because that's a better
> >>> description of the semantics of the function instead of trying to
> >>> describe the implementation. Whether I'm using a linked list or some
> >>> other data structure should not be the one line documentation of a
> >>> function. Unless the function is explicitly about operating on that
> >>> specific data structure.
> >>
> >> I agree with the intent of trying to describe the semantics of a function,
> >> especially at the API level where other systems (or drivers) would be using
> >> the function.  But for this case the function is at the implementation level
> >> and describing explicitly what it is doing makes this much more readable for
> >> me.
> >
> > Are you distinguishing between API level vs implementation level based
> > on the function being "static"/not exported? I believe the earlier
>
> No, being static helps say a function is not API, but a function that is
> not static may be intended to be used in a limited and constrained manner.
> I distinguished based on the usage of the function.
>
>
> > version of this series had this function as an exported API. So maybe
> > that's why I had it as "Mark device".
> >
> >> I also find "Mark device" to be vague and not descriptive of what the
> >> intent is.
> >>
> >>>
> >>>>
> >>>>
> >>>>> + * @consumer: Consumer device
> >>>>> + *
> >>>>> + * Marks the consumer device as waiting for suppliers to become available. The
> >>>>> + * consumer device will never be probed until it's unmarked as waiting for
> >>>>> + * suppliers. The caller is responsible for adding the link to the supplier
> >>>>> + * once the supplier device is present.
> >>>>> + *
> >>>>> + * This function is NOT meant to be called from the probe function of the
> >>>>> + * consumer but rather from code that creates/adds the consumer device.
> >>>>> + */
> >>>>> +static void device_link_wait_for_supplier(struct device *consumer)
> >>>>> +{
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +}
> >>>>> +
> >>>>> +/**
> >>>>
> >>>>
> >>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> >>>>> + *
> >>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
> >>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> >>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers,
> >>>>> + *
> >>>>> + * The add_links bus callback is expected to return 0 if it has found and added
> >>>>> + * all the supplier links for the consumer device. It should return an error if
> >>>>> + * it isn't able to do so.
> >>>>> + *
> >>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>> + * it's aware of potential suppliers becoming available.
> >>>>
> >>>> Change above comment to:
> >>>>
> >>>>     * device_link_add_supplier_links - add links from consumer devices to
> >>>>     *                                  supplier devices, leaving any consumer
> >>>>     *                                  with inactive suppliers on the
> >>>>     *                                  wait_for_suppliers list
> >>>
> >>> I didn't know that the first one line comment could span multiple
> >>> lines. Good to know.
> >>>
> >>>
> >>>>     * Scan all consumer devices in the devicetree.
> >>>
> >>> This function doesn't have anything to do with devicetree. I've
> >>> intentionally kept all OF related parts out of the driver/core because
> >>> I hope that other busses can start using this feature too. So I can't
> >>> take this bit.
> >>
> >> My comment is left over from when I was taking notes, trying to understand the
> >> code.
> >>
> >> At the moment, only devicetree is used as a source of the dependency information.
> >> The comment would better be re-phrased as:
> >>
> >>         * Scan all consumer devices in the firmware description of the hardware topology
> >>
> >
> > Ok
> >
> >> I did not ask why this feature is tied to _only_ the platform bus, but will now.
> >
> > Because devicetree and platform bus are the only ones I'm familiar with.
> > If other busses want to add this, I'd be happy to help with code
> > and/or direction/review. But I won't pretend to know anything about
> > ACPI.
>
> Sorry, you don't get to ignore other buses because you are not familiar
> with them.

It's important that I don't design out other buses -- which I don't.
But why would you want someone who has no idea of ACPI to write code
for it? It's a futile effort that's going to be rejected by people who
know ACPI anyway.

> I am not aware of any reason to exclude devices that are on other buses and your
> answer below does not provide a valid technical reason why the new feature is
> correct when it excludes all other buses.
> >
> >> I do not know of any reason that a consumer / supplier relationship can not be
> >> between devices on different bus types.  Do you know of such a reason?
> >
> > Yes, it's hypothetically possible. But I haven't seen such a
> > relationship being defined in DT. Nor somewhere else where this might
> > be captured. So, how common/realistic is it?
>
> It is entirely legal.  I have no idea how common it is but that is not a valid
> reason to exclude other buses from the feature.

I'm not going to write code for a hypothetical hardware scenario. Find
one supported in upstream, show me that it'll benefit from this series
and tell me how to interpret the dependency graph and then we'll talk
about writing code for that.

> >>>
> >>>>  For any supplier device that
> >>>>     * is not already linked to the consumer device, add the supplier to the
> >>>>     * consumer device's device links.
> >>>>     *
> >>>>     * If all of a consumer device's suppliers are available then the consumer
> >>>>     * is removed from the wait_for_suppliers list (if previously on the list).
> >>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
> >>>>     * already on the list).
> >>>
> >>> Honestly, I don't think this is any better than what I already have.
> >>
> >> Note that my version of these comments was written while I was reading the code,
> >> and did not have any big picture understanding yet.  This will likely also be
> >> the mind set of most everyone who reads this code in the future, once it is
> >> woven into the kernel.
> >>
> >> If you don't like the change, I can revisit it in a later version of the
> >> patch set.
> >
> > I'll take in all the ones I feel are reasonable or don't feel strongly
> > about. We can revisit the rest later.
> >
> >>>
> >>>>     * The add_links bus callback must return 0 if it has found and added all
> >>>>     * the supplier links for the consumer device. It must return an error if
> >>>>     * it is not able to do so.
> >>>>     *
> >>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>     * it is aware of potential suppliers becoming available.
> >>>>
> >>>>
> >>>>
> >>>>> + */
> >>>>> +static void device_link_check_waiting_consumers(void)
> >>>>
> >>>> Function name is misleading and hides side effects.
> >>>>
> >>>> I have not come up with a name that does not hide side effects, but a better
> >>>> name would be:
> >>>>
> >>>>    device_link_add_supplier_links()
> >>>
> >>> I kinda agree that it could afford a better name. The current name is
> >>> too similar to device_links_check_suppliers() and I never liked that.
> >>
> >> Naming new fields or variables related to device links looks pretty
> >> challenging to me, because of the desire to be part of device links
> >> and not a wart pasted on the side.  So I share the pain in trying
> >> to find good names.
> >>
> >>>
> >>> Maybe device_link_add_missing_suppliers()?
> >>
> >> My first reaction was "yes, that sounds good".  But then I stopped and
> >> tried to read the name out of context.  The name is not adding the
> >> missing suppliers, it is saving the information that a supplier is
> >> not yet available (eg, is "missing").  I struggled in coming up with
>
> Reading what you say below, and looking at the code again, what I say
> in that sentence is backwards.  It is not adding the missing supplier
> device links, it is instead adding existing supplier device links.
>
>
> >> the name that I suggested.  We can keep thinking.
> >
> > No, this function _IS_ about adding links to suppliers. These
>
> You are mis-reading what I wrote.  I said the function "is not adding
> the missing suppliers".  You are converting that to "is not adding
> links to the missing suppliers".
>
> My suggested name was hinting "add_supplier_links", which is what you
> say it does below.  The name you suggest is hinting "add_missing_suppliers".
> Do you see the difference?

Yeah, which is why I said earlier that I didn't want to repeat "links"
twice in a function name. As in
device_links_add_missing_supplier_links() has too many "links". In the
context of device_links_, "add missing suppliers" means "add missing
supplier links". Anyway, I think we can come back to figuring out a
good name once we agree on the more important discussions further
below.
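
To make the semantics being named concrete, here is a rough userspace
sketch (not the patch itself). It assumes a hand-rolled list in place of
the kernel's list.h, and a hypothetical "missing" counter standing in for
whatever the bus add_links callback would actually check. The point is
that needs_suppliers is the node, wait_for_suppliers is the list, and the
only things ever kept on the list are consumers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail_node(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink the node and reinitialize it so list_is_empty(node) is true again. */
static void list_del_init_node(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}

/* Hypothetical consumer device; "missing" is an illustrative assumption. */
struct consumer {
	struct list_head needs_suppliers;	/* node on wait_for_suppliers */
	int missing;				/* suppliers not yet present */
};

/* The list of waiting consumers (self-referencing, i.e. empty). */
static struct list_head wait_for_suppliers = {
	&wait_for_suppliers, &wait_for_suppliers
};

static void consumer_init(struct consumer *c, int missing)
{
	init_list_head(&c->needs_suppliers);
	c->missing = missing;
}

/* Mark the consumer as waiting for suppliers. */
static void wait_for_supplier(struct consumer *c)
{
	list_add_tail_node(&c->needs_suppliers, &wait_for_suppliers);
}

/* Stand-in for add_links(): 0 once every supplier link could be added. */
static int try_add_links(struct consumer *c)
{
	return c->missing == 0 ? 0 : -1;
}

/* Retry every waiting consumer; unmark the ones that are fully linked. */
static void check_waiting_consumers(void)
{
	struct list_head *pos = wait_for_suppliers.next;

	while (pos != &wait_for_suppliers) {
		struct list_head *next = pos->next;	/* saved before unlinking */
		struct consumer *c = (struct consumer *)
			((char *)pos - offsetof(struct consumer, needs_suppliers));

		if (!try_add_links(c))
			list_del_init_node(pos);
		pos = next;
	}
}

/* A consumer still on the wait list would get -EPROBE_DEFER. */
static bool must_defer_probe(const struct consumer *c)
{
	return !list_is_empty(&c->needs_suppliers);
}
```

In this sketch a consumer leaves the list only when its stand-in
add_links succeeds, and an empty node is what lets the probe path
decide cheaply whether to defer.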

> > consumers were "saved" as "not yet having the supplier" earlier by
> > device_link_wait_for_supplier(). This function doesn't do that. This
> > function is just trying to see if those missing suppliers are present
> > now and if so adding a link to them from the "saved" consumers. I
> > think device_link_add_missing_suppliers() is actually a pretty good
> > name. Let me know what you think now.
> >
> >>
> >>
> >>>
> >>> I don't think we need "links" repeated twice in the function name.
> >>
> >> Yeah, I didn't like that either.
> >>
> >>
> >>> With this suggestion, what side effect is hidden in your opinion? That
> >>> the fully linked consumer is removed from the "waiting for suppliers"
> >>> list?
> >>
> >> The side effect is that the function does not merely do a check.  It also
> >> adds missing suppliers to a list.
> >
> > No, it doesn't do that. I can't keep a list of things that aren't
> > allocated yet :). In the whole patch series, we only keep a list of things
> > (consumers) that are waiting on other things (missing suppliers).
>
> OK, as I noted above, I stated that backwards.  It is adding links for
> existing suppliers, not for the missing suppliers.
>
> >
> >>>
> >>> Maybe device_link_try_removing_from_wfs()?
> >>
> >> I like that, other than the fact that it still does not provide a clue
> >> that the function is potentially adding suppliers to a list.
> >
> > It doesn't. How would you add a supplier device to a list if the
> > device itself isn't there? :)
>
> Again, that should be existing suppliers, as you noted.  But the point stands
> that the function is potentially adding links.
>
>
> >
> >>  I think
> >> part of the challenge is that the function does two things: (1) a check,
> >> and (2) potentially adding missing suppliers to a list.  Maybe a simple
> >> one line comment at the call site, something like:
> >>
> >>    /* adds missing suppliers to wfs */
> >>
> >>
> >>>
> >>> I'll wait for us to agree on a better name here before I change this.
> >>>
> >>>>> +{
> >>>>> +     struct device *dev, *tmp;
> >>>>> +
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> >>>>> +                              links.needs_suppliers)
> >>>>> +             if (!dev->bus->add_links(dev))
> >>>>> +                     list_del_init(&dev->links.needs_suppliers);
> >>>>
> >>>> Empties dev->links.needs_suppliers, but does not remove dev from
> >>>> wait_for_suppliers list.  Where does that happen?
> >>>
> >>> I'll chalk this up to you having a long day or forgetting your coffee
> >>> :) list_del_init() does both of those things because needs_suppliers
> >>> is the node and wait_for_suppliers is the list.
> >>
> >> Yes, brain mis-fire on my part.  I'll have to go back and look at the
> >> list related code again.
> >>
> >>
> >>>
> >>>>
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +}
> >>>>> +
> >>>>>  static void device_link_free(struct device_link *link)
> >>>>>  {
> >>>>>       while (refcount_dec_not_one(&link->rpm_active))
> >>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >>>>>       struct device_link *link;
> >>>>>       int ret = 0;
> >>>>>
> >>>>> +     /*
> >>>>> +      * If a device is waiting for one or more suppliers (in
> >>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
> >>>>> +      * return -EPROBE_DEFER without having to check the links with existing
> >>>>> +      * suppliers.
> >>>>> +      */
> >>>>
> >>>> Change comment to:
> >>>>
> >>>>         /*
> >>>>          * Device waiting for supplier to become available is not allowed
> >>>>          * to probe
> >>>>          */
> >>>
> >>> Po-tay-to. Po-tah-to? I think my comment is just as good.
> >>
> >> If just as good and shorter, then better.
> >>
> >> Also the original says "it is not ready to probe".  That is not correct.  It
> >> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
> >> Nit picky on my part, but tiny things like that mean I have to think harder.
> >> I have to think "why is it not ready to probe?".  Maybe my version should have
> >> instead been something like:
> >>
> >>         * Device waiting for supplier to become available will return
> >>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
> >>
> >>>
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
> >>>>> +             mutex_unlock(&wfs_lock);
> >>>>> +             return -EPROBE_DEFER;
> >>>>> +     }
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +
> >>>>>       device_links_write_lock();
> >>>>
> >>>> Update Documentation/driver-api/device_link.rst to reflect the
> >>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> >>>
> >>> Thanks! Will do.
> >>>
> >>>>
> >>>>>
> >>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> >>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >>>>>  {
> >>>>>       struct device_link *link, *ln;
> >>>>>
> >>>>> +     mutex_lock(&wfs_lock);
> >>>>> +     list_del(&dev->links.needs_suppliers);
> >>>>> +     mutex_unlock(&wfs_lock);
> >>>>> +
> >>>>>       /*
> >>>>>        * Delete all of the remaining links from this device to any other
> >>>>>        * devices (either consumers or suppliers).
> >>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >>>>>  #endif
> >>>>>       INIT_LIST_HEAD(&dev->links.consumers);
> >>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
> >>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >>>>>       dev->links.status = DL_DEV_NO_DRIVER;
> >>>>>  }
> >>>>>  EXPORT_SYMBOL_GPL(device_initialize);
> >>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >>>>>
> >>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>>>
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Check if any of the other devices (consumers) have been waiting for
> >>>>> +      * this device (supplier) to be added so that they can create a device
> >>>>> +      * link to it.
> >>>>> +      *
> >>>>> +      * This needs to happen after device_pm_add() because device_link_add()
> >>>>> +      * requires the supplier be registered before it's called.
> >>>>> +      *
> >>>>> +      * But this also needs to happen before bus_probe_device() to make sure
> >>>>> +      * waiting consumers can link to it before the driver is bound to the
> >>>>> +      * device and the driver sync_state callback is called for this device.
> >>>>> +      */
> >>>>
> >>>>         /*
> >>>>          * Add links to dev from any dependent consumer that has dev on its
> >>>>          * list of needed suppliers
> >>>
> >>> There is no list of needed suppliers.
> >>
> >> "the other devices (consumers) have been waiting for this device (supplier)".
> >> Isn't that a list of needed suppliers?
> >
> > No, that's a list of consumers that needs_suppliers.
> >
> >>>
> >>>> (links.needs_suppliers).  Device_pm_add()
> >>>>          * must have previously registered dev to allow the links to be added.
> >>>>          *
> >>>>          * The consumer links must be created before dev is probed because the
> >>>>          * sync_state callback for dev will use the consumer links.
> >>>>          */
> >>>
> >>> I think what I wrote is just as clear.
> >>
> >> The original comment is vague.  It does not explain why consumer links must be
> >> created before the probe.  I had to go off and read other code to determine
> >> why that is true.
> >>
> >> And again, brevity is better if otherwise just as clear.
> >>
> >>
> >>>
> >>>>
> >>>>> +     device_link_check_waiting_consumers();
> >>>>> +
> >>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> >>>>> +             device_link_wait_for_supplier(dev);
> >>>>> +
> >>>>>       bus_probe_device(dev);
> >>>>>       if (parent)
> >>>>>               klist_add_tail(&dev->p->knode_parent,
> >>>>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>>>> index c330b75c6c57..5d70babb7462 100644
> >>>>> --- a/include/linux/device.h
> >>>>> +++ b/include/linux/device.h
> >>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >>>>>   * @uevent:  Called when a device is added, removed, or a few other things
> >>>>>   *           that generate uevents to add the environment variables.
> >>>>
> >>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
> >>>>> + *           added to this bus.  The function is expected to create device
> >>>>> + *           links to all the suppliers of the input device that are
> >>>>> + *           available at the time this function is called.  As in, the
> >>>>> + *           function should NOT stop at the first failed device link if
> >>>>> + *           other unlinked supplier devices are present in the system.
> >>>>
> >>>> * @add_links:   Called after a device is added to this bus.
> >>>
> >>> Why are you removing the "perhaps multiple times" part? that's true
> >>> and that's how some of the other ops are documented.
> >>
> >> I didn't remove it.  I rephrased it with a little bit more explanation as
> >> "If some suppliers are not yet available, this function will be
> >> called again when the suppliers become available." (below).
> >>
> >>
> >>>
> >>>>  The function is
> >>>> *               expected to create device links to all the suppliers of the
> >>>> *               device that are available at the time this function is called.
> >>>> *               The function must NOT stop at the first failed device link if
> >>>> *               other unlinked supplier devices are present in the system.
> >>>> *               If some suppliers are not yet available, this function will be
> >>>> *               called again when the suppliers become available.
> >>>>
> >>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> >>>
> >>> Sorry, I'm not sure I understand. Can you please explain what you are
> >>> trying to say? of_link_to_suppliers() is just one implementation of
> >>> add_links(). The comment above is true for any bus trying to implement
> >>> add_links().
> >>
> >> This is conflating bus with the source of the firmware description of the
> >> hardware topology.  For drivers that use various APIs to access firmware
> >> description of topology that may be either devicetree or ACPI the access
> >> is done via fwnode_operations, based on struct device.fwnode (if I recall
> >> properly).
> >>
> >> I failed to completely address why add_links() is not needed.  The answer
> >> is that there should be a single function called for all buses.  Then
> >> the proper firmware data source would be accessed via a struct fwnode_operations.
> >>
> >> I think I left this out because I had not yet asked why this feature is
> >> tied only to the platform bus.  Which I asked earlier in this reply.
> >
> > Thanks for the pointer about fwnode and fwnode_operations. I wasn't
> > aware of those. I see where you are going with this. I see a couple of
> > problems with this approach though:
> >
> > 1. How you interpret the properties of a fwnode is specific to the fw
> > type. The clocks DT property isn't going to have the same definition
> > in ACPI or some other firmware. Heck, I don't know if ACPI even has a
> > clocks-like property. So having one function to parse all the FW types
> > doesn't make a lot of sense.
>
> The functions in fwnode_operations are specific to the proper firmware.
> So there is a set of functions in a struct fwnode_operations for
> devicetree that only know about devicetree.  And there is a different
> variable of type fwnode_operations that is initialized with ACPI
> specific functions.

Yes, I understand how ops work :) So I have one ops (fwnode ops) to
call that will read a property from DT or ACPI depending on where that
specific device's firmware is from. But that's not my point here.

My point is that clock bindings in DT are under a "clocks" property
that lists references (phandles) to the supplier. But in ACPI, the
property might be called "clk" and could list references to actual
clock IDs. So, you can't have one piece of code that works for all
firmware even if I have one ops that can read properties from any
firmware.

I'll still have to know what type the underlying firmware is before I
try to interpret the properties. So having one function that parses DT
and ACPI and whatever else would be a terrible and unnecessary design.
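
A small standalone sketch of what I mean, with everything in it
hypothetical: the fw_node/fw_prop types are made up for illustration
(they are not the kernel's fwnode API), and the ACPI "clk" property is
just the made-up example from above. The read helper is firmware
agnostic, but the caller still has to switch on the firmware type to
know which property to look at and what its value means:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical firmware type tag, for illustration only. */
enum fw_type { FW_DT, FW_ACPI };

struct fw_prop {
	const char *name;
	const char *value;
};

struct fw_node {
	enum fw_type type;
	const struct fw_prop *props;
	size_t nprops;
};

/* Firmware-agnostic read: works the same for any node type. */
static const char *fw_read_prop(const struct fw_node *n, const char *name)
{
	for (size_t i = 0; i < n->nprops; i++)
		if (strcmp(n->props[i].name, name) == 0)
			return n->props[i].value;
	return NULL;
}

/*
 * Interpretation is still per firmware: which property names a clock
 * supplier, and what its value means, depends on the firmware type.
 */
static const char *clock_supplier_ref(const struct fw_node *n)
{
	switch (n->type) {
	case FW_DT:
		return fw_read_prop(n, "clocks");	/* phandle reference */
	case FW_ACPI:
		return fw_read_prop(n, "clk");		/* made-up clock ID */
	}
	return NULL;
}
```

So the generic read op removes none of the per-firmware knowledge; it
only moves the switch somewhere else.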

> > 2. If this common code is implemented as part of driver/base/, then at
> > a minimum, I'll have to check if a fwnode is a DT node before I start
> > interpreting the properties of a device's fwnode. But that means I'll
> > have to include linux/of.h to use is_of_node(). I don't like having
> > driver/base code depend on OF or platform or ACPI headers.
>
> You just use the function in the device's fwnode_operations (I think,
> I would have to go look at the precise way the code works because it
> has been quite a while since I've looked at it).

Because you missed my point in (1) you are missing my point in (2).
I'll wait for your updated reply.

> >
> > 3. The supplier info doesn't always need to come from a firmware. So I
> > don't want to limit it to that?
>
> If you can find another source of topology info, then I would expect
> that another set of fwnode_operations functions would be created
> for the info source.

The other source could just be C files in the kernel. Using fwnodes
for that would be hacky. But let's sort (1) and (2) out first.

> >
> > Also, I don't necessarily see this as conflating firmware (DT, ACPI,
> > etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
> > the device seems like the entity best suited to figure out the
> > suppliers of the device (apart from the driver, obviously). So the bus
> > deciding the suppliers doesn't seem wrong to me.
>
> Patch 3 assigns the devicetree add_links function to the platform bus.
> It seems incorrect to me for of_platform_default_populate_init() to be
> changing a field in platform_bus_type.
>
>
>    of_platform_default_populate_init()
>            ...
>            platform_bus_type.add_links = of_link_to_suppliers;
>

I didn't want to have platform bus include OF header files.

> >
> > In this specific case, I'm trying to address DT for now and leaving
> > ACPI to whoever else wants to add device links based on ACPI data.
> > Most OF/DT based devices end up in platform bus. So I'm just handling
> > this in platform bus. If some other person wants this to work for ACPI
> > bus or PCI bus, they are welcome to implement add_links() for those
> > busses? I'm nowhere close to an expert on those.
>
> Devicetree is not limited to the platform bus.

I know. That's why I said "most". PCI seems to have some DT support too.

Thanks,
Saravana


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-20 22:10             ` Saravana Kannan
@ 2019-08-21  1:06               ` Frank Rowand
  2019-08-21  1:56                 ` Greg Kroah-Hartman
  2019-08-21  2:22                 ` Saravana Kannan
  2019-08-21 15:36               ` Frank Rowand
  1 sibling, 2 replies; 37+ messages in thread
From: Frank Rowand @ 2019-08-21  1:06 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On 8/20/19 3:10 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/19/19 5:00 PM, Saravana Kannan wrote:
>>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>
>>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>>>
>>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>>>>>>>  device addition
>>>>>>> From: Saravana Kannan <saravanak@google.com>
>>>>>>>
>>>>>>> When devices are added, the bus might want to create device links to track
>>>>>>> functional dependencies between supplier and consumer devices. This
>>>>>>> tracking of supplier-consumer relationship allows optimizing device probe
>>>>>>> order and tracking whether all consumers of a supplier are active. The
>>>>>>> add_links bus callback is added to support this.
>>>>>>
>>>>>> Change above to:
>>>>>>
>>>>>> When devices are added, the bus may create device links to track which
>>>>>> suppliers a consumer device depends upon.  This
>>>>>> tracking of supplier-consumer relationship may be used to defer probing
>>>>>> the driver of a consumer device before the driver(s) for its supplier device(s)
>>>>>> are probed.  It may also be used by a supplier driver to determine if
>>>>>> all of its consumers have been successfully probed.
>>>>>> The add_links bus callback is added to create the supplier device links
>>>>>>
>>>>>>>
>>>>>>> However, when consumer devices are added, they might not have a supplier
>>>>>>> device to link to despite needing mandatory resources/functionality from
>>>>>>> one or more suppliers. A waiting_for_suppliers list is created to track
>>>>>>> such consumers and retry linking them when new devices get added.
>>>>>>
>>>>>> Change above to:
>>>>>>
>>>>>> If a supplier device has not yet been created when the consumer device attempts
>>>>>> to link it, the consumer device is added to the wait_for_suppliers list.
>>>>>> When supplier devices are created, the supplier device link will be added to
>>>>>> the relevant consumer devices on the wait_for_suppliers list.
>>>>>>
>>>>>
>>>>> I'll take these commit text suggestions if we decide to revert the
>>>>> entire series at the end of this review.
>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
>>>>>>> ---
>>>>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
>>>>>>>  include/linux/device.h | 14 +++++++
>>>>>>>  2 files changed, 97 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
>>>>>>> index da84a73f2ba6..1b4eb221968f 100644
>>>>>>> --- a/drivers/base/core.c
>>>>>>> +++ b/drivers/base/core.c
>>>>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
>>>>>>>  #endif
>>>>>>>
>>>>>>>  /* Device links support. */
>>>>>>> +static LIST_HEAD(wait_for_suppliers);
>>>>>>> +static DEFINE_MUTEX(wfs_lock);
>>>>>>>
>>>>>>>  #ifdef CONFIG_SRCU
>>>>>>>  static DEFINE_MUTEX(device_links_lock);
>>>>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(device_link_add);
>>>>>>>
>>>>>>> +/**
>>>>>>
>>>>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
>>>>>>
>>>>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
>>>>>
>>>>
>>>> As a meta-comment, I found this series very hard to understand in the context
>>>> of reading the new code for the first time.  When I read the code again in
>>>> six months or a year or two years it will not be in near term memory and it
>>>> will be as if I am reading it for the first time.  A lot of my suggestions
>>>> for changes of names are in that context -- the current names may be fine
>>>> when one has recently read the code, but not so much when trying to read
>>>> the whole thing again with a blank mind.
>>>
>>> Thanks for the context.
>>>
>>>> The code also inherits a good deal of complexity because it does not stand
>>>> alone in a nice discrete chunk, but instead delicately weaves into a more
>>>> complex body of code.
>>>
>>> I'll take this as a compliment :)
>>
>> Please do!
>>
>>
>>>
>>>> When I was trying to understand the code, I wrote a lot of additional
>>>> comments within my reply email to provide myself context, information
>>>> about various things, and questions that I needed to answer (or if I
>>>> could not answer to then ask you).  Then I ended up being able to remove
>>>> many of those notes before sending the reply.
>>>>
>>>>
>>>>> I intentionally chose "Mark device..." because that's a better
>>>>> description of the semantics of the function instead of trying to
>>>>> describe the implementation. Whether I'm using a linked list or some
>>>>> other data structure should not be the one line documentation of a
>>>>> function. Unless the function is explicitly about operating on that
>>>>> specific data structure.
>>>>
>>>> I agree with the intent of trying to describe the semantics of a function,
>>>> especially at the API level where other systems (or drivers) would be using
>>>> the function.  But for this case the function is at the implementation level
>>>> and describing explicitly what it is doing makes this much more readable for
>>>> me.
>>>
>>> Are you distinguishing between API level vs implementation level based
>>> on the function being "static"/not exported? I believe the earlier
>>
>> No, being static helps say a function is not API, but a function that is
>> not static may be intended to be used in a limited and constrained manner.
>> I distinguished based on the usage of the function.
>>
>>
>>> version of this series had this function as an exported API. So maybe
>>> that's why I had it as "Mark device".
>>>
>>>> I also find "Mark device" to be vague and not descriptive of what the
>>>> intent is.
>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> + * @consumer: Consumer device
>>>>>>> + *
>>>>>>> + * Marks the consumer device as waiting for suppliers to become available. The
>>>>>>> + * consumer device will never be probed until it's unmarked as waiting for
>>>>>>> + * suppliers. The caller is responsible for adding the link to the supplier
>>>>>>> + * once the supplier device is present.
>>>>>>> + *
>>>>>>> + * This function is NOT meant to be called from the probe function of the
>>>>>>> + * consumer but rather from code that creates/adds the consumer device.
>>>>>>> + */
>>>>>>> +static void device_link_wait_for_supplier(struct device *consumer)
>>>>>>> +{
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>
>>>>>>
>>>>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
>>>>>>> + *
>>>>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
>>>>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
>>>>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers.
>>>>>>> + *
>>>>>>> + * The add_links bus callback is expected to return 0 if it has found and added
>>>>>>> + * all the supplier links for the consumer device. It should return an error if
>>>>>>> + * it isn't able to do so.
>>>>>>> + *
>>>>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>>>> + * it's aware of potential suppliers becoming available.
>>>>>>
>>>>>> Change above comment to:
>>>>>>
>>>>>>     * device_link_add_supplier_links - add links from consumer devices to
>>>>>>     *                                  supplier devices, leaving any consumer
>>>>>>     *                                  with inactive suppliers on the
>>>>>>     *                                  wait_for_suppliers list
>>>>>
>>>>> I didn't know that the first one line comment could span multiple
>>>>> lines. Good to know.
>>>>>
>>>>>
>>>>>>     * Scan all consumer devices in the devicetree.
>>>>>
>>>>> This function doesn't have anything to do with devicetree. I've
>>>>> intentionally kept all OF related parts out of the driver/core because
>>>>> I hope that other busses can start using this feature too. So I can't
>>>>> take this bit.
>>>>
>>>> My comment is left over from when I was taking notes, trying to understand the
>>>> code.
>>>>
>>>> At the moment, only devicetree is used as a source of the dependency information.
>>>> The comment would better be re-phrased as:
>>>>
>>>>         * Scan all consumer devices in the firmware description of the hardware topology
>>>>
>>>
>>> Ok
>>>
>>>> I did not ask why this feature is tied to _only_ the platform bus, but will now.
>>>
>>> Because devicetree and platform bus are the only ones I'm familiar with.
>>> If other busses want to add this, I'd be happy to help with code
>>> and/or direction/review. But I won't pretend to know anything about
>>> ACPI.
>>
>> Sorry, you don't get to ignore other buses because you are not familiar
>> with them.
> 
> It's important that I don't design out other buses -- which I don't.
> But why would you want someone who has no idea of ACPI to write code
> for it? It's a futile effort that's going to be rejected by people who
> know ACPI anyway.

ACPI is not a bus.

Devicetree is not a bus.

A devicetree can contain multiple buses in the topology that is described.


> 
>> I am not aware of any reason to exclude devices that are on other buses and your
>> answer below does not provide a valid technical reason why the new feature is
>> correct when it excludes all other buses.
>>>
>>>> I do not know of any reason that a consumer / supplier relationship can not be
>>>> between devices on different bus types.  Do you know of such a reason?
>>>
>>> Yes, it's hypothetically possible. But I haven't seen such a
>>> relationship being defined in DT. Nor somewhere else where this might
>>> be captured. So, how common/realistic is it?
>>
>> It is entirely legal.  I have no idea how common it is but that is not a valid
>> reason to exclude other buses from the feature.
> 
> I'm not going to write code for a hypothetical hardware scenario. Find
> one supported in upstream, show me that it'll benefit from this series
> and tell me how to interpret the dependency graph and then we'll talk
> about writing code for that.

You don't get to implement a general feature in a way that only supports
a subset of potential devicetree users.  Note the word "general".  This is
not a small isolated feature.

Now, am I being inconsistent if I say that it is ok for the feature to
only support devicetree systems, or only support ACPI systems?  I'll
have to ponder that.

But I don't think the question of only platform buses or all buses needs
to be resolved because I don't think that the add_links function is a bus
specific function.  The add_links function is specific to devicetree or
ACPI.

We seem to be talking past each other on this point right now.  I don't know
how to get our minds to the same place, but let's keep trying.


> 
>>>>>
>>>>>>  For any supplier device that
>>>>>>     * is not already linked to the consumer device, add the supplier to the
>>>>>>     * consumer device's device links.
>>>>>>     *
>>>>>>     * If all of a consumer device's suppliers are available then the consumer
>>>>>>     * is removed from the wait_for_suppliers list (if previously on the list).
>>>>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
>>>>>>     * already on the list).
>>>>>
>>>>> Honestly, I don't think this is any better than what I already have.
>>>>
>>>> Note that my version of these comments was written while I was reading the code,
>>>> and did not have any big picture understanding yet.  This will likely also be
>>>> the mind set of most everyone who reads this code in the future, once it is
>>>> woven into the kernel.
>>>>
>>>> If you don't like the change, I can revisit it in a later version of the
>>>> patch set.
>>>
>>> I'll take in all the ones I feel are reasonable or don't feel strongly
>>> about. We can revisit the rest later.
>>>
>>>>>
>>>>>>     * The add_links bus callback must return 0 if it has found and added all
>>>>>>     * the supplier links for the consumer device. It must return an error if
>>>>>>     * it is not able to do so.
>>>>>>     *
>>>>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
>>>>>>     * it is aware of potential suppliers becoming available.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> + */
>>>>>>> +static void device_link_check_waiting_consumers(void)
>>>>>>
>>>>>> Function name is misleading and hides side effects.
>>>>>>
>>>>>> I have not come up with a name that does not hide side effects, but a better
>>>>>> name would be:
>>>>>>
>>>>>>    device_link_add_supplier_links()
>>>>>
>>>>> I kinda agree that it could afford a better name. The current name is
>>>>> too similar to device_links_check_suppliers() and I never liked that.
>>>>
>>>> Naming new fields or variables related to device links looks pretty
>>>> challenging to me, because of the desire to be part of device links
>>>> and not a wart pasted on the side.  So I share the pain in trying
>>>> to find good names.
>>>>
>>>>>
>>>>> Maybe device_link_add_missing_suppliers()?
>>>>
>>>> My first reaction was "yes, that sounds good".  But then I stopped and
>>>> tried to read the name out of context.  The name is not adding the
>>>> missing suppliers, it is saving the information that a supplier is
>>>> not yet available (eg, is "missing").  I struggled in coming up with
>>
>> Reading what you say below, and looking at the code again, what I say
>> in that sentence is backwards.  It is not adding the missing supplier
>> device links, it is instead adding existing supplier device links.
>>
>>
>>>> the name that I suggested.  We can keep thinking.
>>>
>>> No, this function _IS_ about adding links to suppliers. These
>>
>> You are mis-reading what I wrote.  I said the function "is not adding
>> the missing suppliers".  You are converting that to "is not adding
>> links to the missing suppliers".
>>
>> My suggested name was hinting "add_supplier_links", which is what you
>> say it does below.  The name you suggest is hinting "add_missing_suppliers".
>> Do you see the difference?
> 
> Yeah, which is why I said earlier that I didn't want to repeat "links"
> twice in a function name. As in
> device_links_add_missing_supplier_links() has too many "links". In the
> context of device_links_, "add missing suppliers" means "add missing
> supplier links". Anyway, I think we can come back to figuring out a
> good name once we agree on the more important discussions further
> below.

Yes, later is fine.  This is a detail.

> 
>>> consumers were "saved" as "not yet having the supplier" earlier by
>>> device_link_wait_for_supplier(). This function doesn't do that. This
>>> function is just trying to see if those missing suppliers are present
>>> now and if so adding a link to them from the "saved" consumers. I
>>> think device_link_add_missing_suppliers() is actually a pretty good
>>> name. Let me know what you think now.
>>>
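As an illustration only, the retry behaviour described above can be modelled
in a few lines of userspace C. This is a sketch under stated assumptions, not
code from the patch: struct toy_device, toy_add_links(), toy_device_add(),
toy_may_probe() and demo_wait_for_suppliers() are hypothetical names, a plain
"waiting" flag stands in for membership of the wait_for_suppliers list, and
locking is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SUPPLIERS 4

/* Hypothetical stand-in for struct device; not the kernel structure. */
struct toy_device {
	bool registered;                 /* the device has been added */
	bool waiting;                    /* models being on wait_for_suppliers */
	size_t nr_suppliers;
	struct toy_device *suppliers[MAX_SUPPLIERS];
	bool linked[MAX_SUPPLIERS];      /* a link to suppliers[i] exists */
};

/*
 * Model of an add_links callback: link every supplier that is already
 * registered and do not stop at the first missing one. Returns 0 only
 * when every supplier is linked, -1 otherwise.
 */
static int toy_add_links(struct toy_device *dev)
{
	int ret = 0;

	for (size_t i = 0; i < dev->nr_suppliers; i++) {
		if (dev->linked[i])
			continue;
		if (dev->suppliers[i]->registered)
			dev->linked[i] = true;
		else
			ret = -1;        /* keep going, others may be present */
	}
	return ret;
}

/* Model of the steps taken when a device is added. */
static void toy_device_add(struct toy_device *dev,
			   struct toy_device **all, size_t nr_all)
{
	dev->registered = true;

	/* Retry every waiting consumer now that a new device exists. */
	for (size_t i = 0; i < nr_all; i++)
		if (all[i]->waiting && toy_add_links(all[i]) == 0)
			all[i]->waiting = false;

	/* Then try to link the new device to its own suppliers. */
	if (toy_add_links(dev) != 0)
		dev->waiting = true;
}

/* A consumer still marked as waiting is not allowed to probe. */
static bool toy_may_probe(const struct toy_device *dev)
{
	return dev->registered && !dev->waiting;
}

/* Adds a consumer before its two suppliers; counts the checks that hold. */
static int demo_wait_for_suppliers(void)
{
	struct toy_device clk = {0}, reg = {0};
	struct toy_device consumer = {
		.nr_suppliers = 2,
		.suppliers = { &clk, &reg },
	};
	struct toy_device *all[] = { &consumer, &clk, &reg };
	int passed = 0;

	toy_device_add(&consumer, all, 3);
	passed += !toy_may_probe(&consumer);   /* no supplier present yet */

	toy_device_add(&clk, all, 3);
	passed += consumer.linked[0] && !toy_may_probe(&consumer);

	toy_device_add(&reg, all, 3);
	passed += consumer.linked[1] && toy_may_probe(&consumer);

	return passed;
}
```

In this model demo_wait_for_suppliers() returns 3: the consumer stays marked
as waiting while only one of its two suppliers is present, yet the link to
the present supplier is already created, and the mark is cleared once the
second supplier is added.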
>>>>
>>>>
>>>>>
>>>>> I don't think we need "links" repeated twice in the function name.
>>>>
>>>> Yeah, I didn't like that either.
>>>>
>>>>
>>>>> With this suggestion, what side effect is hidden in your opinion? That
>>>>> the fully linked consumer is removed from the "waiting for suppliers"
>>>>> list?
>>>>
>>>> The side effect is that the function does not merely do a check.  It also
>>>> adds missing suppliers to a list.
>>>
>>> No, it doesn't do that. I can't keep a list of things that aren't
>>> allocated yet :). In the whole patch series, we only keep a list of things
>>> (consumers) that are waiting on other things (missing suppliers).
>>
>> OK, as I noted above, I stated that backwards.  It is adding links for
>> existing suppliers, not for the missing suppliers.
>>
>>>
>>>>>
>>>>> Maybe device_link_try_removing_from_wfs()?
>>>>
>>>> I like that, other than the fact that it still does not provide a clue
>>>> that the function is potentially adding suppliers to a list.
>>>
>>> It doesn't. How would you add a supplier device to a list if the
>>> device itself isn't there? :)
>>
>> Again, that should be existing suppliers, as you noted.  But the point stands
>> that the function is potentially adding links.
>>
>>
>>>
>>>>  I think
>>>> part of the challenge is that the function does two things: (1) a check,
>>>> and (2) potentially adding missing suppliers to a list.  Maybe a simple
>>>> one line comment at the call site, something like:
>>>>
>>>>    /* adds missing suppliers to wfs */
>>>>
>>>>
>>>>>
>>>>> I'll wait for us to agree on a better name here before I change this.
>>>>>
>>>>>>> +{
>>>>>>> +     struct device *dev, *tmp;
>>>>>>> +
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
>>>>>>> +                              links.needs_suppliers)
>>>>>>> +             if (!dev->bus->add_links(dev))
>>>>>>> +                     list_del_init(&dev->links.needs_suppliers);
>>>>>>
>>>>>> Empties dev->links.needs_suppliers, but does not remove dev from
>>>>>> wait_for_suppliers list.  Where does that happen?
>>>>>
>>>>> I'll chalk this up to you having a long day or forgetting your coffee
>>>>> :) list_del_init() does both of those things because needs_suppliers
>>>>> is the node and wait_for_suppliers is the list.
>>>>
>>>> Yes, brain mis-fire on my part.  I'll have to go back and look at the
>>>> list related code again.
>>>>
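To make the node-versus-list point concrete, here is a minimal userspace
sketch. It is not kernel code and not part of the patch: the helpers below
re-implement the usual behaviour of the kernel's list_head API under the same
names (with init_list_head() standing in for INIT_LIST_HEAD()), while struct
toy_device and demo_list_del_init() are hypothetical names made up for the
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace imitation of the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True when the head (or a detached node) points only at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink the node from whatever list it is on, then re-initialise it. */
static void list_del_init(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}

/* Hypothetical stand-in for struct device; only the list node is modelled. */
struct toy_device {
	struct list_head needs_suppliers;   /* the node */
};

/*
 * Returns 1 if, after list_del_init() on the node, both the list head and
 * the node itself read as empty; returns 0 otherwise.
 */
static int demo_list_del_init(void)
{
	struct list_head wait_for_suppliers;   /* the list */
	struct toy_device dev;

	init_list_head(&wait_for_suppliers);
	init_list_head(&dev.needs_suppliers);

	list_add_tail(&dev.needs_suppliers, &wait_for_suppliers);
	if (list_empty(&wait_for_suppliers) || list_empty(&dev.needs_suppliers))
		return 0;

	list_del_init(&dev.needs_suppliers);
	return list_empty(&wait_for_suppliers) &&
	       list_empty(&dev.needs_suppliers);
}
```

demo_list_del_init() returns 1 here: a single list_del_init() call on the
node takes the device off the list and also leaves the node in a state where
a later list_empty() check on it is true.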
>>>>
>>>>>
>>>>>>
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +}
>>>>>>> +
>>>>>>>  static void device_link_free(struct device_link *link)
>>>>>>>  {
>>>>>>>       while (refcount_dec_not_one(&link->rpm_active))
>>>>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
>>>>>>>       struct device_link *link;
>>>>>>>       int ret = 0;
>>>>>>>
>>>>>>> +     /*
>>>>>>> +      * If a device is waiting for one or more suppliers (in
>>>>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
>>>>>>> +      * return -EPROBE_DEFER without having to check the links with existing
>>>>>>> +      * suppliers.
>>>>>>> +      */
>>>>>>
>>>>>> Change comment to:
>>>>>>
>>>>>>         /*
>>>>>>          * Device waiting for supplier to become available is not allowed
>>>>>>          * to probe
>>>>>>          */
>>>>>
>>>>> Po-tay-to. Po-tah-to? I think my comment is just as good.
>>>>
>>>> If just as good and shorter, then better.
>>>>
>>>> Also the original says "it is not ready to probe".  That is not correct.  It
>>>> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
>>>> Nit picky on my part, but tiny things like that mean I have to think harder.
>>>> I have to think "why is it not ready to probe?".  Maybe my version should have
>>>> instead been something like:
>>>>
>>>>         * Device waiting for supplier to become available will return
>>>>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
>>>>
>>>>>
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
>>>>>>> +             mutex_unlock(&wfs_lock);
>>>>>>> +             return -EPROBE_DEFER;
>>>>>>> +     }
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +
>>>>>>>       device_links_write_lock();
>>>>>>
>>>>>> Update Documentation/driver-api/device_link.rst to reflect the
>>>>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
>>>>>
>>>>> Thanks! Will do.
>>>>>
>>>>>>
>>>>>>>
>>>>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
>>>>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
>>>>>>>  {
>>>>>>>       struct device_link *link, *ln;
>>>>>>>
>>>>>>> +     mutex_lock(&wfs_lock);
>>>>>>> +     list_del(&dev->links.needs_suppliers);
>>>>>>> +     mutex_unlock(&wfs_lock);
>>>>>>> +
>>>>>>>       /*
>>>>>>>        * Delete all of the remaining links from this device to any other
>>>>>>>        * devices (either consumers or suppliers).
>>>>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
>>>>>>>  #endif
>>>>>>>       INIT_LIST_HEAD(&dev->links.consumers);
>>>>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
>>>>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
>>>>>>>       dev->links.status = DL_DEV_NO_DRIVER;
>>>>>>>  }
>>>>>>>  EXPORT_SYMBOL_GPL(device_initialize);
>>>>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
>>>>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
>>>>>>>
>>>>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
>>>>>>
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Check if any of the other devices (consumers) have been waiting for
>>>>>>> +      * this device (supplier) to be added so that they can create a device
>>>>>>> +      * link to it.
>>>>>>> +      *
>>>>>>> +      * This needs to happen after device_pm_add() because device_link_add()
>>>>>>> +      * requires the supplier be registered before it's called.
>>>>>>> +      *
>>>>>>> +      * But this also needs to happen before bus_probe_device() to make sure
>>>>>>> +      * waiting consumers can link to it before the driver is bound to the
>>>>>>> +      * device and the driver sync_state callback is called for this device.
>>>>>>> +      */
>>>>>>
>>>>>>         /*
>>>>>>          * Add links to dev from any dependent consumer that has dev on its
>>>>>>          * list of needed suppliers
>>>>>
>>>>> There is no list of needed suppliers.
>>>>
>>>> "the other devices (consumers) have been waiting for this device (supplier)".
>>>> Isn't that a list of needed suppliers?
>>>
>>> No, that's a list of consumers that needs_suppliers.
>>>
>>>>>
>>>>>> (links.needs_suppliers).  Device_pm_add()
>>>>>>          * must have previously registered dev to allow the links to be added.
>>>>>>          *
>>>>>>          * The consumer links must be created before dev is probed because the
>>>>>>          * sync_state callback for dev will use the consumer links.
>>>>>>          */
>>>>>
>>>>> I think what I wrote is just as clear.
>>>>
>>>> The original comment is vague.  It does not explain why consumer links must be
>>>> created before the probe.  I had to go off and read other code to determine
>>>> why that is true.
>>>>
>>>> And again, brevity is better if otherwise just as clear.
>>>>
>>>>
>>>>>
>>>>>>
>>>>>>> +     device_link_check_waiting_consumers();
>>>>>>> +
>>>>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
>>>>>>> +             device_link_wait_for_supplier(dev);
>>>>>>> +
>>>>>>>       bus_probe_device(dev);
>>>>>>>       if (parent)
>>>>>>>               klist_add_tail(&dev->p->knode_parent,
>>>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
>>>>>>> index c330b75c6c57..5d70babb7462 100644
>>>>>>> --- a/include/linux/device.h
>>>>>>> +++ b/include/linux/device.h
>>>>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
>>>>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
>>>>>>>   * @uevent:  Called when a device is added, removed, or a few other things
>>>>>>>   *           that generate uevents to add the environment variables.
>>>>>>
>>>>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
>>>>>>> + *           added to this bus.  The function is expected to create device
>>>>>>> + *           links to all the suppliers of the input device that are
>>>>>>> + *           available at the time this function is called.  As in, the
>>>>>>> + *           function should NOT stop at the first failed device link if
>>>>>>> + *           other unlinked supplier devices are present in the system.
>>>>>>
>>>>>> * @add_links:   Called after a device is added to this bus.
>>>>>
>>>>> Why are you removing the "perhaps multiple times" part? that's true
>>>>> and that's how some of the other ops are documented.
>>>>
>>>> I didn't remove it.  I rephrased it with a little bit more explanation as
>>>> "If some suppliers are not yet available, this function will be
>>>> called again when the suppliers become available." (below).
>>>>
>>>>
>>>>>
>>>>>>  The function is
>>>>>> *               expected to create device links to all the suppliers of the
>>>>>> *               device that are available at the time this function is called.
>>>>>> *               The function must NOT stop at the first failed device link if
>>>>>> *               other unlinked supplier devices are present in the system.
>>>>>> *               If some suppliers are not yet available, this function will be
>>>>>> *               called again when the suppliers become available.
>>>>>>
>>>>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
>>>>>
>>>>> Sorry, I'm not sure I understand. Can you please explain what you are
>>>>> trying to say? of_link_to_suppliers() is just one implementation of
>>>>> add_links(). The comment above is true for any bus trying to implement
>>>>> add_links().
>>>>
>>>> This is conflating bus with the source of the firmware description of the
>>>> hardware topology.  For drivers that use various APIs to access firmware
>>>> description of topology that may be either devicetree or ACPI the access
>>>> is done via fwnode_operations, based on struct device.fwnode (if I recall
>>>> properly).
>>>>
>>>> I failed to completely address why add_links() is not needed.  The answer
>>>> is that there should be a single function called for all buses.  Then
>>>> the proper firmware data source would be accessed via a struct fwnode_operations.
>>>>
>>>> I think I left this out because I had not yet asked why this feature is
>>>> tied only to the platform bus.  Which I asked earlier in this reply.
>>>
>>> Thanks for the pointer about fwnode and fwnode_operations. I wasn't
>>> aware of those. I see where you are going with this. I see a couple of
>>> problems with this approach though:
>>>
>>> 1. How you interpret the properties of a fwnode is specific to the fw
>>> type. The clocks DT property isn't going to have the same definition
>>> in ACPI or some other firmware. Heck, I don't know if ACPI even has a
>>> clocks-like property. So having one function to parse all the FW types
>>> doesn't make a lot of sense.
>>
>> The functions in fwnode_operations are specific to the proper firmware.
>> So there is a set of functions in a struct fwnode_operations for
>> devicetree that only know about devicetree.  And there is a different
>> variable of type fwnode_operations that is initialized with ACPI
>> specific functions.
> 
> Yes, I understand how ops work :) So I have one ops (fwnode ops) to
> call that will read a property from DT or ACPI depending on where that
> specific device's firmware is from. But that's not my point here.
> 
> My point is that clock bindings in DT are under a "clocks" property
> that lists references (phandles) to the supplier. But in ACPI, the
> property might be called "clk" and could list references to actual
> clock IDs. So, you can't have one piece of code that works for all
> firmware even if I have one ops that can read properties from any
> firmware.
> 
> I'll still have to know what type the underlying firmware is before I
> try to interpret the properties. So having one function that parses DT
> and ACPI and whatever else would be a terrible and unnecessary design.

You have already implemented the devicetree function, which is
of_link_to_suppliers().  The devicetree fwnode_operations would have
a pointer to of_link_to_suppliers().

If ACPI support is added, there would be an analogous ACPI aware function
that would essentially do the same thing that of_link_to_suppliers()
does.  This would be in the ACPI version of fwnode_operations.

There would not be a single function that is both devicetree aware and
ACPI aware.
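
A rough userspace sketch of that arrangement follows, for illustration only.
Every name in it is hypothetical (struct toy_fwnode_ops, its add_links member,
toy_of_add_links(), toy_acpi_add_links(), toy_core_add_links(),
demo_dispatch()); nothing here claims that struct fwnode_operations has such a
member in this series. It only shows the shape of the idea: one ops table per
firmware type, and core code that dispatches without knowing which firmware
it is talking to.

```c
#include <stddef.h>

struct toy_device;

/* Hypothetical per-firmware operations table. */
struct toy_fwnode_ops {
	const char *name;
	int (*add_links)(struct toy_device *dev);
};

struct toy_fwnode {
	const struct toy_fwnode_ops *ops;
};

/* Hypothetical stand-in for struct device. */
struct toy_device {
	struct toy_fwnode *fwnode;   /* NULL if there is no firmware node */
	int links_added;
};

/* Devicetree-only code; a real one would parse properties like "clocks". */
static int toy_of_add_links(struct toy_device *dev)
{
	dev->links_added = 1;
	return 0;
}

/* ACPI-only placeholder; in this toy it reports that it cannot link yet. */
static int toy_acpi_add_links(struct toy_device *dev)
{
	(void)dev;
	return -1;
}

static const struct toy_fwnode_ops toy_of_ops = {
	"devicetree", toy_of_add_links
};
static const struct toy_fwnode_ops toy_acpi_ops = {
	"acpi", toy_acpi_add_links
};

/* Core code: no firmware-specific knowledge, only dispatch. */
static int toy_core_add_links(struct toy_device *dev)
{
	if (!dev->fwnode || !dev->fwnode->ops || !dev->fwnode->ops->add_links)
		return 0;            /* nothing to link */
	return dev->fwnode->ops->add_links(dev);
}

/* Returns 1 if each device is routed to the expected implementation. */
static int demo_dispatch(void)
{
	struct toy_fwnode of_node = { &toy_of_ops };
	struct toy_fwnode acpi_node = { &toy_acpi_ops };
	struct toy_device a = { &of_node, 0 };
	struct toy_device b = { &acpi_node, 0 };
	struct toy_device c = { NULL, 0 };

	return toy_core_add_links(&a) == 0 && a.links_added == 1 &&
	       toy_core_add_links(&b) == -1 && b.links_added == 0 &&
	       toy_core_add_links(&c) == 0;
}
```

demo_dispatch() returns 1 in this model. The firmware-specific parsing stays
inside each implementation, and the core function includes no devicetree or
ACPI header.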


> 
>>> 2. If this common code is implemented as part of driver/base/, then at
>>> a minimum, I'll have to check if a fwnode is a DT node before I start
>>> interpreting the properties of a device's fwnode. But that means I'll
>>> have to include linux/of.h to use is_of_node(). I don't like having
>>> driver/base code depend on OF or platform or ACPI headers.
>>
>> You just use the function in the device's fwnode_operations (I think,
>> I would have to go look at the precise way the code works because it
>> has been quite a while since I've looked at it).
> 
> Because you missed my point in (1) you are missing my point in (2).
> I'll wait for your updated reply.

We are still talking at cross purposes.  If my reply to (1) does not
change that, I'll have to go dig into how the fwnode framework figures
out which set of fwnode_operations to use for each device.


> 
>>>
>>> 3. The supplier info doesn't always need to come from a firmware. So I
>>> don't want to limit it to that?
>>
>> If you can find another source of topology info, then I would expect
>> that another set of fwnode_operations functions would be created
>> for the info source.
> 
> The other source could just be C files in the kernel. Using fwnodes
> for that would be hacky. But let's sort (1) and (2) out first.

I suspected that might be the other source.  But we have been trying
to deprecate this type of data.

> 
>>>
>>> Also, I don't necessarily see this as conflating firmware (DT, ACPI,
>>> etc) with the bus (platform bus, ACPI bus, PCI bus). Whoever creates
>>> the device seems like the entity best suited to figure out the
>>> suppliers of the device (apart from the driver, obviously). So the bus
>>> deciding the suppliers doesn't seem wrong to me.
>>
>> Patch 3 assigns the devicetree add_links function to the platform bus.
>> It seems incorrect to me for of_platform_default_populate_init() to be
>> changing a field in platform_bus_type.
>>
>>
>>    of_platform_default_populate_init()
>>            ...
>>            platform_bus_type.add_links = of_link_to_suppliers;
>>
> 
> I didn't want to have platform bus include OF header files.

Yet another clue that the function pointer should not belong to the bus.


> 
>>>
>>> In this specific case, I'm trying to address DT for now and leaving
>>> ACPI to whoever else wants to add device links based on ACPI data.
>>> Most OF/DT based devices end up in platform bus. So I'm just handling
>>> this in platform bus. If some other person wants this to work for ACPI
>>> bus or PCI bus, they are welcome to implement add_links() for those
>>> busses? I'm nowhere close to an expert on those.
>>
>> Devicetree is not limited to the platform bus.
> 
> I know. That's why I said "most". PCI seems to have some DT support too.
> 
> Thanks,
> Saravana
> 

-Frank


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-21  1:06               ` Frank Rowand
@ 2019-08-21  1:56                 ` Greg Kroah-Hartman
  2019-08-21  2:01                   ` Saravana Kannan
  2019-08-21  2:04                   ` Saravana Kannan
  2019-08-21  2:22                 ` Saravana Kannan
  1 sibling, 2 replies; 37+ messages in thread
From: Greg Kroah-Hartman @ 2019-08-21  1:56 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Saravana Kannan, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
> On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>>>
> >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>>>>>  device addition
> >>>>>>> From: Saravana Kannan <saravanak@google.com>

This is a "fun" thread :(

You two should get together in person this week and talk.  I think you
both will be at ELC, can we do this tomorrow or Thursday so we can hash
it out in a way that doesn't end up talking past each other, like I feel
is happening here right now?

thanks,

greg k-h


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-21  1:56                 ` Greg Kroah-Hartman
@ 2019-08-21  2:01                   ` Saravana Kannan
  2019-08-21  4:24                     ` Frank Rowand
  2019-08-21  2:04                   ` Saravana Kannan
  1 sibling, 1 reply; 37+ messages in thread
From: Saravana Kannan @ 2019-08-21  2:01 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frank Rowand, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Tue, Aug 20, 2019, 6:56 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org>
wrote:

> On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
> > On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com>
> wrote:
> > >>
> > >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> > >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com>
> wrote:
> > >>>>
> > >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <
> frowand.list@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> > >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking
> devices during
> > >>>>>>>  device addition
> > >>>>>>> From: Saravana Kannan <saravanak@google.com>
>
> This is a "fun" thread :(
>
> You two should get together in person this week and talk.  I think you
> both will be at ELC, can we do this tomorrow or Thursday so we can hash
> it out in a way that doesn't end up talking past each other, like I feel
> is happening here right now?
>

That would be great. Wednesday would be better for me. I might not make it
to ELC on Thursday. Let us know Frank.

Thanks,
Saravana


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-21  1:56                 ` Greg Kroah-Hartman
  2019-08-21  2:01                   ` Saravana Kannan
@ 2019-08-21  2:04                   ` Saravana Kannan
  1 sibling, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-08-21  2:04 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Frank Rowand, Rob Herring, Mark Rutland, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Tue, Aug 20, 2019 at 6:56 PM Greg Kroah-Hartman
<gregkh@linuxfoundation.org> wrote:
>
> On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
> > On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
> > >>
> > >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> > >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> > >>>>
> > >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> > >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> > >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> > >>>>>>>  device addition
> > >>>>>>> From: Saravana Kannan <saravanak@google.com>
>
> This is a "fun" thread :(
>
> You two should get together in person this week and talk.  I think you
> both will be at ELC, can we do this tomorrow or Thursday so we can hash
> it out in a way that doesn't end up talking past each other, like I feel
> is happening here right now?
>

Resending again due to HTML (Sorry, was a mobile reply)

That would be great. Wednesday would be better for me. I might not
make it to ELC on Thursday. Let us know Frank.

Thanks,
Saravana


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-21  1:06               ` Frank Rowand
  2019-08-21  1:56                 ` Greg Kroah-Hartman
@ 2019-08-21  2:22                 ` Saravana Kannan
  1 sibling, 0 replies; 37+ messages in thread
From: Saravana Kannan @ 2019-08-21  2:22 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On Tue, Aug 20, 2019 at 6:07 PM Frank Rowand <frowand.list@gmail.com> wrote:
>
> On 8/20/19 3:10 PM, Saravana Kannan wrote:
> > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>
> >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
> >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>
> >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
> >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
> >>>>>>
> >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
> >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
> >>>>>>>  device addition
> >>>>>>> From: Saravana Kannan <saravanak@google.com>
> >>>>>>>
> >>>>>>> When devices are added, the bus might want to create device links to track
> >>>>>>> functional dependencies between supplier and consumer devices. This
> >>>>>>> tracking of supplier-consumer relationship allows optimizing device probe
> >>>>>>> order and tracking whether all consumers of a supplier are active. The
> >>>>>>> add_links bus callback is added to support this.
> >>>>>>
> >>>>>> Change above to:
> >>>>>>
> >>>>>> When devices are added, the bus may create device links to track which
> >>>>>> suppliers a consumer device depends upon.  This
> >>>>>> tracking of supplier-consumer relationship may be used to defer probing
> >>>>>> the driver of a consumer device before the driver(s) for its supplier device(s)
> >>>>>> are probed.  It may also be used by a supplier driver to determine if
> >>>>>> all of its consumers have been successfully probed.
> >>>>>> The add_links bus callback is added to create the supplier device links
> >>>>>>
> >>>>>>>
> >>>>>>> However, when consumer devices are added, they might not have a supplier
> >>>>>>> device to link to despite needing mandatory resources/functionality from
> >>>>>>> one or more suppliers. A waiting_for_suppliers list is created to track
> >>>>>>> such consumers and retry linking them when new devices get added.
> >>>>>>
> >>>>>> Change above to:
> >>>>>>
> >>>>>> If a supplier device has not yet been created when the consumer device attempts
> >>>>>> to link it, the consumer device is added to the wait_for_suppliers list.
> >>>>>> When supplier devices are created, the supplier device link will be added to
> >>>>>> the relevant consumer devices on the wait_for_suppliers list.
> >>>>>>
> >>>>>
> >>>>> I'll take these commit text suggestions if we decide to revert the
> >>>>> entire series at the end of this review.
> >>>>>
> >>>>>>>
> >>>>>>> Signed-off-by: Saravana Kannan <saravanak@google.com>
> >>>>>>> ---
> >>>>>>>  drivers/base/core.c    | 83 ++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>  include/linux/device.h | 14 +++++++
> >>>>>>>  2 files changed, 97 insertions(+)
> >>>>>>>
> >>>>>>> diff --git a/drivers/base/core.c b/drivers/base/core.c
> >>>>>>> index da84a73f2ba6..1b4eb221968f 100644
> >>>>>>> --- a/drivers/base/core.c
> >>>>>>> +++ b/drivers/base/core.c
> >>>>>>> @@ -44,6 +44,8 @@ early_param("sysfs.deprecated", sysfs_deprecated_setup);
> >>>>>>>  #endif
> >>>>>>>
> >>>>>>>  /* Device links support. */
> >>>>>>> +static LIST_HEAD(wait_for_suppliers);
> >>>>>>> +static DEFINE_MUTEX(wfs_lock);
> >>>>>>>
> >>>>>>>  #ifdef CONFIG_SRCU
> >>>>>>>  static DEFINE_MUTEX(device_links_lock);
> >>>>>>> @@ -401,6 +403,51 @@ struct device_link *device_link_add(struct device *consumer,
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL_GPL(device_link_add);
> >>>>>>>
> >>>>>>> +/**
> >>>>>>
> >>>>>>> + * device_link_wait_for_supplier - Mark device as waiting for supplier
> >>>>>>
> >>>>>>     * device_link_wait_for_supplier - Add device to wait_for_suppliers list
> >>>>>
> >>>>
> >>>> As a meta-comment, I found this series very hard to understand in the context
> >>>> of reading the new code for the first time.  When I read the code again in
> >>>> six months or a year or two years it will not be in near term memory and it
> >>>> will be as if I am reading it for the first time.  A lot of my suggestions
> >>>> for changes of names are in that context -- the current names may be fine
> >>>> when one has recently read the code, but not so much when trying to read
> >>>> the whole thing again with a blank mind.
> >>>
> >>> Thanks for the context.
> >>>
> >>>> The code also inherits a good deal of complexity because it does not stand
> >>>> alone in a nice discrete chunk, but instead delicately weaves into a more
> >>>> complex body of code.
> >>>
> >>> I'll take this as a compliment :)
> >>
> >> Please do!
> >>
> >>
> >>>
> >>>> When I was trying to understand the code, I wrote a lot of additional
> >>>> comments within my reply email to provide myself context, information
> >>>> about various things, and questions that I needed to answer (or if I
> >>>> could not answer to then ask you).  Then I ended up being able to remove
> >>>> many of those notes before sending the reply.
> >>>>
> >>>>
> >>>>> I intentionally chose "Mark device..." because that's a better
> >>>>> description of the semantics of the function instead of trying to
> >>>>> describe the implementation. Whether I'm using a linked list or some
> >>>>> other data structure should not be the one line documentation of a
> >>>>> function. Unless the function is explicitly about operating on that
> >>>>> specific data structure.
> >>>>
> >>>> I agree with the intent of trying to describe the semantics of a function,
> >>>> especially at the API level where other systems (or drivers) would be using
> >>>> the function.  But for this case the function is at the implementation level
> >>>> and describing explicitly what it is doing makes this much more readable for
> >>>> me.
> >>>
> >>> Are you distinguishing between API level vs implementation level based
> >>> on the function being "static"/not exported? I believe the earlier
> >>
> >> No, being static helps say a function is not API, but a function that is
> >> not static may be intended to be used in a limited and constrained manner.
> >> I distinguished based on the usage of the function.
> >>
> >>
> >>> version of this series had this function as an exported API. So maybe
> >>> that's why I had it as "Mark device".
> >>>
> >>>> I also find "Mark device" to be vague and not descriptive of what the
> >>>> intent is.
> >>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>> + * @consumer: Consumer device
> >>>>>>> + *
> >>>>>>> + * Marks the consumer device as waiting for suppliers to become available. The
> >>>>>>> + * consumer device will never be probed until it's unmarked as waiting for
> >>>>>>> + * suppliers. The caller is responsible for adding the link to the supplier
> >>>>>>> + * once the supplier device is present.
> >>>>>>> + *
> >>>>>>> + * This function is NOT meant to be called from the probe function of the
> >>>>>>> + * consumer but rather from code that creates/adds the consumer device.
> >>>>>>> + */
> >>>>>>> +static void device_link_wait_for_supplier(struct device *consumer)
> >>>>>>> +{
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     list_add_tail(&consumer->links.needs_suppliers, &wait_for_suppliers);
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +/**
> >>>>>>
> >>>>>>
> >>>>>>> + * device_link_check_waiting_consumers - Try to remove from supplier wait list
> >>>>>>> + *
> >>>>>>> + * Loops through all consumers waiting on suppliers and tries to add all their
> >>>>>>> + * supplier links. If that succeeds, the consumer device is unmarked as waiting
> >>>>>>> + * for suppliers. Otherwise, they are left marked as waiting on suppliers.
> >>>>>>> + *
> >>>>>>> + * The add_links bus callback is expected to return 0 if it has found and added
> >>>>>>> + * all the supplier links for the consumer device. It should return an error if
> >>>>>>> + * it isn't able to do so.
> >>>>>>> + *
> >>>>>>> + * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>>>> + * it's aware of potential suppliers becoming available.
> >>>>>>
> >>>>>> Change above comment to:
> >>>>>>
> >>>>>>     * device_link_add_supplier_links - add links from consumer devices to
> >>>>>>     *                                  supplier devices, leaving any consumer
> >>>>>>     *                                  with inactive suppliers on the
> >>>>>>     *                                  wait_for_suppliers list
> >>>>>
> >>>>> I didn't know that the first one line comment could span multiple
> >>>>> lines. Good to know.
> >>>>>
> >>>>>
> >>>>>>     * Scan all consumer devices in the devicetree.
> >>>>>
> >>>>> This function doesn't have anything to do with devicetree. I've
> >>>>> intentionally kept all OF related parts out of the driver/core because
> >>>>> I hope that other busses can start using this feature too. So I can't
> >>>>> take this bit.
> >>>>
> >>>> My comment is left over from when I was taking notes, trying to understand the
> >>>> code.
> >>>>
> >>>> At the moment, only devicetree is used as a source of the dependency information.
> >>>> The comment would better be re-phrased as:
> >>>>
> >>>>         * Scan all consumer devices in the firmware description of the hardware topology
> >>>>
> >>>
> >>> Ok
> >>>
> >>>> I did not ask why this feature is tied to _only_ the platform bus, but will now.
> >>>
> >>> Because devicetree and platform bus are the only ones I'm familiar with.
> >>> If other busses want to add this, I'd be happy to help with code
> >>> and/or direction/review. But I won't pretend to know anything about
> >>> ACPI.
> >>
> >> Sorry, you don't get to ignore other buses because you are not familiar
> >> with them.
> >
> > It's important that I don't design out other buses -- which I don't.
> > But why would you want someone who has no idea of ACPI to write code
> > for it? It's a futile effort that's going to be rejected by people who
> > know ACPI anyway.
>
> ACPI is not a bus.
>
> Devicetree is not a bus.
>
> A devicetree can contain multiple buses in the topology that is described.

I understand these aren't busses. But most devices from ACPI and DT
get put on acpi bus or platform bus? So I was trying to just handle
platform bus since that's the majority of the use cases I run into as
part of Android. Anyway, see comments further below. I think we are
lining up now.

> >
> >> I am not aware of any reason to exclude devices that are on other buses and your
> >> answer below does not provide a valid technical reason why the new feature is
> >> correct when it excludes all other buses.
> >>>
> >>>> I do not know of any reason that a consumer / supplier relationship can not be
> >>>> between devices on different bus types.  Do you know of such a reason?
> >>>
> >>> Yes, it's hypothetically possible. But I haven't seen such a
> >>> relationship being defined in DT. Nor somewhere else where this might
> >>> be captured. So, how common/realistic is it?
> >>
> >> It is entirely legal.  I have no idea how common it is but that is not a valid
> >> reason to exclude other buses from the feature.
> >
> > I'm not going to write code for a hypothetical hardware scenario. Find
> > one supported in upstream, show me that it'll benefit from this series
> > and tell me how to interpret the dependency graph and then we'll talk
> > about writing code for that.
>
> You don't get to implement a general feature in a way that only supports
> a subset of potential devicetree users.  Note the word "general".  This is
> not a small isolated feature.
>
> Now, am I being inconsistent if I say that it is ok for the feature to
> only support devicetree systems, or only support ACPI systems?  I'll
> have to ponder that.
>
> But I don't think the question of only platform buses or all buses needs
> to be resolved because I don't think that the add_links function is a bus
> specific function.  The add_links function is specific to devicetree or
> ACPI.
>
> We seem to be talking past each other on this point right now.  I don't know
> how to get our minds to the same place, but let's keep trying.

See my reply further below. That should address some of the concerns
as "buses" won't be a concern anymore. Having said that, the main
point I was making here is that I can't design for a hypothetical case
that has no example with a proper definition of what's the expected
behavior.

> >
> >>>>>
> >>>>>>  For any supplier device that
> >>>>>>     * is not already linked to the consumer device, add the supplier to the
> >>>>>>     * consumer device's device links.
> >>>>>>     *
> >>>>>>     * If all of a consumer device's suppliers are available then the consumer
> >>>>>>     * is removed from the wait_for_suppliers list (if previously on the list).
> >>>>>>     * Otherwise the consumer is added to the wait_for_suppliers list (if not
> >>>>>>     * already on the list).
> >>>>>
> >>>>> Honestly, I don't think this is any better than what I already have.
> >>>>
> >>>> Note that my version of these comments was written while I was reading the code,
> >>>> and did not have any big picture understanding yet.  This will likely also be
> >>>> the mind set of most everyone who reads this code in the future, once it is
> >>>> woven into the kernel.
> >>>>
> >>>> If you don't like the change, I can revisit it in a later version of the
> >>>> patch set.
> >>>
> >>> I'll take in all the ones I feel are reasonable or don't feel strongly
> >>> about. We can revisit the rest later.
> >>>
> >>>>>
> >>>>>>     * The add_links bus callback must return 0 if it has found and added all
> >>>>>>     * the supplier links for the consumer device. It must return an error if
> >>>>>>     * it is not able to do so.
> >>>>>>     *
> >>>>>>     * The caller of device_link_wait_for_supplier() is expected to call this once
> >>>>>>     * it is aware of potential suppliers becoming available.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> + */
> >>>>>>> +static void device_link_check_waiting_consumers(void)
> >>>>>>
> >>>>>> Function name is misleading and hides side effects.
> >>>>>>
> >>>>>> I have not come up with a name that does not hide side effects, but a better
> >>>>>> name would be:
> >>>>>>
> >>>>>>    device_link_add_supplier_links()
> >>>>>
> >>>>> I kinda agree that it could afford a better name. The current name is
> >>>>> too similar to device_links_check_suppliers() and I never liked that.
> >>>>
> >>>> Naming new fields or variables related to device links looks pretty
> >>>> challenging to me, because of the desire to be part of device links
> >>>> and not a wart pasted on the side.  So I share the pain in trying
> >>>> to find good names.
> >>>>
> >>>>>
> >>>>> Maybe device_link_add_missing_suppliers()?
> >>>>
> >>>> My first reaction was "yes, that sounds good".  But then I stopped and
> >>>> tried to read the name out of context.  The name is not adding the
> >>>> missing suppliers, it is saving the information that a supplier is
> >>>> not yet available (eg, is "missing").  I struggled in coming up with
> >>
> >> Reading what you say below, and looking at the code again, what I say
> >> in that sentence is backwards.  It is not adding the missing supplier
> >> device links, it is instead adding existing supplier device links.
> >>
> >>
> >>>> the name that I suggested.  We can keep thinking.
> >>>
> >>> No, this function _IS_ about adding links to suppliers. These
> >>
> >> You are mis-reading what I wrote.  I said the function "is not adding
> >> the missing suppliers".  You are converting that to "is not adding
> >> links to the missing suppliers".
> >>
> >> My suggested name was hinting "add_supplier_links", which is what you
> >> say it does below.  The name you suggest is hinting "add_missing_suppliers".
> >> Do you see the difference?
> >
> > Yeah, which is why I said earlier that I didn't want to repeat "links"
> > twice in a function name. As in
> > device_links_add_missing_supplier_links() has too many "links". In the
> > context of device_links_, "add missing suppliers" means "add missing
> > supplier links". Anyway, I think we can come back to figuring out a
> > good name once we agree on the more important discussions further
> > below.
>
> Yes, later is fine.  This is a detail.
>
> >
> >>> consumers were "saved" as "not yet having the supplier" earlier by
> >>> device_link_wait_for_supplier(). This function doesn't do that. This
> >>> function is just trying to see if those missing suppliers are present
> >>> now and if so adding a link to them from the "saved" consumers. I
> >>> think device_link_add_missing_suppliers() is actually a pretty good
> >>> name. Let me know what you think now.
> >>>
> >>>>
> >>>>
> >>>>>
> >>>>> I don't think we need "links" repeated twice in the function name.
> >>>>
> >>>> Yeah, I didn't like that either.
> >>>>
> >>>>
> >>>>> With this suggestion, what side effect is hidden in your opinion? That
> >>>>> the fully linked consumer is removed from the "waiting for suppliers"
> >>>>> list?
> >>>>
> >>>> The side effect is that the function does not merely do a check.  It also
> >>>> adds missing suppliers to a list.
> >>>
> >>> No, it doesn't do that. I can't keep a list of things that aren't
> >>> allocated yet :). In the whole patch series, we only keep a list of things
> >>> (consumers) that are waiting on other things (missing suppliers).
> >>
> >> OK, as I noted above, I stated that backwards.  It is adding links for
> >> existing suppliers, not for the missing suppliers.
> >>
> >>>
> >>>>>
> >>>>> Maybe device_link_try_removing_from_wfs()?
> >>>>
> >>>> I like that, other than the fact that it still does not provide a clue
> >>>> that the function is potentially adding suppliers to a list.
> >>>
> >>> It doesn't. How would you add a supplier device to a list if the
> >>> device itself isn't there? :)
> >>
> >> Again, that should be existing suppliers, as you noted.  But the point stands
> >> that the function is potentially adding links.
> >>
> >>
> >>>
> >>>>  I think
> >>>> part of the challenge is that the function does two things: (1) a check,
> >>>> and (2) potentially adding missing suppliers to a list.  Maybe a simple
> >>>> one line comment at the call site, something like:
> >>>>
> >>>>    /* adds missing suppliers to wfs */
> >>>>
> >>>>
> >>>>>
> >>>>> I'll wait for us to agree on a better name here before I change this.
> >>>>>
> >>>>>>> +{
> >>>>>>> +     struct device *dev, *tmp;
> >>>>>>> +
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     list_for_each_entry_safe(dev, tmp, &wait_for_suppliers,
> >>>>>>> +                              links.needs_suppliers)
> >>>>>>> +             if (!dev->bus->add_links(dev))
> >>>>>>> +                     list_del_init(&dev->links.needs_suppliers);
> >>>>>>
> >>>>>> Empties dev->links.needs_suppliers, but does not remove dev from
> >>>>>> wait_for_suppliers list.  Where does that happen?
> >>>>>
> >>>>> I'll chalk this up to you having a long day or forgetting your coffee
> >>>>> :) list_del_init() does both of those things because needs_suppliers
> >>>>> is the node and wait_for_suppliers is the list.
> >>>>
> >>>> Yes, brain mis-fire on my part.  I'll have to go back and look at the
> >>>> list related code again.
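
For anyone re-reading this later: the node-versus-list point can be
shown with a small userspace stand-in for the kernel list helpers. This
is only a sketch, not kernel code. The helper names and struct
fake_device are made up for illustration; only wait_for_suppliers and
needs_suppliers come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* A head with no entries, or a node that is on no list, points at itself. */
static int list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_to_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

/* Unlink node from whatever list it is on, then make it point at itself. */
static void list_unlink_and_reinit(struct list_head *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
	init_list_head(node);
}

/* Hypothetical, cut-down device: needs_suppliers is the node, not a list. */
struct fake_device {
	struct list_head needs_suppliers;
};

/* Returns 1 if every expectation held during the walk-through, else 0. */
static int wfs_list_demo(void)
{
	struct list_head wait_for_suppliers;
	struct fake_device a, b;

	init_list_head(&wait_for_suppliers);
	init_list_head(&a.needs_suppliers);
	init_list_head(&b.needs_suppliers);

	list_add_to_tail(&a.needs_suppliers, &wait_for_suppliers);
	list_add_to_tail(&b.needs_suppliers, &wait_for_suppliers);
	if (list_is_empty(&a.needs_suppliers) || list_is_empty(&wait_for_suppliers))
		return 0;

	list_unlink_and_reinit(&a.needs_suppliers);
	/* a is off the list and reads as "not waiting"; b is still queued. */
	if (!list_is_empty(&a.needs_suppliers))
		return 0;
	if (wait_for_suppliers.next != &b.needs_suppliers ||
	    wait_for_suppliers.prev != &b.needs_suppliers)
		return 0;

	list_unlink_and_reinit(&b.needs_suppliers);
	return list_is_empty(&wait_for_suppliers);
}
```

Unlinking the node both takes the device off the shared list and leaves
the node self-pointing, which is why an emptiness check on the node
works as the "is this device still waiting" test.
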
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>>  static void device_link_free(struct device_link *link)
> >>>>>>>  {
> >>>>>>>       while (refcount_dec_not_one(&link->rpm_active))
> >>>>>>> @@ -535,6 +582,19 @@ int device_links_check_suppliers(struct device *dev)
> >>>>>>>       struct device_link *link;
> >>>>>>>       int ret = 0;
> >>>>>>>
> >>>>>>> +     /*
> >>>>>>> +      * If a device is waiting for one or more suppliers (in
> >>>>>>> +      * wait_for_suppliers list), it is not ready to probe yet. So just
> >>>>>>> +      * return -EPROBE_DEFER without having to check the links with existing
> >>>>>>> +      * suppliers.
> >>>>>>> +      */
> >>>>>>
> >>>>>> Change comment to:
> >>>>>>
> >>>>>>         /*
> >>>>>>          * Device waiting for supplier to become available is not allowed
> >>>>>>          * to probe
> >>>>>>          */
> >>>>>
> >>>>> Po-tay-to. Po-tah-to? I think my comment is just as good.
> >>>>
> >>>> If just as good and shorter, then better.
> >>>>
> >>>> Also the original says "it is not ready to probe".  That is not correct.  It
> >>>> is ready to probe, it is just that the probe attempt will return -EPROBE_DEFER.
> >>>> Nit picky on my part, but tiny things like that mean I have to think harder.
> >>>> I have to think "why is it not ready to probe?".  Maybe my version should have
> >>>> instead been something like:
> >>>>
> >>>>         * Device waiting for supplier to become available will return
> >>>>         * -EPROBE_DEFER if probed.  Avoid the unneeded processing.
> >>>>
> >>>>>
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     if (!list_empty(&dev->links.needs_suppliers)) {
> >>>>>>> +             mutex_unlock(&wfs_lock);
> >>>>>>> +             return -EPROBE_DEFER;
> >>>>>>> +     }
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +
> >>>>>>>       device_links_write_lock();
> >>>>>>
> >>>>>> Update Documentation/driver-api/device_link.rst to reflect the
> >>>>>> check of &dev->links.needs_suppliers in device_links_check_suppliers().
> >>>>>
> >>>>> Thanks! Will do.
> >>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>       list_for_each_entry(link, &dev->links.suppliers, c_node) {
> >>>>>>> @@ -812,6 +872,10 @@ static void device_links_purge(struct device *dev)
> >>>>>>>  {
> >>>>>>>       struct device_link *link, *ln;
> >>>>>>>
> >>>>>>> +     mutex_lock(&wfs_lock);
> >>>>>>> +     list_del(&dev->links.needs_suppliers);
> >>>>>>> +     mutex_unlock(&wfs_lock);
> >>>>>>> +
> >>>>>>>       /*
> >>>>>>>        * Delete all of the remaining links from this device to any other
> >>>>>>>        * devices (either consumers or suppliers).
> >>>>>>> @@ -1673,6 +1737,7 @@ void device_initialize(struct device *dev)
> >>>>>>>  #endif
> >>>>>>>       INIT_LIST_HEAD(&dev->links.consumers);
> >>>>>>>       INIT_LIST_HEAD(&dev->links.suppliers);
> >>>>>>> +     INIT_LIST_HEAD(&dev->links.needs_suppliers);
> >>>>>>>       dev->links.status = DL_DEV_NO_DRIVER;
> >>>>>>>  }
> >>>>>>>  EXPORT_SYMBOL_GPL(device_initialize);
> >>>>>>> @@ -2108,6 +2173,24 @@ int device_add(struct device *dev)
> >>>>>>>                                            BUS_NOTIFY_ADD_DEVICE, dev);
> >>>>>>>
> >>>>>>>       kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>>>>>
> >>>>>>> +
> >>>>>>> +     /*
> >>>>>>> +      * Check if any of the other devices (consumers) have been waiting for
> >>>>>>> +      * this device (supplier) to be added so that they can create a device
> >>>>>>> +      * link to it.
> >>>>>>> +      *
> >>>>>>> +      * This needs to happen after device_pm_add() because device_link_add()
> >>>>>>> +      * requires the supplier be registered before it's called.
> >>>>>>> +      *
> >>>>>>> +      * But this also needs to happen before bus_probe_device() to make sure
> >>>>>>> +      * waiting consumers can link to it before the driver is bound to the
> >>>>>>> +      * device and the driver sync_state callback is called for this device.
> >>>>>>> +      */
> >>>>>>
> >>>>>>         /*
> >>>>>>          * Add links to dev from any dependent consumer that has dev on its
> >>>>>>          * list of needed suppliers
> >>>>>
> >>>>> There is no list of needed suppliers.
> >>>>
> >>>> "the other devices (consumers) have been waiting for this device (supplier)".
> >>>> Isn't that a list of needed suppliers?
> >>>
> >>> No, that's a list of consumers that needs_suppliers.
> >>>
> >>>>>
> >>>>>> (links.needs_suppliers).  Device_pm_add()
> >>>>>>          * must have previously registered dev to allow the links to be added.
> >>>>>>          *
> >>>>>>          * The consumer links must be created before dev is probed because the
> >>>>>>          * sync_state callback for dev will use the consumer links.
> >>>>>>          */
> >>>>>
> >>>>> I think what I wrote is just as clear.
> >>>>
> >>>> The original comment is vague.  It does not explain why consumer links must be
> >>>> created before the probe.  I had to go off and read other code to determine
> >>>> why that is true.
> >>>>
> >>>> And again, brevity is better if otherwise just as clear.
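
To make the ordering in that comment easier to follow, here is a rough
userspace model of the flow: register the device, retry the waiting
consumers, try to link the new device itself, and only then let the
probe gate decide. It is a sketch under simplifying assumptions (one
supplier per device, no locking, a fixed-size registry) and every
sketch_* name is hypothetical. Only the numeric value 517 is taken from
the kernel's EPROBE_DEFER definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_EPROBE_DEFER 517	/* same value as the kernel's EPROBE_DEFER */
#define SKETCH_MAX_DEVS 8

struct sketch_dev {
	const char *name;
	const char *supplier;	/* the one supplier needed, or NULL */
	bool linked;		/* link to the supplier has been created */
	bool waiting;		/* stand-in for being on wait_for_suppliers */
};

static struct sketch_dev *sketch_registry[SKETCH_MAX_DEVS];
static size_t sketch_nr_devs;

static struct sketch_dev *sketch_find(const char *name)
{
	for (size_t i = 0; i < sketch_nr_devs; i++)
		if (!strcmp(sketch_registry[i]->name, name))
			return sketch_registry[i];
	return NULL;
}

/* Stand-in for the add_links callback: 0 once all links are in place. */
static int sketch_add_links(struct sketch_dev *dev)
{
	if (!dev->supplier || dev->linked)
		return 0;
	if (!sketch_find(dev->supplier))
		return -1;	/* supplier has not been added yet */
	dev->linked = true;
	return 0;
}

/* Retry every waiting consumer; stop waiting once it is fully linked. */
static void sketch_check_waiting_consumers(void)
{
	for (size_t i = 0; i < sketch_nr_devs; i++)
		if (sketch_registry[i]->waiting &&
		    !sketch_add_links(sketch_registry[i]))
			sketch_registry[i]->waiting = false;
}

/* Same order as the patch: register, retry waiters, then link self. */
static void sketch_device_add(struct sketch_dev *dev)
{
	if (sketch_nr_devs == SKETCH_MAX_DEVS)
		return;
	sketch_registry[sketch_nr_devs++] = dev;
	sketch_check_waiting_consumers();
	if (sketch_add_links(dev))
		dev->waiting = true;
}

/* Probe gate: a device still waiting for a supplier is deferred. */
static int sketch_check_suppliers(const struct sketch_dev *dev)
{
	return dev->waiting ? -SKETCH_EPROBE_DEFER : 0;
}

/* Consumer added before its supplier; returns 1 if the flow behaves. */
static int sketch_demo(void)
{
	struct sketch_dev clk = { .name = "clk" };
	struct sketch_dev uart = { .name = "uart", .supplier = "clk" };
	int before, after;

	sketch_nr_devs = 0;
	sketch_device_add(&uart);		/* supplier missing: uart waits */
	before = sketch_check_suppliers(&uart);
	sketch_device_add(&clk);		/* supplier arrives: uart retried */
	after = sketch_check_suppliers(&uart);

	return before == -SKETCH_EPROBE_DEFER && after == 0 && uart.linked;
}
```

In this model the consumer is linked while the supplier is being added,
before anything would probe the supplier, which is the ordering the
comment is trying to justify.
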
> >>>>
> >>>>
> >>>>>
> >>>>>>
> >>>>>>> +     device_link_check_waiting_consumers();
> >>>>>>> +
> >>>>>>> +     if (dev->bus && dev->bus->add_links && dev->bus->add_links(dev))
> >>>>>>> +             device_link_wait_for_supplier(dev);
> >>>>>>> +
> >>>>>>>       bus_probe_device(dev);
> >>>>>>>       if (parent)
> >>>>>>>               klist_add_tail(&dev->p->knode_parent,
> >>>>>>> diff --git a/include/linux/device.h b/include/linux/device.h
> >>>>>>> index c330b75c6c57..5d70babb7462 100644
> >>>>>>> --- a/include/linux/device.h
> >>>>>>> +++ b/include/linux/device.h
> >>>>>>> @@ -78,6 +78,17 @@ extern void bus_remove_file(struct bus_type *, struct bus_attribute *);
> >>>>>>>   *           -EPROBE_DEFER it will queue the device for deferred probing.
> >>>>>>>   * @uevent:  Called when a device is added, removed, or a few other things
> >>>>>>>   *           that generate uevents to add the environment variables.
> >>>>>>
> >>>>>>> + * @add_links:       Called, perhaps multiple times per device, after a device is
> >>>>>>> + *           added to this bus.  The function is expected to create device
> >>>>>>> + *           links to all the suppliers of the input device that are
> >>>>>>> + *           available at the time this function is called.  As in, the
> >>>>>>> + *           function should NOT stop at the first failed device link if
> >>>>>>> + *           other unlinked supplier devices are present in the system.
> >>>>>>
> >>>>>> * @add_links:   Called after a device is added to this bus.
> >>>>>
> >>>>> Why are you removing the "perhaps multiple times" part? that's true
> >>>>> and that's how some of the other ops are documented.
> >>>>
> >>>> I didn't remove it.  I rephrased it with a little bit more explanation as
> >>>> "If some suppliers are not yet available, this function will be
> >>>> called again when the suppliers become available." (below).
> >>>>
> >>>>
> >>>>>
> >>>>>>  The function is
> >>>>>> *               expected to create device links to all the suppliers of the
> >>>>>> *               device that are available at the time this function is called.
> >>>>>> *               The function must NOT stop at the first failed device link if
> >>>>>> *               other unlinked supplier devices are present in the system.
> >>>>>> *               If some suppliers are not yet available, this function will be
> >>>>>> *               called again when the suppliers become available.
> >>>>>>
> >>>>>> but add_links() not needed, so moving this comment to of_link_to_suppliers()
> >>>>>
> >>>>> Sorry, I'm not sure I understand. Can you please explain what you are
> >>>>> trying to say? of_link_to_suppliers() is just one implementation of
> >>>>> add_links(). The comment above is true for any bus trying to implement
> >>>>> add_links().
> >>>>
> >>>> This is conflating bus with the source of the firmware description of the
> >>>> hardware topology.  For drivers that use various APIs to access firmware
> >>>> description of topology that may be either devicetree or ACPI the access
> >>>> is done via fwnode_operations, based on struct device.fwnode (if I recall
> >>>> properly).
> >>>>
> >>>> I failed to completely address why add_links() is not needed.  The answer
> >>>> is that there should be a single function called for all buses.  Then
> >>>> the proper firmware data source would be accessed via a struct fwnode_operations.
> >>>>
> >>>> I think I left this out because I had not yet asked why this feature is
> >>>> tied only to the platform bus.  Which I asked earlier in this reply.
> >>>
> >>> Thanks for the pointer about fwnode and fwnode_operations. I wasn't
> >>> aware of those. I see where you are going with this. I see a couple of
> >>> problems with this approach though:
> >>>
> >>> 1. How you interpret the properties of a fwnode is specific to the fw
> >>> type. The clocks DT property isn't going to have the same definition
> >>> in ACPI or some other firmware. Heck, I don't know if ACPI even has a
> >>> clocks like property. So having one function to parse all the FW types
> >>> doesn't make a lot of sense.
> >>
> >> The functions in fwnode_operations are specific to the proper firmware.
> >> So there is a set of functions in a struct fwnode_operations for
> >> devicetree that only know about devicetree.  And there is a different
> >> variable of type fwnode_operations that is initialized with ACPI
> >> specific functions.
> >
> > Yes, I understand how ops work :) So I have one ops (fwnode ops) to
> > call that will read a property from DT or ACPI depending on where that
> > specific device's firmware is from. But that's not my point here.
> >
> > My point is that clock bindings in DT are under a "clocks" property
> > that lists references (phandles) to the supplier. But in ACPI, the
> > property might be called "clk" and could list references to actual
> > clock IDs. So, you can't have one piece of code that works for all
> > firmware even if I have one ops that can read properties from any
> > firmware.
> >
> > I'll still have to know what type the underlying firmware is before I
> > try to interpret the properties. So having one function that parses DT
> > and ACPI and whatever else would be a terrible and unnecessary design.
>
> You have already implemented the devicetree function, which is
> of_link_to_suppliers().  The devicetree fwnode_operations would have
> a pointer to of_link_to_suppliers().

Ah, I didn't realize you were asking me to add to the fwnode ops. I
thought you wanted me to handle this at the driver core level by
moving of_link_to_suppliers to driver core and replacing of_* APIs
with fwnode ops. And that seemed like a terrible idea. Glad you
weren't suggesting that.

I'm definitely open to adding an add_links to fwnode ops, but I'm not
sure if fwnode ops changes are frowned upon or not. I'll still need
the device specific edit_links() but that's a separate issue.

> If ACPI support is added, there would be an analogous ACPI aware function
> that would essentially do the same thing that of_link_to_suppliers()
> does.  This would be in the ACPI version of fwnode_operations.

Agreed. As long as you don't ask me to implement the ACPI ops :)

> There would not be a single function that is both devicetree aware and
> ACPI aware.

Great.
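
To check that we mean the same thing, here is a rough userspace sketch
of the shape I think we are converging on: one ops table per firmware
type, each with its own add_links implementation, and a firmware
agnostic caller that only dispatches through the pointer. All sketch_*
names are hypothetical, and the "unresolved" counter merely stands in
for real property parsing. This is not the actual fwnode_operations
layout.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_fwnode;

/* Hypothetical per-firmware ops table with an add_links hook. */
struct sketch_fwnode_ops {
	int (*add_links)(const struct sketch_fwnode *fwnode);
};

struct sketch_fwnode {
	const struct sketch_fwnode_ops *ops;
	int unresolved;		/* suppliers that are not available yet */
};

/*
 * Devicetree flavoured implementation. A real one would walk properties
 * such as "clocks"; here the counter stands in for that parsing.
 */
static int sketch_of_add_links(const struct sketch_fwnode *fwnode)
{
	return fwnode->unresolved ? -1 : 0;
}

static const struct sketch_fwnode_ops sketch_of_ops = {
	.add_links = sketch_of_add_links,
};

/* A firmware type that has not implemented the hook (yet). */
static const struct sketch_fwnode_ops sketch_other_ops = {
	.add_links = NULL,
};

/* Firmware agnostic caller: never inspects which firmware it is. */
static int sketch_core_add_links(const struct sketch_fwnode *fwnode)
{
	if (!fwnode || !fwnode->ops || !fwnode->ops->add_links)
		return 0;	/* nothing to link, so nothing to wait for */
	return fwnode->ops->add_links(fwnode);
}

/* Returns 1 if dispatch behaves as described above, else 0. */
static int sketch_fwnode_demo(void)
{
	struct sketch_fwnode dt_ready = { &sketch_of_ops, 0 };
	struct sketch_fwnode dt_pending = { &sketch_of_ops, 2 };
	struct sketch_fwnode other = { &sketch_other_ops, 5 };

	return sketch_core_add_links(&dt_ready) == 0 &&
	       sketch_core_add_links(&dt_pending) == -1 &&
	       sketch_core_add_links(&other) == 0 &&
	       sketch_core_add_links(NULL) == 0;
}
```

The point of the split is that the caller has no firmware specific
includes or checks; interpreting the properties stays entirely inside
each firmware's own implementation.
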

> >>> 2. If this common code is implemented as part of driver/base/, then at
> >>> a minimum, I'll have to check if a fwnode is a DT node before I start
> >>> interpreting the properties of a device's fwnode. But that means I'll
> >>> have to include linux/of.h to use is_of_node(). I don't like having
> >>> driver/base code depend on OF or platform or ACPI headers.
> >>
> >> You just use the function in the device's fwnode_operations (I think,
> >> I would have to go look at the precise way the code works because it
> >> has been quite a while since I've looked at it).
> >
> > Because you missed my point in (1) you are missing my point in (2).
> > I'll wait for your updated reply.
>
> We are still talking at cross purposes.  If my reply to (1) does not
> change that, I'll have to go dig into how the fwnode framework figures
> out which set of fwnode_operations to use for each device.

I think we are lining up better now. But still have some more to go :)

Having said that, I'd still like to meet you tomorrow if that's
possible (see Greg's email).

-Saravana

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-21  2:01                   ` Saravana Kannan
@ 2019-08-21  4:24                     ` Frank Rowand
  0 siblings, 0 replies; 37+ messages in thread
From: Frank Rowand @ 2019-08-21  4:24 UTC (permalink / raw)
  To: Saravana Kannan, Greg Kroah-Hartman
  Cc: Rob Herring, Mark Rutland, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On 8/20/19 7:01 PM, Saravana Kannan wrote:
> 
> 
> On Tue, Aug 20, 2019, 6:56 PM Greg Kroah-Hartman <gregkh@linuxfoundation.org> wrote:
> 
>     On Tue, Aug 20, 2019 at 06:06:55PM -0700, Frank Rowand wrote:
>     > On 8/20/19 3:10 PM, Saravana Kannan wrote:
>     > > On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>     > >>
>     > >> On 8/19/19 5:00 PM, Saravana Kannan wrote:
>     > >>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>     > >>>>
>     > >>>> On 8/15/19 6:50 PM, Saravana Kannan wrote:
>     > >>>>> On Wed, Aug 7, 2019 at 7:04 PM Frank Rowand <frowand.list@gmail.com> wrote:
>     > >>>>>>
>     > >>>>>>> Date: Tue, 23 Jul 2019 17:10:54 -0700
>     > >>>>>>> Subject: [PATCH v7 1/7] driver core: Add support for linking devices during
>     > >>>>>>>  device addition
>     > >>>>>>> From: Saravana Kannan <saravanak@google.com>
> 
>     This is a "fun" thread :(
> 
>     You two should get together in person this week and talk.  I think you
>     both will be at ELC, can we do this tomorrow or Thursday so we can hash
>     it out in a way that doesn't end up talking past each other, like I feel
>     is happening here right now?
> 
> 
> That would be great. Wednesday would be better for me. I might not make it to ELC on Thursday. Let us know Frank.
> 
> Thanks,
> Saravana

I am really glad that you are here at ELC.  It should be very productive to
sit down together and figure some things out.

I'll send a separate reply with my phone number off list.

-Frank


* Re: [PATCH v7 1/7] driver core: Add support for linking devices during device addition
  2019-08-20 22:10             ` Saravana Kannan
  2019-08-21  1:06               ` Frank Rowand
@ 2019-08-21 15:36               ` Frank Rowand
  1 sibling, 0 replies; 37+ messages in thread
From: Frank Rowand @ 2019-08-21 15:36 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team

On 8/20/19 3:10 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 9:25 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>
>> On 8/19/19 5:00 PM, Saravana Kannan wrote:
>>> On Sun, Aug 18, 2019 at 8:38 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>>>

< snip >

>>>
>>> 3. The supplier info doesn't always need to come from a firmware. So I
>>> don't want to limit it to that?
>>
>> If you can find another source of topology info, then I would expect
>> that another set of fwnode_operations functions would be created
>> for the info source.
> 
> The other source could just be C files in the kernel. Using fwnodes
> for that would be hacky. But let's sort (1) and (2) out first.
> 

< snip >

Just a piece of trivia.  I got curious enough about this to search.
There is a third type of fwnode, software nodes.

See commit 59abd83672f70cac4b6bf9b237506c5bc6837606 for a description.

-Frank


* Re: [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings
  2019-08-20 22:09                 ` Saravana Kannan
@ 2019-08-21 16:39                   ` Frank Rowand
  0 siblings, 0 replies; 37+ messages in thread
From: Frank Rowand @ 2019-08-21 16:39 UTC (permalink / raw)
  To: Saravana Kannan
  Cc: Rob Herring, Mark Rutland, Greg Kroah-Hartman, Rafael J. Wysocki,
	Jonathan Corbet,
	open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML,
	David Collins, Android Kernel Team, Linux Doc Mailing List

On 8/20/19 3:09 PM, Saravana Kannan wrote:
> On Mon, Aug 19, 2019 at 9:26 PM Frank Rowand <frowand.list@gmail.com> wrote:
>>


< snip - the stuff I snipped deserves reply, but I want to focus on just
  one topic for this reply >

>> You have a real bug.  I have told you how to fix the real bug.  And you
>> have ignored my suggestion.  (To be honest, I do not know for sure that
>> my suggestion is feasible, but on the surface it appears to be.)
> 
> I'd actually say that your proposal is what's trying to paper over a
> generic problem by saying it's specific to one or a few set of
> resources. And it looks feasible to you because you haven't dove deep
> into this issue.

Not saying it is specific to one or a few sets of resources.  The
proposal handles every consumer / supplier relationship for which the
bootloader has enabled a supplier resource.  The bootloader sends an
explicit message listing the resources it has enabled, and exactly
those resources are handled directly.

Think about the definition of "paper over" vs "directly address".


> 
>> Again,
>> my suggestion is to have the boot loader pass information to the kernel
>> (via a chosen property) telling the kernel which devices the bootloader
>> has enabled power to.  The power subsystem would use that information
>> early in boot to do a "get" on the power supplier (I am not using precise
>> power subsystem terminology, but it should be obvious what I mean).
>> The consumer device driver would also have to be aware of the information
>> passed via the chosen property because the power subsystem has done the
>> "get" on the consumer device's behalf (exactly how the consumer gets
>> that information is an implementation detail).  This approach is
>> more direct, less subtle, less fragile.
> 
> I'll have to disagree on your claim. You are adding unnecessary
> bootloader dependency when the kernel is completely capable of
> handling this on its own. You are requiring explicit "gets" by
> suppliers and then hoping all the consumers do the corresponding
> "puts" to balance it out. Somehow the consumers need to know which
> suppliers have parsed which bootloader input. And it's barely
> scratching the surface of the problem.

OK, let me flesh out a possible implementation just a little bit.

This is focused on devicetree, as is your patch series.  For ACPI
a parallel implementation would exist.

The bootloader chosen property could be a list of tuples, each tuple
containing: consumer phandle, supplier phandle.  Each tuple could
contain more data if the implementation demands, but I'm trying to
keep it simple to illustrate the concept.

In early-ish boot a core function processes the chosen property.  For
each consumer / supplier pair, the supplier compatible could be used
to determine the supplier type.  (This might not be enough info to
determine the supplier type - maybe the consumer property that points
to the supplier will also have to be specified in the chosen property
tuple, or maybe a supplier type could be added to the tuple.)  Given
the consumer, supplier, and resource type the appropriate "get"
would be done.

Late in boot, and possibly repeated after modules are loaded, a core
function would scan the chosen property tuples, and for each
consumer / supplier pair, if both the consumer and the supplier
drivers are bound, it would be ASSUMED that it is ok to do the
appropriate type of "put", and the "put" would be done.
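
To make the two phases above concrete, here is a minimal userspace
sketch in C.  It is only an illustration of the concept, not kernel
code: struct sketch_dev, struct boot_tuple, boot_tuples_get() and
boot_tuples_put_bound() are hypothetical names, the "get" / "put" is
reduced to a plain counter, and parsing of the chosen property is
assumed to have already produced the tuple array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not a kernel structure. */
struct sketch_dev {
	const char *name;
	bool driver_bound;	/* set once a driver has bound to the device */
	int boot_refs;		/* "gets" held on behalf of boot-time consumers */
};

/* One (consumer, supplier) tuple from the hypothetical chosen property. */
struct boot_tuple {
	struct sketch_dev *consumer;
	struct sketch_dev *supplier;
	bool held;		/* true while the early "get" is outstanding */
};

/* Early-ish boot: do a "get" on the supplier of every tuple. */
static void boot_tuples_get(struct boot_tuple *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].supplier->boot_refs++;
		t[i].held = true;
	}
}

/*
 * Late boot, and again after each module load: do the "put" for every
 * tuple whose consumer and supplier drivers are both bound.
 * Returns the number of tuples that are still held.
 */
static size_t boot_tuples_put_bound(struct boot_tuple *t, size_t n)
{
	size_t remaining = 0;

	for (size_t i = 0; i < n; i++) {
		if (!t[i].held)
			continue;
		if (t[i].consumer->driver_bound && t[i].supplier->driver_bound) {
			t[i].supplier->boot_refs--;
			t[i].held = false;
		} else {
			remaining++;
		}
	}
	return remaining;
}
```

Calling boot_tuples_put_bound() more than once is harmless because a
tuple is only released once; that is what allows the late scan to be
repeated after every module load.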


> 
> You are assuming this has to do with just power when it can be clocks,
> interconnects, etc. Why solve this repeatedly for each framework when
> you can have a generic solution?

No such assumption.


> 
> Also, while I understand what you mean by "get" it's not going to be
> as simple as a reference count to keep the resource on. In reality
> you'll need more complex handling. For example, having to keep a
> voltage rail at or above X mV because one consumer might fail if the
> voltage is < X mV. Or making sure a clock never goes above the
> bootloader set frequency before all the consumer drivers are probed to
> avoid overclocking one of the consumers. Trying to have this
> explicitly coordinated across multiple drivers would be a nightmare.
> It gets even more complicated with interconnects.
> 
> With my patch series, the consumers don't need to do anything. They
> just probe as usual. The suppliers don't need to track or coordinate
> with any consumers. For example, regulator suppliers just need to keep
> the voltage rail at (or above) the level that the boot loader left it
> on at and then apply the aggregated requests from their APIs once they
> get the sync_state() callback. And it actually works -- tested for
> regulators and clocks (and maybe even interconnects -- I forgot) in a
> device I have.
> 

And same for the possible implementation I sketched above.  The equivalent
of the sync_state() callback would be a similar call made by the
end-of-boot core function (potentially repeated after each module loads).
Hand waving here about which suppliers to call.

Of course this is not the only way to implement my concept, just an
example to suggest that it might be feasible and it might work.
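
For reference, the supplier-side behavior described in the quoted text
above (hold the rail at or above the level the bootloader left it at,
then apply the aggregated consumer request once the sync_state()
equivalent arrives) could look roughly like the following.  This is a
userspace sketch under stated assumptions, not regulator framework
code: struct sketch_rail and the rail_*() helpers are hypothetical, and
the aggregated request is passed in as a single number instead of being
computed from individual consumers.

```c
#include <stdbool.h>

/* Hypothetical model of one voltage rail; not a regulator framework type. */
struct sketch_rail {
	int boot_mv;		/* level the bootloader left the rail at */
	int requested_mv;	/* aggregated request from probed consumers */
	bool synced;		/* true once all consumers have probed */
};

/* Record the current aggregated request from the consumers probed so far. */
static void rail_set_request(struct sketch_rail *r, int mv)
{
	r->requested_mv = mv;
}

/* The sync_state() equivalent: stop honoring the bootloader level. */
static void rail_sync_state(struct sketch_rail *r)
{
	r->synced = true;
}

/*
 * Voltage the rail should be programmed to.  Before the sync point the
 * bootloader level acts as a floor; afterwards only the aggregated
 * request matters.
 */
static int rail_output_mv(const struct sketch_rail *r)
{
	if (!r->synced && r->requested_mv < r->boot_mv)
		return r->boot_mv;
	return r->requested_mv;
}
```

With either approach, the floor is dropped by whoever decides that all
consumers have probed; the two proposals differ in where that decision
is made, not in what the supplier does with it.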

< snip >

-Frank


Thread overview: 37+ messages
2019-07-24  0:10 [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Saravana Kannan
2019-07-24  0:10 ` [PATCH v7 1/7] driver core: Add support for linking devices during device addition Saravana Kannan
2019-08-08  2:04   ` Frank Rowand
2019-08-16  1:50     ` Saravana Kannan
2019-08-19  3:38       ` Frank Rowand
2019-08-20  0:00         ` Saravana Kannan
2019-08-20  4:25           ` Frank Rowand
2019-08-20 22:10             ` Saravana Kannan
2019-08-21  1:06               ` Frank Rowand
2019-08-21  1:56                 ` Greg Kroah-Hartman
2019-08-21  2:01                   ` Saravana Kannan
2019-08-21  4:24                     ` Frank Rowand
2019-08-21  2:04                   ` Saravana Kannan
2019-08-21  2:22                 ` Saravana Kannan
2019-08-21 15:36               ` Frank Rowand
2019-07-24  0:10 ` [PATCH v7 2/7] driver core: Add edit_links() callback for drivers Saravana Kannan
2019-08-08  2:05   ` Frank Rowand
2019-08-16  1:50     ` Saravana Kannan
2019-07-24  0:10 ` [PATCH v7 3/7] of/platform: Add functional dependency link from DT bindings Saravana Kannan
2019-08-08  2:06   ` Frank Rowand
2019-08-16  1:50     ` Saravana Kannan
2019-08-19 17:16       ` Frank Rowand
2019-08-19 20:49         ` Saravana Kannan
2019-08-19 21:30           ` Frank Rowand
2019-08-20  0:09             ` Saravana Kannan
2019-08-20  4:26               ` Frank Rowand
2019-08-20 22:09                 ` Saravana Kannan
2019-08-21 16:39                   ` Frank Rowand
2019-07-24  0:10 ` [PATCH v7 4/7] driver core: Add sync_state driver/bus callback Saravana Kannan
2019-07-24  0:10 ` [PATCH v7 5/7] of/platform: Pause/resume sync state during init and of_platform_populate() Saravana Kannan
2019-07-24  0:10 ` [PATCH v7 6/7] of/platform: Create device links for all child-supplier depencencies Saravana Kannan
2019-07-24  0:11 ` [PATCH v7 7/7] of/platform: Don't create device links for default busses Saravana Kannan
2019-07-25 13:42 ` [PATCH v7 0/7] Solve postboot supplier cleanup and optimize probe ordering Greg Kroah-Hartman
2019-07-25 21:04   ` Frank Rowand
2019-07-26 14:32     ` Greg Kroah-Hartman
2019-07-31  2:22       ` Frank Rowand
2019-08-08  2:02 ` Frank Rowand
